AI bots ignore evidence. Can we trust them with science?

AI systems built on large language models (LLMs) struggle to incorporate new evidence into their reasoning, according to recent research. Chatbots such as ChatGPT, Gemini, and Grok gave incorrect predictions about a simple pen experiment and refused to update their answers even after being shown live video evidence. The bots could identify details such as the pen's color but did not adjust their reasoning to match the observed result, which points to a deeper flaw in how these systems process information.

A study posted on arXiv.org tested AI agents, systems that combine LLMs with tools to perform tasks independently, on scientific reasoning tasks such as identifying chemicals in solutions. The agents could run simulated or real lab experiments but often ignored the evidence those experiments produced: they dismissed data at least once in 68% of 619 tasks, and 53% made unsupported claims. Only 26% successfully used contradictory evidence to revise their output, a sign that the agents do not reason the way human scientists do.

N.M. Anoop Krishnan, a materials scientist at the Indian Institute of Technology Delhi, noted that human scientists iteratively revise hypotheses based on experimental results, whereas AI agents do not. Kevin Jablonka, a study coauthor from Friedrich Schiller University Jena, emphasized that trust in scientific results depends on transparent, evidence-based processes, something AI agents currently lack. Walter Quattrociocchi, a computer scientist at Sapienza University of Rome, warned that while developers could hardcode fixes for specific cases, the core issue remains: AI agents typically fail to integrate new data dynamically.

This limitation raises concerns about the agents' reliability in fields such as science and medicine, where evidence-based reasoning is critical. The study suggests that AI benchmarks, which often score only final results, may overlook critical flaws in how these systems process information. Unless this is addressed, AI’s role in evidence-dependent fields could be compromised.

