
Study Finds AI Chatbots Often Fail to Generate Accurate Differential Diagnoses for Symptoms

A study by Mass General Brigham researchers examined the performance of AI chatbots in producing differential diagnoses from patient symptoms. The chatbots failed to generate correct lists more than 80 percent of the time with basic information but improved to over 90 percent accuracy with full details. The findings highlight the need for human oversight in medical decision-making.

The Boston Globe · Apr 13, 2026, 3:59 PM · 3 min read

A new study published in JAMA Network Open on Monday assessed the reliability of AI chatbots for medical advice. Researchers from Mass General Brigham tested 21 general-purpose large language models, including versions of ChatGPT, DeepSeek, Claude, Gemini, and Grok.

The evaluation used information from 29 published medical cases involving common conditions such as heart failure and ectopic pregnancies.

The chatbots received data incrementally, starting with basic details like age, gender, and symptoms. They failed to produce the correct differential diagnosis, a list of possible causes, more than 80 percent of the time at this stage. Differential diagnosis is a standard process in medicine to identify potential conditions based on initial information.

Performance improved significantly when additional data was provided. After receiving results from physical examinations and laboratory tests, the chatbots identified the correct diagnosis more than 90 percent of the time. This suggests that AI tools can assist effectively once comprehensive information is available.

Limitations in Early-Stage Analysis

The study revealed challenges in the initial phases of diagnosis.

Chatbots struggled with open-ended scenarios where limited information was available, which is common when patients first describe symptoms. This phase is critical, as it guides further testing and treatment decisions.

Dr. Marc Succi, executive director of the MESH Incubator at Mass General Brigham and associate professor of radiology at Harvard Medical School, stated in an interview that users should not fully trust chatbot outputs without verification. He emphasized the importance of human involvement, including patient interviews, medical history reviews, diagnostic tests, and physical exams, to narrow down possibilities.

Succi noted that acting on incomplete AI advice could lead to unnecessary procedures, such as biopsies for non-cancerous conditions, or delays in urgent care, like treatment for headaches indicating a stroke.

“You can’t just trust what the chatbot says.” — Dr. Marc Succi (The Boston Globe)

Arya Rao, lead author of the study, a MESH researcher, and an MD-PhD student at Harvard Medical School, said the models excel at final diagnoses with complete data but falter at the outset.

Rao's comment underscores the gap between AI capabilities and real-world diagnostic workflows, where information is often gathered progressively.

Context of Primary Care Shortages

The research comes amid a shortage of primary care physicians, which affects access to appointments.

Many patients face long waits when seeking care for symptoms, leading some to use AI chatbots for preliminary assessments. Mass General Brigham, the state's largest health care system, reported that thousands of its patients lack assigned primary care providers. In response to these challenges, Mass General Brigham launched an AI app called Care Connect in September 2025.

The app operates 24/7, handles patient inquiries, reviews medical records, and schedules telehealth appointments with physicians in as little as half an hour. It aims to streamline intake amid the provider shortage.

Dr. Rajesh Patel, vice president of digital patient experience at Mass General Brigham, stated that the Care Connect chatbot differs from general-purpose tools. It focuses on medical intake to expedite clinician appointments and does not provide diagnoses.

Patel added that patients always consult a real clinician for diagnosis, treatment, and follow-up, aligning with the study's emphasis on physician involvement.

Implications for AI in Healthcare

The findings suggest that while AI chatbots show promise in later diagnostic stages, they require supervision to avoid errors in early assessments.

Healthcare providers can use these tools to support, but not replace, human judgment in complex cases. Future developments may involve integrating AI more closely with clinical workflows to enhance accuracy. This study provides evidence-based insights into AI limitations, informing how health systems deploy such technologies.

As adoption grows, balancing innovation with safety remains key, particularly for vulnerable patients relying on timely care.

Story Timeline

2 events
  1. Monday, April 13, 2026 (publication date)

     Mass General Brigham study on AI chatbot diagnostic accuracy published in JAMA Network Open.

     1 source: The Boston Globe
  2. September 2025

     Mass General Brigham launched the Care Connect AI app to address primary care shortages.

     1 source: The Boston Globe

Potential Impact

  1. Health systems may increase human oversight for AI diagnostic tools to prevent errors.
  2. AI apps like Care Connect may expand to more health systems for intake efficiency.
  3. Primary care shortages may prompt further investment in hybrid AI-human models.
  4. Patients could delay seeking professional care based on incomplete AI assessments.
  5. Research on AI medical accuracy could influence regulatory guidelines for chatbots.

Transparency Panel

Sources cross-referenced: 1
Framing risk: 28/100 (low)
Confidence score: 65%
Synthesized by: Substrate AI (grok-4-fast-non-reasoning)
Word count: 616 words
Published: Apr 13, 2026, 3:59 PM
Bias signals removed: 4 across 2 outlets
Signal breakdown: Loaded 2 · Editorializing 1 · Speculative 1
