
Study Finds AI Chatbots Often Fail to Generate Accurate Differential Diagnoses for Symptoms

A study by Mass General Brigham researchers examined the performance of AI chatbots in producing differential diagnoses from patient symptoms. The chatbots failed to generate correct lists more than 80 percent of the time with basic information but improved to over 90 percent accuracy with full details. The findings highlight the need for human oversight in medical decision-making.

The Boston Globe · Apr 13, 2026, 3:59 PM · 3 min read

A new study published in JAMA Network Open on Monday assessed the reliability of AI chatbots for medical advice. Researchers from Mass General Brigham tested 21 general-purpose large language models, including versions of ChatGPT, DeepSeek, Claude, Gemini, and Grok.

The evaluation used information from 29 published medical cases involving common conditions such as heart failure and ectopic pregnancies.

The chatbots received data incrementally, starting with basic details like age, gender, and symptoms. They failed to produce the correct differential diagnosis, a list of possible causes, more than 80 percent of the time at this stage. Differential diagnosis is a standard process in medicine to identify potential conditions based on initial information.

Performance improved significantly when additional data was provided. After receiving results from physical examinations and laboratory tests, the chatbots identified the correct diagnosis more than 90 percent of the time. This suggests that AI tools can assist effectively once comprehensive information is available.

Limitations in Early-Stage Analysis

The study revealed challenges in the initial phases of diagnosis.

Chatbots struggled with open-ended scenarios where limited information was available, which is common when patients first describe symptoms. This phase is critical, as it guides further testing and treatment decisions.

Dr. Marc Succi, executive director of the MESH Incubator at Mass General Brigham and associate professor of radiology at Harvard Medical School, stated in an interview that users should not fully trust chatbot outputs without verification. He emphasized the importance of human involvement, including patient interviews, medical history reviews, diagnostic tests, and physical exams, to narrow down possibilities.

Succi noted that acting on incomplete AI advice could lead to unnecessary procedures, such as biopsies for non-cancerous conditions, or delays in urgent care, like treatment for headaches indicating a stroke.

“You can’t just trust what the chatbot says.” — Dr. Marc Succi (The Boston Globe)

Arya Rao, lead author of the study, a MESH researcher, and an MD-PhD student at Harvard Medical School, said the models excel at final diagnoses with complete data but falter at the outset.

Rao's comment underscores the gap between AI capabilities and real-world diagnostic workflows, where information is often gathered progressively.

Context of Primary Care Shortages

The research comes amid a shortage of primary care physicians, which affects access to appointments.

Many patients face long waits when seeking care for symptoms, leading some to use AI chatbots for preliminary assessments. Mass General Brigham, the state's largest health care system, reported that thousands of its patients lack assigned primary care providers. In response to these challenges, Mass General Brigham launched an AI app called Care Connect in September 2025.

The app operates 24/7, handles patient inquiries, reviews medical records, and schedules telehealth appointments with physicians in as little as half an hour. It aims to streamline intake amid the provider shortage.

Dr. Rajesh Patel, vice president of digital patient experience at Mass General Brigham, stated that the Care Connect chatbot differs from general-purpose tools. It focuses on medical intake to expedite clinician appointments and does not provide diagnoses.

Patel added that patients always consult a real clinician for diagnosis, treatment, and follow-up, aligning with the study's emphasis on physician involvement.

Implications for AI in Healthcare

The findings suggest that while AI chatbots show promise in later diagnostic stages, they require supervision to avoid errors in early assessments.

Healthcare providers can use these tools to support, but not replace, human judgment in complex cases. Future developments may involve integrating AI more closely with clinical workflows to enhance accuracy. This study provides evidence-based insights into AI limitations, informing how health systems deploy such technologies.

As adoption grows, balancing innovation with safety remains key, particularly for vulnerable patients relying on timely care.

Story Timeline

2 events
  1. Monday, April 13, 2026 (publication date)

     Mass General Brigham study on AI chatbot diagnostic accuracy published in JAMA Network Open.

     1 source: The Boston Globe
  2. September 2025

     Mass General Brigham launched the Care Connect AI app to address primary care shortages.

     1 source: The Boston Globe

Potential Impact

  1. Health systems may increase human oversight for AI diagnostic tools to prevent errors.
  2. AI apps like Care Connect may expand to more health systems for intake efficiency.
  3. Primary care shortages may prompt further investment in hybrid AI-human models.
  4. Patients could delay seeking professional care based on incomplete AI assessments.
  5. Research on AI medical accuracy could influence regulatory guidelines for chatbots.

Transparency Panel

Sources cross-referenced: 1
Framing risk: 28/100 (low)
Confidence score: 65%
Synthesized by: Substrate AI (grok-4-fast-non-reasoning)
Word count: 616 words
Published: Apr 13, 2026, 3:59 PM
Bias signals removed: 4 across 2 outlets
Signal breakdown: Loaded 2 · Editorializing 1 · Speculative 1
