Study Compares OpenAI Model Accuracy to Physicians on Hospital Admissions
A study published April 30 in Science tested OpenAI models against internal medicine physicians using records from 76 patients admitted through an emergency department. The o1 model matched or closely matched the final diagnosis in 67 percent of cases, compared with 55 percent and 50 percent for the two physicians.
Source: forbes.com
A study published April 30 in the journal Science examined how well OpenAI models performed when given the same electronic health record data available to physicians at the time of hospital admission. The experiment used records from 76 patients who entered the Beth Israel Deaconess emergency department and were later admitted.
Two internal medicine attending physicians reviewed the cases, and two additional internal medicine physicians, blinded to the source of each diagnosis, scored the results. OpenAI’s o1 model produced an exact or closely related diagnosis in 67 percent of the cases, compared with 55 percent and 50 percent for the two physicians.
The largest difference appeared at the initial triage stage, when the least information was available.
The paper contained six experiments in total. The emergency-department component was one of them; the remaining five used established medical benchmarks. The authors stated that the AI received the same raw, unprocessed data available to clinicians at each decision point.
The study did not test emergency physicians. The physicians who participated were internal medicine attendings, whose training and daily responsibilities differ from those of emergency physicians.
One of the paper’s authors, Dr. Adrian Haimovich, an assistant professor of emergency medicine at Harvard Medical School and attending physician at Beth Israel Deaconess Medical Center, wrote that the experiment compared how well large language models and internal medicine physicians guessed the diagnosis of admitted patients using only information available in the emergency department.
The authors called for prospective trials rather than immediate deployment. They noted that newer models have already surpassed o1 and that the data used in the study are now dated by current AI standards. The article states that questions remain about governance, accountability, and integration of AI diagnostic tools into clinical workflows.
Story Timeline
- April 30, 2026: Study published in Science comparing OpenAI o1 model to internal medicine physicians on 76 cases. (Source: forbes.com)
- April 30, 2026: Media coverage stated AI outperformed ER doctors; study authors later clarified physicians were internal medicine attendings. (Source: forbes.com)
- May 22, 2026: Article published explaining study design, limitations, and author response. (Source: forbes.com)
Potential Impact
- Hospitals may request prospective trials before integrating AI diagnostic tools.
- Professional societies may develop guidelines on physician responsibility for AI-assisted diagnoses.
- Medical journals may receive more submissions testing AI on unprocessed clinical data.