Study Compares OpenAI Model Accuracy to Physicians on Hospital Admissions

A study published April 30 in Science tested OpenAI models against internal medicine physicians using records from 76 patients admitted through an emergency department. The o1 model matched or closely matched the final diagnosis in 67 percent of cases, compared with 55 percent and 50 percent for the two physicians.

May 22, 2:30 PM

A study published April 30 in the journal Science examined how well OpenAI models performed when given the same electronic health record data available to physicians at the time of hospital admission. The experiment used records from 76 patients who entered the Beth Israel Deaconess emergency department and were later admitted.

Two internal medicine attending physicians reviewed the cases, and two additional internal medicine physicians, blinded to the source of each diagnosis, scored the results. OpenAI’s o1 model produced an exact or closely related diagnosis in 67 percent of the cases, compared with 55 percent and 50 percent for the two physicians.

The largest difference appeared at the initial triage stage, when the least information was available.

The paper contained six experiments in total. The emergency-department component was one of them; the remaining five used established medical benchmarks. The authors stated that the AI received the same raw, unprocessed data available to clinicians at each decision point.

The study did not test emergency physicians. The physicians who participated were internal medicine attendings, whose training and daily responsibilities differ from those of emergency physicians.

One of the paper’s authors, Dr. Adrian Haimovich, an assistant professor of emergency medicine at Harvard Medical School and attending physician at Beth Israel Deaconess Medical Center, wrote that the experiment compared how well large language models and internal medicine physicians guessed the diagnosis of admitted patients using only information available in the emergency department.

The authors called for prospective trials rather than immediate deployment. They noted that newer models have already surpassed o1 and that the data used in the study are now dated by current AI standards. The article states that questions remain about governance, accountability, and integration of AI diagnostic tools into clinical workflows.
