AI Models Score 6 out of 10 on Novel Research-Level Math Problems

Four AI systems attempted ten unpublished mathematics problems created by researchers. A panel of mathematicians graded the answers and found the top model reached 6 out of 10.

1 source · Jun 15, 5:16 AM

Developing: limited corroboration so far.

Four artificial-intelligence systems were given ten research-level mathematics problems that had not appeared in any published literature or online sources. The problems were created by ten researchers who had solved them in their own unpublished work. A jury of anonymous specialists in the relevant fields evaluated the models' answers. The highest-scoring system received 6 out of 10.

The First Proof project required that questions be at research level and absent from training data, and that answers be formally graded by mathematicians. All three conditions were met for the first time in this evaluation. The results were posted on the First Proof website on 10 June.

An earlier trial round in February allowed public participation but did not include official verification or controls against human assistance.

The test follows a recent case in which an OpenAI chatbot solved an 80-year-old mathematics problem posed by the late mathematician Paul Erdős. The First Proof team stated that future versions could assess whether AI systems can solve problems independently, verify proofs, or serve as research assistants.
