Benchmarks in Leipzig

Source: arxiv.org

Between April 1 and May 15, 2026, a group of 49 mathematicians compiled a dataset of research-level mathematics questions with known answers. Most of the work was done during the 3-day workshop...

AI Summary

Between April and May 2026, 49 mathematicians created a dataset of 100 research-level mathematics questions with known answers, mostly during a workshop in Leipzig. The questions were tested against several large language models in three stages. Initially, 41 questions were unsolved. After further testing, 16 remained unsolved. Once heavy-thinking models were used, only 2 were still unsolved. The results indicate that the mathematical reasoning of LLMs is becoming highly advanced.