IMO-AnswerBench

Updated Jul 6, 2026

Problem Count: 400
Institution: DeepMind
Category: Math and Reasoning
Metrics: Accuracy
Language: English
Difficulty: Mixed

Overview

A benchmark that tests models' final answers to difficult problems at the level of the International Mathematical Olympiad (IMO).

Related resources

Latest IMO-AnswerBench model rankings and full benchmark leaderboard

Browse the latest scores, model modes, release dates, and parameter sizes for IMO-AnswerBench.

Source: DataLearnerAI

Data are sourced primarily from official releases (GitHub, Hugging Face, papers), then from benchmark leaderboards, and finally from third-party evaluators.

Rank	Model	Mode	Accuracy (%)	Release date	Parameters	License
1	GLM-5.2	Thinking Enabled	91.00	2026-06-13	753.3B	Free Commercial
2	Qwen3.7-Max-Preview	Thinking Level · Max	90.00	2026-05-20	Unknown	Closed
3	Hy3	Thinking Level · High	90.00	2026-07-06	295B	Free Commercial
4	DeepSeek-V4-Pro	Thinking Level · Max	89.80	2026-04-24	1600B	Free Commercial
5	DeepSeek-V4-Flash	Thinking Level · Max	88.40	2026-04-24	284B	Free Commercial
6	DeepSeek-V4-Pro	Thinking Level · High	88.00	2026-04-24	1600B	Free Commercial
7	Step 3.5 Flash	Thinking Enabled, with tools	86.70	2026-02-02	196B	Free Commercial
8	Kimi K2.6	Thinking Enabled	86.00	2026-04-20	1000B	Free Commercial
9	Step 3.5 Flash	Thinking Enabled	85.40	2026-02-02	196B	Free Commercial
10	DeepSeek-V4-Flash	Thinking Level · High	85.10	2026-04-24	284B	Free Commercial
11	Qwen3-Max-Thinking	Thinking Enabled	83.90	2026-01-26	1000B	Closed
12	GLM 5.1	Thinking Enabled	83.80	2026-03-27	75.4B	Free Commercial
13	Qwen 3.6 Plus Preview	Thinking Enabled	83.80	2026-03-31	Unknown	Closed
14	Qwen3.6-Max-Preview	Thinking Level · Max	83.80	2026-04-18	Unknown	Closed
15	GLM-5	Thinking Enabled	82.50	2026-02-11	744B	Free Commercial
16	Kimi K2.5	Thinking Enabled	81.80	2026-01-27	1000B	Free Commercial
17	Qwen3.5-397B-A17B	Thinking Enabled	80.90	2026-02-16	397B	Free Commercial
18	Qwen3.6-27B	Thinking Enabled	80.80	2026-04-22	27B	Free Commercial
19	Qwen3.6-35B-A3B	Thinking Enabled	78.90	2026-04-16	35B	Free Commercial
20	DeepSeek-V4-Flash	Standard Mode	41.90	2026-04-24	284B	Free Commercial
21	DeepSeek-V4-Pro	Standard Mode	35.30	2026-04-24	1600B	Free Commercial
