GPQA Diamond
Updated May 2, 2026
- Problem Count: 198
- Institution: CohereAI
- Category: General Evaluation
- Metrics: Accuracy
- Language: English
- Difficulty: Mixed
Overview
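GPQA Diamond is the 198-question subset of GPQA, a benchmark of graduate-level multiple-choice science questions in biology, physics, and chemistry. Models are scored by accuracy. This page on DataLearnerAI covers its overview, metrics, official resources, and model leaderboard results.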
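Because the metric is accuracy over 198 problems, a single question is worth roughly half a percentage point on one pass. The sketch below illustrates that arithmetic, assuming plain correct-over-total scoring. The function name `accuracy_percent` is hypothetical, and the page does not say how individual leaderboard entries were scored or whether they were averaged over several runs.

```python
PROBLEM_COUNT = 198  # problem count listed on this page


def accuracy_percent(num_correct: int, total: int = PROBLEM_COUNT) -> float:
    """Return accuracy as a percentage, assuming plain correct/total scoring."""
    if not 0 <= num_correct <= total:
        raise ValueError("num_correct must be between 0 and total")
    return 100.0 * num_correct / total


# Weight of one question, in percentage points, on a single pass.
step = accuracy_percent(1)

if __name__ == "__main__":
    print(f"one question = {step:.3f} percentage points")
    print(f"187 of {PROBLEM_COUNT} correct = {accuracy_percent(187):.2f}%")
```

Under this assumption, gaps of a few tenths of a point between adjacent leaderboard entries amount to less than one question on a single pass, so small differences in rank are best read with caution.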
Related resources
Latest GPQA Diamond model rankings and full benchmark leaderboard
Browse the latest scores, model modes, release dates, and parameter sizes for GPQA Diamond.
Source: DataLearnerAI
Data sourced primarily from official releases (GitHub, Hugging Face, papers), then benchmark leaderboards, then third-party evaluators.
2 parallel-mode results hidden
GPQA Diamond Rank
| Rank | Model | Mode | Accuracy (%) | Release date | Parameters | License |
|---|---|---|---|---|---|---|
| 1 | Claude Mythos Preview | Extended Thinking | 94.60 | 2026-04-07 | Unknown | Closed |
| 2 | GPT-5.4 Pro | Thinking Level · High | 94.40 | 2026-03-05 | Unknown | Closed |
| 3 | Gemini 3.1 Pro Preview | Thinking Level · High | 94.30 | 2026-02-20 | Unknown | Closed |
| 4 | Opus 4.7 | Extended Thinking | 94.20 | 2026-04-16 | Unknown | Closed |
| 5 | GPT-5.5 | Thinking Level · High | 93.60 | 2026-04-23 | Unknown | Closed |
| 6 | GPT-5.2 | Parallel · Deep Thinking Mode | 93.20 | 2025-12-11 | Unknown | Closed |
| 7 | GPT-5.2 Pro | Thinking Enabled | 93.20 | 2025-12-11 | Unknown | Closed |
| 8 | GPT-5.4 | Thinking Level · Extra High | 92.80 | 2026-03-05 | Unknown | Closed |
| 9 | GPT-5.2 | Thinking Level · Extra High | 92.40 | 2025-12-11 | Unknown | Closed |
| 10 | Gemini 3.0 Pro (Preview 11-2025) | Thinking Enabled | 91.90 | 2025-11-18 | Unknown | Closed |
| 11 | Claude Opus 4.6 | Extended Thinking | 91.31 | 2026-02-05 | Unknown | Closed |
| 12 | Gemini 3.0 Pro (Preview 11-2025) | Thinking Level · High | 91.00 | 2025-11-18 | Unknown | Closed |
| 13 | Kimi K2.6 | Thinking Enabled | 90.50 | 2026-04-20 | 1000B | Free Commercial |
| 14 | Gemini 3.0 Flash | Thinking Enabled | 90.40 | 2025-12-17 | Unknown | Closed |
| 15 | Qwen 3.6 Plus Preview | Thinking Enabled | 90.40 | 2026-03-31 | Unknown | Closed |
| 16 | DeepSeek-V4-Pro | Thinking Level · High | 90.10 | 2026-04-24 | 1600B | Free Commercial |
| 17 | Claude Sonnet 4.6 | Thinking Enabled | 89.90 | 2026-02-17 | Unknown | Closed |
| 18 | Muse Spark | Thinking Enabled | 89.50 | 2026-04-08 | Unknown | Closed |
| 19 | GPT-5-Pro | Thinking Enabled, Tools | 89.40 | 2025-08-07 | Unknown | Closed |
| 20 | DeepSeek-V4-Pro | Thinking Level · High | 89.10 | 2026-04-24 | 1600B | Free Commercial |
| 21 | GPT-5-Pro | Thinking Enabled | 88.40 | 2025-08-07 | Unknown | Closed |
| 22 | Qwen3.5-397B-A17B | Thinking Enabled | 88.40 | 2026-02-16 | 397B | Free Commercial |
| 23 | GPT-5.1 | Thinking Enabled | 88.10 | 2025-11-12 | Unknown | Closed |
| 24 | GPT-5.1 | Thinking Level · High | 88.10 | 2025-11-12 | Unknown | Closed |
| 25 | GPT-5.1 | Thinking Level · High | 88.10 | 2025-11-12 | Unknown | Closed |
| 26 | DeepSeek-V4-Flash | Thinking Level · High | 88.10 | 2026-04-24 | 284B | Free Commercial |
| 27 | GPT-5.4 mini | Thinking Level · Extra High | 88.00 | 2026-03-17 | Unknown | Closed |
| 28 | Qwen3.6-27B | Thinking Enabled | 87.80 | 2026-04-22 | 27B | Free Commercial |