HLE
Updated May 29, 2026
- Problem Count: 3000
- Institution: Center for AI Safety
- Category: General Evaluation
- Metrics: Accuracy
- Language: English
- Difficulty: Mixed
Overview
HLE (Humanity's Last Exam) is an AI benchmark used to evaluate model capabilities. This page on DataLearnerAI covers its overview, metrics, official resources, and model leaderboard results.
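Accuracy, the metric listed above, is the share of problems answered correctly; the leaderboard scores appear to be reported as percentages. A minimal sketch, assuming a hypothetical list of per-problem pass/fail grades (the `accuracy_percent` helper and the sample grades are illustrative, not taken from the HLE grading pipeline):

```python
def accuracy_percent(graded: list[bool]) -> float:
    """Return the percentage of correct answers, rounded to two decimals."""
    if not graded:
        raise ValueError("no graded answers")
    # True counts as 1 and False as 0, so sum() is the number correct.
    return round(100 * sum(graded) / len(graded), 2)


# Hypothetical run: 3 correct answers out of 4 problems.
print(accuracy_percent([True, True, True, False]))  # 75.0
```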
Related resources
Latest HLE model rankings and full benchmark leaderboard
Browse the latest scores, model modes, release dates, and parameter sizes for HLE.
Source: DataLearnerAI
Data is sourced primarily from official releases (GitHub, Hugging Face, papers), then from benchmark leaderboards, then from third-party evaluators.
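The source ordering stated above can be read as a simple priority rule: when several reports exist for the same model, prefer the most trusted tier. The sketch below is one possible reading of that rule, not DataLearnerAI's actual pipeline; the tier names, the `pick_score` helper, and the sample scores are all hypothetical.

```python
# Lower number = more trusted tier, following the order stated in the text.
# The tier names are assumptions made for this illustration.
SOURCE_PRIORITY = {"official": 0, "leaderboard": 1, "third_party": 2}


def pick_score(reports: list[dict]) -> dict:
    """Return the report that comes from the highest-priority source tier."""
    if not reports:
        raise ValueError("no reports")
    return min(reports, key=lambda r: SOURCE_PRIORITY[r["source"]])


# Hypothetical reports for a single model.
reports = [
    {"source": "third_party", "score": 41.0},
    {"source": "official", "score": 42.5},
    {"source": "leaderboard", "score": 42.0},
]
print(pick_score(reports)["score"])  # 42.5
```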
HLE Rank
| Rank | Model | Mode | Score | Release date | Parameters | License |
|---|---|---|---|---|---|---|
| 1 | Claude Mythos Preview | Extended Thinking, Tools | 64.70 | 2026-04-07 | Unknown | Closed |
| 2 | GPT-5.4 Pro | Thinking Enabled, Tools | 58.70 | 2026-03-05 | Unknown | Closed |
| 3 | Muse Spark | Parallel · Deep Thinking Mode | 58.00 | 2026-04-08 | Unknown | Closed |
| 4 | Claude Opus 4.8 | Extended Thinking, Tools | 57.90 | 2026-05-28 | Unknown | Closed |
| 5 | GPT-5.5 Pro | Thinking Level · Extra High, Tools | 57.20 | 2026-04-23 | Unknown | Closed |
| 6 | Claude Mythos Preview | Extended Thinking | 56.80 | 2026-04-07 | Unknown | Closed |
| 7 | Opus 4.7 | Extended Thinking, Tools | 54.70 | 2026-04-16 | Unknown | Closed |
| 8 | Kimi K2.6 | Thinking Enabled, Tools, Internet | 54.00 | 2026-04-20 | 1000B | Free Commercial |
| 9 | Qwen3.7-Max-Preview | Thinking Enabled, Tools | 53.50 | 2026-05-20 | 1000B | Closed |
| 10 | Claude Opus 4.6 | Extended Thinking, Tools, Internet | 53.00 | 2026-02-05 | Unknown | Closed |
| 11 | GLM 5.1 | Thinking Enabled, Tools | 52.30 | 2026-03-27 | 75.4B | Free Commercial |
| 12 | GPT-5.5 | Thinking Enabled, Tools | 52.20 | 2026-04-23 | Unknown | Closed |
| 13 | GPT-5.4 | Thinking Level · Extra High, Tools | 52.10 | 2026-03-05 | Unknown | Closed |
| 14 | Gemini 3.1 Pro Preview | Thinking Enabled, Tools | 51.40 | 2026-02-20 | Unknown | Closed |
| 15 | Qwen 3.6 Plus Preview | Thinking Enabled, Tools | 50.60 | 2026-03-31 | Unknown | Closed |
| 16 | GLM-5 | Thinking Enabled, Tools | 50.40 | 2026-02-11 | 744B | Free Commercial |
| 17 | Muse Spark | Thinking Enabled, Tools | 50.40 | 2026-04-08 | Unknown | Closed |
| 18 | Kimi K2.5 | Thinking Enabled, Tools | 50.20 | 2026-01-27 | 1000B | Free Commercial |
| 19 | Qwen3.6-Max-Preview | Thinking Enabled, Tools | 50.20 | 2026-04-20 | 1000B | Closed |
| 20 | GPT-5.2 Pro | Thinking Enabled, Tools | 50.00 | 2025-12-11 | Unknown | Closed |
| 21 | Qwen3-Max-Thinking | Thinking Enabled, Tools | 49.80 | 2026-01-26 | 1000B | Closed |
| 22 | Claude Opus 4.8 | Extended Thinking | 49.80 | 2026-05-28 | Unknown | Closed |
| 23 | Claude Sonnet 4.6 | Thinking Enabled, Tools | 49.00 | 2026-02-17 | Unknown | Closed |
| 24 | Qwen3.5-27B | Thinking Enabled, Tools | 48.50 | 2026-02-25 | 27B | Free Commercial |
| 25 | Gemini 3 Deep Think - 2620 | Thinking Enabled | 48.40 | 2026-02-13 | Unknown | Closed |
| 26 | Qwen3.5-397B-A17B | Thinking Enabled, Tools, Internet | 48.30 | 2026-02-16 | 39.7B | Free Commercial |
| 27 | DeepSeek-V4-Pro | Thinking Level · Extra High, Tools | 48.20 | 2026-04-24 | 1600B | Free Commercial |
| 28 | Opus 4.7 | Extended Thinking | 46.90 | 2026-04-16 | Unknown | Closed |
| 29 | Gemini 3.0 Pro (Preview 11-2025) | Thinking Level · High, Tools | 45.80 | 2025-11-18 | Unknown | Closed |