HLE

Updated May 29, 2026 · 5,322 views

  • Current SOTA: Claude Mythos Preview (Anthropic), score 64.70
  • Problem count: 3000
  • Institution: Center for AI Safety
  • Category: General Evaluation
  • Metrics: Accuracy
  • Language: English
  • Difficulty: Mixed

Overview

HLE (Humanity's Last Exam) is a 3,000-problem benchmark from the Center for AI Safety that evaluates general model capability and is scored by accuracy. This page summarizes its metrics, official resources, and model leaderboard results on DataLearnerAI.
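
Because the listed metric is accuracy over 3,000 problems, a leaderboard score can be read as an approximate count of correctly answered problems. The sketch below assumes each score is a percentage of the full problem set, which this page does not state explicitly. The function name `approx_correct` is hypothetical.

```python
def approx_correct(score_pct: float, problem_count: int = 3000) -> int:
    """Approximate number of correct answers implied by an accuracy percentage.

    Assumption: the score is a percentage over the full problem set.
    """
    return round(score_pct / 100 * problem_count)


# Scores taken from the top two leaderboard entries on this page.
top = approx_correct(64.70)
runner_up = approx_correct(58.70)
print(top, runner_up, top - runner_up)
```

Read this way, a gap of a few points between two models corresponds to a difference on the order of a hundred or more problems.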

Related resources

  • View Paper
  • Get Dataset
  • Official Website
  • DataLearner Blog

Latest HLE model rankings and full benchmark leaderboard

Browse the latest scores, model modes, release dates, and parameter sizes for HLE.

Source: DataLearnerAI

Data is sourced primarily from official releases (GitHub, Hugging Face, papers), then from benchmark leaderboards, then from third-party evaluators.
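
The sourcing order described above can be sketched as a simple priority lookup. This is an illustration only: the tier labels, the `pick_score` function, and the sample values are hypothetical and do not represent DataLearnerAI's actual pipeline.

```python
from typing import Optional

# Source tiers in the priority order the page describes (labels are assumed).
SOURCE_PRIORITY = ["official_release", "benchmark_leaderboard", "third_party"]


def pick_score(reported: dict[str, float]) -> Optional[tuple[str, float]]:
    """Return (source, score) from the highest-priority tier that reported one."""
    for source in SOURCE_PRIORITY:
        if source in reported:
            return source, reported[source]
    return None  # no tier reported a score for this model


# Made-up example values, not real leaderboard data.
print(pick_score({"third_party": 41.0, "benchmark_leaderboard": 42.5}))
```

With this ordering, a third-party number is used only when neither an official release nor a benchmark leaderboard reports a score for the model.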


1 parallel-mode result hidden

HLE Rank

| Rank | Organization | Model | Mode | Score | Release date | Parameters | License |
|---|---|---|---|---|---|---|---|
| 1 | Anthropic | Claude Mythos Preview | Extended Thinking, Tools | 64.70 | 2026-04-07 | Unknown | Closed |
| 2 | OpenAI | GPT-5.4 Pro | Thinking Enabled, Tools | 58.70 | 2026-03-05 | Unknown | Closed |
| 3 | Facebook AI Research Lab | Muse Spark | Parallel · Deep Thinking Mode | 58.00 | 2026-04-08 | Unknown | Closed |
| 4 | Anthropic | Claude Opus 4.8 | Extended Thinking, Tools | 57.90 | 2026-05-28 | Unknown | Closed |
| 5 | OpenAI | GPT-5.5 Pro | Thinking Level · Extra High, Tools | 57.20 | 2026-04-23 | Unknown | Closed |
| 6 | Anthropic | Claude Mythos Preview | Extended Thinking | 56.80 | 2026-04-07 | Unknown | Closed |
| 7 | Anthropic | Opus 4.7 | Extended Thinking, Tools | 54.70 | 2026-04-16 | Unknown | Closed |
| 8 | Moonshot AI | Kimi K2.6 | Thinking Enabled, Tools, Internet | 54.00 | 2026-04-20 | 1000B | Free Commercial |
| 9 | Alibaba | Qwen3.7-Max-Preview | Thinking Enabled, Tools | 53.50 | 2026-05-20 | 1000B | Closed |
| 10 | Anthropic | Claude Opus 4.6 | Extended Thinking, Tools, Internet | 53.00 | 2026-02-05 | Unknown | Closed |
| 11 | Zhipu AI | GLM 5.1 | Thinking Enabled, Tools | 52.30 | 2026-03-27 | 75.4B | Free Commercial |
| 12 | OpenAI | GPT-5.5 | Thinking Enabled, Tools | 52.20 | 2026-04-23 | Unknown | Closed |
| 13 | OpenAI | GPT-5.4 | Thinking Level · Extra High, Tools | 52.10 | 2026-03-05 | Unknown | Closed |
| 14 | Google DeepMind | Gemini 3.1 Pro Preview | Thinking Enabled, Tools | 51.40 | 2026-02-20 | Unknown | Closed |
| 15 | Alibaba | Qwen 3.6 Plus Preview | Thinking Enabled, Tools | 50.60 | 2026-03-31 | Unknown | Closed |
| 16 | Zhipu AI | GLM-5 | Thinking Enabled, Tools | 50.40 | 2026-02-11 | 744B | Free Commercial |
| 17 | Facebook AI Research Lab | Muse Spark | Thinking Enabled, Tools | 50.40 | 2026-04-08 | Unknown | Closed |
| 18 | Moonshot AI | Kimi K2.5 | Thinking Enabled, Tools | 50.20 | 2026-01-27 | 1000B | Free Commercial |
| 19 | Alibaba | Qwen3.6-Max-Preview | Thinking Enabled, Tools | 50.20 | 2026-04-20 | 1000B | Closed |
| 20 | OpenAI | GPT-5.2 Pro | Thinking Enabled, Tools | 50.00 | 2025-12-11 | Unknown | Closed |
| 21 | Alibaba | Qwen3-Max-Thinking | Thinking Enabled, Tools | 49.80 | 2026-01-26 | 1000B | Closed |
| 22 | Anthropic | Claude Opus 4.8 | Extended Thinking | 49.80 | 2026-05-28 | Unknown | Closed |
| 23 | Anthropic | Claude Sonnet 4.6 | Thinking Enabled, Tools | 49.00 | 2026-02-17 | Unknown | Closed |
| 24 | Alibaba | Qwen3.5-27B | Thinking Enabled, Tools | 48.50 | 2026-02-25 | 27B | Free Commercial |
| 25 | Google DeepMind | Gemini 3 Deep Think - 2620 | Thinking Enabled | 48.40 | 2026-02-13 | Unknown | Closed |
| 26 | Alibaba | Qwen3.5-397B-A17B | Thinking Enabled, Tools, Internet | 48.30 | 2026-02-16 | 39.7B | Free Commercial |
| 27 | DeepSeek-AI | DeepSeek-V4-Pro | Thinking Level · Extra High, Tools | 48.20 | 2026-04-24 | 1600B | Free Commercial |
| 28 | Anthropic | Opus 4.7 | Extended Thinking | 46.90 | 2026-04-16 | Unknown | Closed |
| 29 | Google DeepMind | Gemini 3.0 Pro (Preview 11-2025) | Thinking Level · High, Tools | 45.80 | 2025-11-18 | Unknown | Closed |
126 more entries are not shown here.