DataLearner AI

A knowledge platform focused on LLM benchmarking, datasets, and practical instruction with continuously updated capability maps.

© 2026 DataLearner AI. DataLearner curates industry data and case studies so researchers, enterprises, and developers can rely on trustworthy intelligence.

MATH-500

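MATH and MATH-500 are two widely followed benchmarks for evaluating the mathematical reasoning ability of large language models (LLMs). Both aim to measure how well a model solves math problems, but they differ significantly in publisher, purpose of release, evaluation goals, and comparative results.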

Updated 2026-03-19
Number of problems: 500
Publisher: OpenAI
Category: Mathematical reasoning
Metric: Accuracy
Language: English
Difficulty: High
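
Accuracy here is the percentage of problems answered correctly. Below is a minimal sketch of such a grader. It assumes, hypothetically, that each model response puts its final answer in a LaTeX \boxed{...} and that whitespace-insensitive string matching is good enough. Production graders typically also check mathematical equivalence, which this sketch does not. The helper names extract_boxed, normalize, and accuracy are illustrative and are not part of any official MATH-500 tooling.

```python
import re
from typing import List, Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in `text`, respecting nested braces."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    depth = 1
    chars = []
    for ch in text[start + len("\\boxed{"):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None  # braces never closed


def normalize(answer: str) -> str:
    """Assumed light normalization: drop \\left/\\right, all whitespace, and wrapping $ signs."""
    answer = answer.replace("\\left", "").replace("\\right", "")
    answer = re.sub(r"\s+", "", answer)
    return answer.strip("$")


def accuracy(predictions: List[str], references: List[str]) -> float:
    """Percentage of responses whose boxed answer string-matches the reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    correct = 0
    for pred, ref in zip(predictions, references):
        got = extract_boxed(pred)
        if got is not None and normalize(got) == normalize(ref):
            correct += 1
    return 100.0 * correct / len(references)


predictions = [
    r"... so the answer is \boxed{\frac{1}{2}}.",
    r"The result is \boxed{ 7 }",
    "I think it is 5",  # no boxed answer, so it is scored as wrong
]
references = [r"\frac{1}{2}", "7", "5"]
print(f"{accuracy(predictions, references):.1f}")  # 66.7
```

With this strict matching, a response that writes 0.5 for a reference of \frac{1}{2} would be marked wrong. That is why real evaluations usually add a symbolic comparison step on top of answer extraction.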

Overview

A 500-problem benchmark that OpenAI selected from the MATH evaluation dataset as a representative subset of the full problem set.
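
MATH-500 is drawn from the MATH test split, which contains 5,000 problems. How OpenAI picked the 500 is not detailed here, so the sketch below only illustrates one simple, reproducible approach: a seeded uniform random draw, followed by a check that per-subject shares in the subset stay close to those of the full pool. The pool IDs, subject labels, and the helpers sample_subset and label_shares are hypothetical. This is not OpenAI's selection code.

```python
import random
from collections import Counter


def sample_subset(pool_ids, k=500, seed=0):
    """Draw k distinct IDs uniformly at random; a fixed seed makes the draw reproducible."""
    rng = random.Random(seed)
    return sorted(rng.sample(list(pool_ids), k))


def label_shares(ids, label_of):
    """Fraction of ids carrying each label (e.g. subject), to compare subset vs. full pool."""
    counts = Counter(label_of[i] for i in ids)
    total = len(ids)
    return {label: n / total for label, n in counts.items()}


# Hypothetical pool: 5,000 test-problem IDs with made-up subject labels.
subjects = ["Algebra", "Geometry", "Number Theory", "Precalculus"]
pool = range(5000)
label_of = {i: subjects[i % len(subjects)] for i in pool}

subset = sample_subset(pool, k=500, seed=0)
print(label_shares(pool, label_of))    # full pool: 0.25 per made-up subject
print(label_shares(subset, label_of))  # subset: shares should land near 0.25
```

A uniform draw keeps subject and difficulty proportions close to the full set only on average; comparing the shares, as above, is a quick sanity check rather than a guarantee.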

Related resources

Official website: the project's official site
DataLearner write-up: a detailed explanation (in Chinese)

MATH-500 Model Score Leaderboard

Source: DataLearnerAI

Scores are sourced primarily from official releases (GitHub, Hugging Face, papers), then from benchmark leaderboards, and finally from third-party evaluators.
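
This sourcing order amounts to a simple priority rule: when several sources report a score for the same model, keep the one from the highest-priority source. A minimal sketch follows. The source labels (official, leaderboard, third_party), the ScoreRecord type, the pick_score helper, and the example scores are all hypothetical. DataLearner's actual pipeline is not described on this page.

```python
from dataclasses import dataclass
from typing import List, Optional

# Lower number = higher priority: official releases, then leaderboards, then third parties.
SOURCE_PRIORITY = {"official": 0, "leaderboard": 1, "third_party": 2}


@dataclass
class ScoreRecord:
    model: str
    score: float
    source: str  # expected to be one of the SOURCE_PRIORITY keys


def pick_score(records: List[ScoreRecord]) -> Optional[ScoreRecord]:
    """Return the record from the highest-priority known source, or None if there is none."""
    known = [r for r in records if r.source in SOURCE_PRIORITY]
    if not known:
        return None
    return min(known, key=lambda r: SOURCE_PRIORITY[r.source])


records = [
    ScoreRecord("Model-X", 95.1, "third_party"),
    ScoreRecord("Model-X", 96.0, "official"),
    ScoreRecord("Model-X", 95.5, "leaderboard"),
]
chosen = pick_score(records)
print(chosen.source, chosen.score)  # official 96.0
```

Records from unrecognized sources are ignored rather than guessed at, so a model with only such records gets no score instead of an unvetted one.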

Score chart. Mode legend: normal, thinking, low, medium, high, deeper thinking, parallel_thinking

Latest MATH-500 model rankings and full benchmark leaderboard

Browse the latest scores, model modes, release dates, and parameter sizes for MATH-500.

MATH-500 detailed rankings table

Rank | Model | Mode | Accuracy (%) | Release date | Parameters (billions)
--- | --- | --- | --- | --- | ---
1 | Gemini-2.5-Pro-Preview-05-06 | Normal | 98.8 | 2025-05-06 | Unknown
2 | Gemini 2.5-Pro | Normal | 98.8 | 2025-06-05 | Unknown
3 | Claude Opus 4 | Normal | 98.2 | 2025-05-23 | Unknown
4 | GLM-4.5 | Thinking level: medium | 98.2 | 2025-07-28 | 355
5 | OpenAI o3 | Normal | 98.1 | 2025-04-16 | Unknown
6 | GLM-4.5-Air | Thinking level: medium | 98.1 | 2025-07-28 | 106
7 | Qwen3-235B-A22B | Thinking level: medium | 98 | 2025-04-28 | 235
8 | DeepSeek-R1-0528 | Thinking level: medium | 98 | 2025-05-28 | 671
9 | OpenAI o3-mini (high) | Normal | 97.9 | 2025-01-31 | Unknown
10 | Claude Opus 4.6 | Deep thinking | 97.6 | 2026-02-05 | Unknown
11 | Qwen3-8B | Thinking level: medium | 97.4 | 2025-04-28 | 8
12 | Kimi K2 | Normal | 97.4 | 2025-07-11 | 1,000
13 | DeepSeek-R1 | Normal | 97.3 | 2025-01-20 | 671
14 | Qwen3-32B | Thinking level: medium | 97.2 | 2025-04-28 | 32
15 | MiniMax-M1-80k | Normal | 96.8 | 2025-06-16 | 456
16 | Pangu Pro MoE | Normal | 96.8 | 2025-06-30 | 71.9
17 | OpenAI o1 | Normal | 96.4 | 2024-12-05 | Unknown
18 | ERNIE-4.5-300B-A47B | Normal | 96.4 | 2025-06-30 | 300
19 | Kimi k1.5 (Long-CoT) | Normal | 96.2 | 2025-01-22 | Unknown
20 | Claude Sonnet 3.7-64K Extended Thinking | Normal | 96.2 | 2025-02-25 | Unknown
21 | Hunyuan-T1 | Normal | 96.2 | 2025-03-21 | Unknown
22 | Qwen3-235B-A22B | Normal | 96.2 | 2025-04-28 | 235
23 | MiniMax-M1-40k | Normal | 96 | 2025-06-16 | 456
24 | OpenAI o3-mini | Thinking level: medium | 95.8 | 2025-01-31 | Unknown
25 | Llama 4 Behemoth Instruct | Normal | 95 | 2025-04-05 | 2,000
26 | Kimi k1.5 (Short-CoT) | Normal | 94.6 | 2025-01-22 | Unknown
27 | DeepSeek-R1-Distill-Llama-70B | Normal | 94.5 | 2025-01-20 | 70
28 | DeepSeek-V3-0324 | Normal | 94 | 2025-03-24 | 671
29 | Hunyuan-7B | Normal | 93.7 | 2025-08-04 | 7
30 | GPT-4.1 | Normal | 92.8 | 2025-04-14 | Unknown
The remaining 13 entries are not shown.
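
The ranking is sorted by score, highest first. Wherever models tie, the earlier release date is listed first (for example, ranks 1 and 2, or ranks 19 to 22), and tied models still receive distinct consecutive ranks. That tie-break rule is inferred from the visible rows rather than stated by the site, so the following sketch is an assumption. It reproduces the first five rows from a shuffled input; the rank helper is hypothetical.

```python
from datetime import date

# (model, mode, score, release date) for five rows of the table, deliberately out of order.
rows = [
    ("Gemini 2.5-Pro", "Normal", 98.8, date(2025, 6, 5)),
    ("GLM-4.5", "Thinking level: medium", 98.2, date(2025, 7, 28)),
    ("Gemini-2.5-Pro-Preview-05-06", "Normal", 98.8, date(2025, 5, 6)),
    ("OpenAI o3", "Normal", 98.1, date(2025, 4, 16)),
    ("Claude Opus 4", "Normal", 98.2, date(2025, 5, 23)),
]


def rank(rows):
    """Sort by score (descending), breaking ties by earlier release date; assign 1-based ranks."""
    ordered = sorted(rows, key=lambda r: (-r[2], r[3]))
    return [(position, r[0]) for position, r in enumerate(ordered, start=1)]


for position, model in rank(rows):
    print(position, model)
```

Sorting on the tuple (-score, release_date) handles both keys in one pass; if the site actually breaks ties differently, only the key function would need to change.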