Gemini 2.5-Pro Benchmark Analysis | DataLearnerAI
Google DeepMind
Updated 2/25/2026
In-depth Analysis
Gemini 2.5 Pro is the most capable model in the Gemini 2.5 series released by Google.
Benchmark Results
Comprehensive Evaluation
6 evaluations
Benchmark | Mode | Score | Rank/total
GPQA Diamond | Thinking | 86.40 | 21 / 153
MMLU Pro | Normal | 86 | 11 / 112
LiveBench | Thinking | 71.92 | 13 / 52
ARC-AGI | Thinking | 37 | 27 / 42
HLE | Thinking | 21.60 | 50 / 105
ARC-AGI-2 | Thinking | 4.90 | 25 / 34
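The Rank/total column is easier to compare across benchmarks when it is expressed as a share of the field, because each benchmark has a different number of ranked models. The snippet below is a minimal sketch of that arithmetic. The helper name `rank_to_top_percent` is hypothetical and not part of any DataLearnerAI tooling, and the sample values are copied from the Rank/total entries above.

```python
def rank_to_top_percent(rank_total: str) -> float:
    """Convert a 'rank / total' string into a top-percent figure (lower is better)."""
    rank_text, total_text = rank_total.split("/")
    rank, total = int(rank_text.strip()), int(total_text.strip())
    if not 1 <= rank <= total:
        raise ValueError(f"rank {rank} is outside 1..{total}")
    # Share of the ranked field at or above this position, as a percentage.
    return round(100.0 * rank / total, 1)


# Sample entries copied from the Rank/total column (illustrative subset).
ranks = {
    "GPQA Diamond": "21 / 153",
    "MMLU Pro": "11 / 112",
    "ARC-AGI-2": "25 / 34",
}

top_percent = {name: rank_to_top_percent(value) for name, value in ranks.items()}

for name, percent in top_percent.items():
    print(f"{name}: top {percent}% of ranked models")
```

A rank of 21 out of 153 and a rank of 25 out of 34 look similar as raw positions, but they sit in very different parts of their respective leaderboards once normalized this way.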
General Knowledge QA
1 evaluation
Benchmark | Mode | Score | Rank/total
SimpleQA | Normal | 54 | 9 / 44
Coding and Software Engineering
2 evaluations
Benchmark | Mode | Score | Rank/total
LiveCodeBench | Normal | 77.10 | 21 / 103
SWE-bench Verified | Thinking | 67.20 | 50 / 87
Mathematical Reasoning
9 evaluations
Benchmark | Mode | Score | Rank/total
MATH-500 | Normal | 98.80 | 1 / 42
AIME 2024 | Normal | 92 | 9 / 62
AIME 2025 | Thinking | 88 | 41 / 105
IMO-ProofBench | Thinking | 55.20 | 3 / 16
IMO 2024 | Thinking | 19 | 2 / 10
IMO-ProofBench Advanced | Thinking | 17.60 | 4 / 8
IMO 2025 | Thinking | 15.20 | 3 / 9
FrontierMath | Normal | 11 | 15 / 52
FrontierMath - Tier 4 | Normal | 4.20 | 12 / 32
Writing and Creative Work
1 evaluation
Benchmark | Mode | Score | Rank/total
Creative Writing | Normal | 85.85 | 8 / 22
AI Agent - Tool Use
2 evaluations
Benchmark | Mode | Score | Rank/total
Terminal Bench 2.0 | Thinking + With tools | 32.60 | 20 / 20
Terminal-Bench | Thinking | 25.30 | 28 / 35
Multimodal Understanding
1 evaluation
Benchmark | Mode | Score | Rank/total
MMMU | Thinking | 82 | 5 / 17
Commonsense Reasoning
1 evaluation
Benchmark | Mode | Score | Rank/total
Simple Bench | Thinking | 62.40 | 2 / 27
Agent Capability Evaluation
3 evaluations
Benchmark | Mode | Score | Rank/total
Aider-Polyglot | Thinking | 83.10 | 2 / 26
τ²-Bench - Telecom | Thinking + With tools | 54 | 26 / 29
Terminal Bench Hard | Thinking + With tools | 25 | 13 / 14
Instruction Following
1 evaluation
Benchmark | Mode | Score | Rank/total
IF Bench | Thinking + With tools | 49 | 23 / 25
AI Agent - Information Gathering
1 evaluation
Benchmark | Mode | Score | Rank/total
BrowseComp | Thinking + With tools | 7.80 | 26 / 27
Productivity Knowledge
1 evaluation
Benchmark | Mode | Score | Rank/total
GDPval-AA | Thinking | 22 | 11 / 11
Long-Context Capability
1 evaluation
Benchmark | Mode | Score | Rank/total
AA-LCR | Thinking | 66 | 6 / 12
References
kaggle.com
artificialanalysis.ai