Gemini 2.5 Pro Experimental 03-25 Benchmark Details

Gemini 2.5 Pro Experimental 03-25 currently ranks highest on AIME 2024 (score 92, rank 9 of 62), Aider-Polyglot (score 72.90, rank 12 of 59), and SimpleQA (score 52.90, rank 12 of 45).

Benchmark Results



General Knowledge

2 evaluations

Benchmark / mode | Score | Rank/total
(name not given) | 84 | 55 / 179
(name not given) | 18.80 | 110 / 159

Common Sense

1 evaluation

Benchmark / mode | Score | Rank/total
SimpleQA | 52.90 | 12 / 45

Coding and Software Engineering

2 evaluations

Benchmark / mode | Score | Rank/total
(name not given) | 70.40 | 53 / 120
(name not given) | 63.80 | 72 / 108

Math and Reasoning

3 evaluations

Benchmark / mode | Score | Rank/total
AIME 2024 | 92 | 9 / 62
(name not given) | 86.90 | 46 / 106
(name not given) | 4.20 | 40 / 80

Common Sense Reasoning

1 evaluation

Benchmark / mode | Score | Rank/total
Simple Bench (Standard Mode) | 51.60 | 27 / 63

Agent-Level Benchmark

1 evaluation

Benchmark / mode | Score | Rank/total
Aider-Polyglot (Standard Mode) | 72.90 | 12 / 59

Claw-style Agent Evaluation

2 evaluations

Benchmark / mode | Score | Rank/total
Claw Bench (Thinking enabled, with tools) | 80.40 | 20 / 29
Pinch Bench (Thinking enabled, with tools) | 71.90 | 29 / 37