Gemini 2.5-Pro Benchmark Details

Gemini 2.5-Pro's benchmark results are led by MATH-500 (rank 1 / 44, score 98.80), Aider-Polyglot (rank 4 / 59, score 83.10), and AIME 2024 (rank 9 / 62, score 92). Two source links are attached for reference.
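Because each benchmark ranks a different number of models, the rank / total pairs are easier to compare once they are normalized. The following is a minimal sketch, assuming a simple rank-divided-by-total fraction as the comparison metric; the helper names `parse_rank` and `top_fraction` are hypothetical and not part of any source site's tooling.

```python
def parse_rank(rank_text: str) -> tuple[int, int]:
    """Split a 'rank / total' string such as '4 / 59' into two integers."""
    rank, total = (int(part.strip()) for part in rank_text.split("/"))
    if not 1 <= rank <= total:
        raise ValueError(f"rank out of range: {rank_text!r}")
    return rank, total


def top_fraction(rank_text: str) -> float:
    """Return rank divided by total; a smaller value means a better placement."""
    rank, total = parse_rank(rank_text)
    return rank / total


# Placements quoted in the summary above.
placements = {
    "MATH-500": "1 / 44",
    "Aider-Polyglot": "4 / 59",
    "AIME 2024": "9 / 62",
}

# Print the benchmarks from best to worst normalized placement.
for name, rank_text in sorted(placements.items(), key=lambda kv: top_fraction(kv[1])):
    print(f"{name}: top {top_fraction(rank_text):.1%} of ranked entries")
```

This fraction only reflects position within each leaderboard; it says nothing about how close the underlying scores are.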

Benchmark Results

General Knowledge

6 evaluations

| Benchmark / mode | Score | Rank / total |
| --- | --- | --- |
| GPQA Diamond | 86.40 | 45 / 187 |
| MMLU Pro | | 21 / 132 |
| LiveBench (Thinking Level · High) | 58.33 | 76 / 115 |
| ARC-AGI | | 49 / 67 |
| HLE | 21.60 | 110 / 170 |
| ARC-AGI-2 | 4.90 | 46 / 61 |

Common Sense

1 evaluation

| Benchmark / mode | Score | Rank / total |
| --- | --- | --- |
| SimpleQA | | 11 / 47 |

Coding and Software Engineering

3 evaluations

| Benchmark / mode | Score | Rank / total |
| --- | --- | --- |
| CodeClash (Standard Mode, Tools) | 1125 | 6 / 8 |
| LiveCodeBench | 77.10 | 34 / 123 |
| SWE-bench Verified | 67.20 | 71 / 111 |

Math and Reasoning

9 evaluations

| Benchmark / mode | Score | Rank / total |
| --- | --- | --- |
| MATH-500 | 98.80 | 1 / 44 |
| AIME 2024 | 92 | 9 / 62 |
| AIME 2025 | | 44 / 107 |
| IMO-ProofBench | 55.20 | 3 / 16 |
| IMO 2024 | | 2 / 10 |
| IMO-ProofBench Advanced | 17.60 | 4 / 8 |
| IMO 2025 | 15.20 | 3 / 9 |
| FrontierMath | | 23 / 60 |
| FrontierMath - Tier 4 (Standard Mode) | 2.10 | 56 / 80 |

Writing and Creative Capabilities

1 evaluation

| Benchmark / mode | Score | Rank / total |
| --- | --- | --- |
| Creative Writing | 85.85 | 8 / 23 |

AI Agent - Tool Usage

2 evaluations

| Benchmark / mode | Score | Rank / total |
| --- | --- | --- |
| Terminal Bench 2.0 | 32.60 | 47 / 47 |
| Terminal-Bench | 25.30 | 28 / 35 |

Multimodal Understanding

1 evaluation

| Benchmark / mode | Score | Rank / total |
| --- | --- | --- |
| MMMU | | 10 / 29 |

Common Sense Reasoning

1 evaluation

| Benchmark / mode | Score | Rank / total |
| --- | --- | --- |
| Simple Bench (Thinking Enabled) | 62.40 | 11 / 63 |

Agent Level Benchmark

4 evaluations

| Benchmark / mode | Score | Rank / total |
| --- | --- | --- |
| Aider-Polyglot (32K) | 83.10 | 4 / 59 |
| Aider-Polyglot (Thinking Enabled) | 79.10 | 8 / 59 |
| τ²-Bench - Telecom | | 32 / 35 |
| Terminal Bench Hard | | 12 / 13 |

Instruction Following

1 evaluation

| Benchmark / mode | Score | Rank / total |
| --- | --- | --- |
| IF Bench | | 29 / 30 |

AI Agent - Information Search

1 evaluation

| Benchmark / mode | Score | Rank / total |
| --- | --- | --- |
| BrowseComp | 7.80 | 51 / 52 |

Productivity Knowledge

1 evaluation

| Benchmark / mode | Score | Rank / total |
| --- | --- | --- |
| GDPval-AA | | 21 / 21 |

Long Context

1 evaluation

| Benchmark / mode | Score | Rank / total |
| --- | --- | --- |
| AA-LCR | | 9 / 14 |

Sources

kaggle.com
artificialanalysis.ai