Claude Sonnet 4.5 vs Gemini 2.5-Pro
Across 23 shared benchmarks, Claude Sonnet 4.5 leads overall: it wins 14 to Gemini 2.5-Pro's 7, with 2 ties and an average score difference of +6.21 in its favor.
Claude Sonnet 4.5
Anthropic · 2025-09-30 · Chat model
Gemini 2.5-Pro
Google DeepMind · 2025-06-05 · Reasoning model
Claude Sonnet 4.5: 14 wins (61%) · Ties: 2 · Gemini 2.5-Pro: 7 wins (30%)
Benchmark scores
Grouped by capability, sorted by largest gap within each. 23 shared benchmarks.
General Knowledge
Claude Sonnet 4.5: 5/6

| Benchmark | Claude Sonnet 4.5 | Gemini 2.5-Pro | Diff |
|---|---|---|---|
| ARC-AGI | 63.70 (32 / 65) | 37 (47 / 65) | +26.70 |
| HLE | 33.60 (67 / 157) | 21.60 (97 / 157) | +12 |
| ARC-AGI-2 | 13.60 (35 / 59) | 4.90 (44 / 59) | +8.70 |
| LiveBench | 78.26 (4 / 52) | 71.92 (13 / 52) | +6.34 |
| GPQA Diamond | 83.40 (58 / 178) | 86.40 (41 / 178) | -3 |
| MMLU Pro | 88 (7 / 126) | 86 (20 / 126) | +2 |
Math and Reasoning
Gemini 2.5-Pro: 4/6

| Benchmark | Claude Sonnet 4.5 | Gemini 2.5-Pro | Diff |
|---|---|---|---|
| IMO-ProofBench | 27.10 (8 / 16) | 55.20 (3 / 16) | -28.10 |
| IMO-ProofBench Advanced | 4.80 (6 / 8) | 17.60 (4 / 8) | -12.80 |
| AIME2025 | 100 (1 / 106) | 88 (43 / 106) | +12 |
| Simple Bench | 54.30 (9 / 27) | 62.40 (2 / 27) | -8.10 |
| FrontierMath | 5.20 (38 / 60) | 11 (23 / 60) | -5.80 |
| FrontierMath - Tier 4 | 2.10 (56 / 80), Normal (No Tools) | 2.10 (56 / 80), Normal (No Tools) | 0 |
Agent Level Benchmark
Claude Sonnet 4.5: 2/2

| Benchmark | Claude Sonnet 4.5 | Gemini 2.5-Pro | Diff |
|---|---|---|---|
| τ²-Bench - Telecom | 98 (5 / 35) | 54 (32 / 35) | +44 |
| Terminal Bench Hard | 33 (8 / 13) | 25 (12 / 13) | +8 |
AI Agent - Tool Usage
Claude Sonnet 4.5: 2/2

| Benchmark | Claude Sonnet 4.5 | Gemini 2.5-Pro | Diff |
|---|---|---|---|
| Terminal-Bench | 50 (3 / 35) | 25.30 (28 / 35) | +24.70 |
| Terminal Bench 2.0 | 42.80 (41 / 46) | 32.60 (46 / 46) | +10.20 |
Coding and Software Engineer
Even: 2/2

| Benchmark | Claude Sonnet 4.5 | Gemini 2.5-Pro | Diff |
|---|---|---|---|
| SWE-bench Verified | 82 (6 / 108) | 67.20 (68 / 108) | +14.80 |
| LiveCodeBench | 71 (47 / 120) | 77.10 (34 / 120) | -6.10 |
AI Agent - Information Search
Claude Sonnet 4.5: 1/1

| Benchmark | Claude Sonnet 4.5 | Gemini 2.5-Pro | Diff |
|---|---|---|---|
| BrowseComp | 24.10 (43 / 45) | 7.80 (44 / 45) | +16.30 |
Instruction Following
Claude Sonnet 4.5: 1/1

| Benchmark | Claude Sonnet 4.5 | Gemini 2.5-Pro | Diff |
|---|---|---|---|
| IF Bench | 57.30 (21 / 29) | 49 (28 / 29) | +8.30 |
Long Context
Even: 1/1

| Benchmark | Claude Sonnet 4.5 | Gemini 2.5-Pro | Diff |
|---|---|---|---|
| AA-LCR | 66 (8 / 13) | 66 (8 / 13) | 0 |
Multimodal Understanding
Gemini 2.5-Pro: 1/1

| Benchmark | Claude Sonnet 4.5 | Gemini 2.5-Pro | Diff |
|---|---|---|---|
| MMMU | 77.80 (14 / 28) | 82 (9 / 28) | -4.20 |
Productivity Knowledge
Claude Sonnet 4.5: 1/1

| Benchmark | Claude Sonnet 4.5 | Gemini 2.5-Pro | Diff |
|---|---|---|---|
| GDPval-AA | 39 (16 / 21) | 22 (21 / 21) | +17 |
Specs
| Field | Claude Sonnet 4.5 | Gemini 2.5-Pro |
|---|---|---|
| Publisher | Anthropic | Google DeepMind |
| Release date | 2025-09-30 | 2025-06-05 |
| Model type | Chat model | Reasoning model |
| Architecture | Dense | Dense |
| Parameters | Not available | Not available |
| Context length | 1000K | 1000K |
| Max output | 64K | 64K |
Summary
- Claude Sonnet 4.5 leads in: General Knowledge (5/6), Agent Level Benchmark (2/2), AI Agent - Tool Usage (2/2), AI Agent - Information Search (1/1), Instruction Following (1/1), Productivity Knowledge (1/1)
- Gemini 2.5-Pro leads in: Math and Reasoning (4/6), Multimodal Understanding (1/1)
- Tied in: Coding and Software Engineer, Long Context
On average across the 23 shared benchmarks, Claude Sonnet 4.5 scores 6.21 higher.
Largest single-benchmark gap: τ²-Bench - Telecom — Claude Sonnet 4.5 98 vs Gemini 2.5-Pro 54 (+44).
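The headline figures can be re-derived from the per-benchmark scores. The following is a minimal Python sketch, not the pipeline that generated this page: the `SCORES` list and the `summarize` helper are illustrative names, the scores are transcribed from the tables above, and it assumes that Diff means Claude Sonnet 4.5 minus Gemini 2.5-Pro and that the average is an unweighted mean over all 23 benchmarks.

```python
from statistics import mean

# (benchmark, Claude Sonnet 4.5 score, Gemini 2.5-Pro score),
# transcribed by hand from the benchmark tables on this page.
SCORES = [
    ("ARC-AGI", 63.70, 37.00),
    ("HLE", 33.60, 21.60),
    ("ARC-AGI-2", 13.60, 4.90),
    ("LiveBench", 78.26, 71.92),
    ("GPQA Diamond", 83.40, 86.40),
    ("MMLU Pro", 88.00, 86.00),
    ("IMO-ProofBench", 27.10, 55.20),
    ("IMO-ProofBench Advanced", 4.80, 17.60),
    ("AIME2025", 100.00, 88.00),
    ("Simple Bench", 54.30, 62.40),
    ("FrontierMath", 5.20, 11.00),
    ("FrontierMath - Tier 4", 2.10, 2.10),
    ("τ²-Bench - Telecom", 98.00, 54.00),
    ("Terminal Bench Hard", 33.00, 25.00),
    ("Terminal-Bench", 50.00, 25.30),
    ("Terminal Bench 2.0", 42.80, 32.60),
    ("SWE-bench Verified", 82.00, 67.20),
    ("LiveCodeBench", 71.00, 77.10),
    ("BrowseComp", 24.10, 7.80),
    ("IF Bench", 57.30, 49.00),
    ("AA-LCR", 66.00, 66.00),
    ("MMMU", 77.80, 82.00),
    ("GDPval-AA", 39.00, 22.00),
]


def summarize(scores):
    """Tally wins and ties and compute the unweighted mean difference."""
    # Round each difference to 2 decimals so float noise cannot hide a tie.
    diffs = [(name, round(a - b, 2)) for name, a, b in scores]
    return {
        "claude_wins": sum(d > 0 for _, d in diffs),
        "gemini_wins": sum(d < 0 for _, d in diffs),
        "ties": sum(d == 0 for _, d in diffs),
        "mean_diff": round(mean([d for _, d in diffs]), 2),
        # Largest gap by absolute size, regardless of which model leads.
        "largest_gap": max(diffs, key=lambda item: abs(item[1])),
    }


summary = summarize(SCORES)
print(summary)
```

Under those assumptions the sketch reproduces the 14 / 7 / 2 split, the +6.21 average, and τ²-Bench - Telecom as the largest gap. A different weighting, for example averaging per capability group first, would give a different average.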
Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.