Sonnet 4.5 is Anthropic's mid-tier model, yet on many benchmarks its results are no worse than Opus.
Claude Sonnet 4.5 Benchmark Analysis
Claude Sonnet 4.5's strongest benchmark results are AIME2025 (score 100, ranked 1st of 106 models), SWE-bench Verified (score 82, ranked 6th of 108) and MMLU Pro (score 88, ranked 7th of 126). This page also compares it with 2 competitor models and 4 earlier or same-series Claude models, with performance and pricing views where available. 2 source links are attached for reference.
Benchmark Results
| Category | Evaluations |
|---|---|
| General Knowledge | 12 |
| Coding and Software Engineering | 6 |
| Math and Reasoning | 8 |
| AI Agent - Tool Usage | 5 |
| Agent Level Benchmark | 4 |
| Claw-style Agent Evaluation | 2 |

Competitor Comparison
Benchmark scores for Claude Sonnet 4.5 compared against top models in its class
12 benchmarks have comparable scores. Each model shows its best score, with the evaluation mode under which it was achieved shown next to the score.
| Benchmark | Category | Claude Sonnet 4.5 (current) | GPT-5.1 | Gemini 2.5 Pro |
|---|---|---|---|---|
| ARC-AGI | Comprehensive evaluation | 63.70 (Thinking Enabled) | 72.80 (Thinking Level: High) | 37.00 (Thinking Enabled) |
| ARC-AGI-2 | Comprehensive evaluation | 13.60 (Thinking Enabled) | 17.60 (Thinking Level: High) | 4.90 (Thinking Enabled) |
| GPQA Diamond | Comprehensive evaluation | 83.40 (Thinking Enabled) | 88.10 (Thinking Enabled) | 86.40 (Thinking Enabled) |
| HLE | Comprehensive evaluation | 33.60 (Thinking Enabled, Tools) | 42.70 (Thinking Level: High, Tools) | 21.60 (Thinking Enabled) |
| LiveBench | Comprehensive evaluation | 68.19 (64K) | 72.04 (Thinking Level: High) | 58.33 (Thinking Level: High) |
| MMLU Pro | Comprehensive evaluation | 88.00 (Thinking Enabled) | -- | 86.00 (Standard Mode) |
| CodeClash | Coding and software engineering | 1389.00 (Standard Mode, Tools) | -- | 1125.00 (Standard Mode, Tools) |
| LiveCodeBench | Coding and software engineering | 71.00 (Thinking Enabled) | -- | 77.10 (Standard Mode) |
| SWE-Bench Pro - Public | Coding and software engineering | 43.60 (Thinking Enabled) | 50.80 (Thinking Level: High) | -- |
| SWE-bench Verified | Coding and software engineering | 82.00 (Thinking Enabled, Tools) | 76.30 (Thinking Level: High) | 67.20 (Thinking Enabled) |
| AIME2025 | Math reasoning | 100.00 (Thinking Enabled, Tools) | 94.00 (Thinking Level: High) | 88.00 (Thinking Enabled) |
| FrontierMath | Math reasoning | 5.20 (Standard Mode) | 26.70 (Thinking Level: High, Tools) | 11.00 (Standard Mode) |
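The selection rule stated above for this table (each model shows its best score, labeled with its mode) can be illustrated with a short sketch. This is a minimal example under stated assumptions, not DataLearnerAI's actual implementation: the `Result` type, the `best_scores` helper and the sample model names and scores are all hypothetical, and the code assumes a higher score is better on every benchmark.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One reported score for a model on a benchmark (hypothetical record type)."""
    model: str
    benchmark: str
    score: float
    mode: str  # evaluation mode label, e.g. "Thinking Enabled" or "Standard Mode"


def best_scores(results):
    """Keep only the highest-scoring result for each (model, benchmark) pair.

    Assumes a higher score is better on every benchmark.
    """
    best = {}
    for r in results:
        key = (r.model, r.benchmark)
        if key not in best or r.score > best[key].score:
            best[key] = r
    return best


# Illustrative data only: model names and scores are made up.
results = [
    Result("model-a", "bench-x", 40.0, "Standard Mode"),
    Result("model-a", "bench-x", 55.5, "Thinking Enabled"),
    Result("model-b", "bench-x", 60.0, "Standard Mode"),
]
for (model, bench), r in best_scores(results).items():
    print(f"{model} on {bench}: {r.score:.2f} ({r.mode})")
```

Keeping the whole record rather than just the number is what lets the mode label travel with the winning score into the table cell.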
Standard API Pricing: Claude Sonnet 4.5 vs. Peer Models
Shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.
Source: DataLearnerAI. Standard text prices shown here use the default supplier.
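The chart rule described above (plot the base rate, and treat extended-context pricing as a separate threshold) can be sketched as follows. This is a minimal illustration under stated assumptions: the `Pricing` and `ExtendedTier` types, the helper names and the example rates are hypothetical, not any real model's prices, and the cost function assumes, as a simplification, that once a prompt exceeds the threshold the whole request is billed at the extended rates. A given supplier's billing rules may differ.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ExtendedTier:
    threshold_tokens: int   # prompt size (input tokens) above which this tier applies
    input_per_mtok: float   # USD per million input tokens
    output_per_mtok: float  # USD per million output tokens


@dataclass(frozen=True)
class Pricing:
    input_per_mtok: float
    output_per_mtok: float
    extended: Optional[ExtendedTier] = None  # None if no extended-context pricing


def chart_rates(p: Pricing) -> Tuple[float, float]:
    """Rates a side-by-side chart would plot: always the base tier."""
    return p.input_per_mtok, p.output_per_mtok


def request_cost(p: Pricing, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request.

    Simplifying assumption: once the prompt exceeds the threshold, the whole
    request (input and output) is billed at the extended-tier rates.
    """
    in_rate, out_rate = p.input_per_mtok, p.output_per_mtok
    if p.extended is not None and input_tokens > p.extended.threshold_tokens:
        in_rate, out_rate = p.extended.input_per_mtok, p.extended.output_per_mtok
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# Hypothetical rates for illustration only; not any real model's prices.
example = Pricing(2.0, 10.0, ExtendedTier(100_000, 4.0, 20.0))
print(chart_rates(example))                    # (2.0, 10.0)
print(request_cost(example, 50_000, 10_000))   # base tier applies
print(request_cost(example, 150_000, 10_000))  # extended tier applies
```

Separating `chart_rates` from `request_cost` mirrors the chart's choice: the plotted number stays comparable across models, while the threshold only matters when estimating the cost of a specific long prompt.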
Version History
How each version in the Claude Sonnet series stacks up on benchmark tests
12 benchmarks have comparable scores. Each model shows its best score, with the evaluation mode under which it was achieved shown next to the score.
| Benchmark | Category | Claude Sonnet 4.5 (current) | Claude Sonnet 4 | Claude 3.7 Sonnet | Claude 3.5 Sonnet (New) | Claude 3.5 Sonnet |
|---|---|---|---|---|---|---|
| ARC-AGI | Comprehensive evaluation | 63.70 (Thinking Enabled) | 40.00 (Thinking Enabled) | -- | -- | -- |
| ARC-AGI-2 | Comprehensive evaluation | 13.60 (Thinking Enabled) | 5.90 (Thinking Enabled) | -- | -- | -- |
| GPQA Diamond | Comprehensive evaluation | 83.40 (Thinking Enabled) | 83.80 (Deep Thinking Mode, Tools) | 77.00 (Thinking Enabled) | 65.00 (Standard Mode) | 59.40 (Standard Mode) |
| HLE | Comprehensive evaluation | 33.60 (Thinking Enabled, Tools) | 9.60 (Thinking Enabled) | 10.30 (Thinking Enabled) | -- | -- |
| LiveBench | Comprehensive evaluation | 68.19 (64K) | 61.27 (64K) | -- | -- | -- |
| MMLU Pro | Comprehensive evaluation | 88.00 (Thinking Enabled) | 84.00 (Thinking Enabled) | -- | 78.00 (Standard Mode) | 77.64 (Standard Mode) |
| CodeClash | Coding and software engineering | 1389.00 (Standard Mode, Tools) | 1223.00 (Standard Mode, Tools) | -- | -- | -- |
| LiveCodeBench | Coding and software engineering | 71.00 (Thinking Enabled) | 66.00 (Thinking Enabled) | -- | 38.70 (Standard Mode) | -- |
| SWE-Bench Pro - Public | Coding and software engineering | 43.60 (Thinking Enabled) | 42.70 (Thinking Enabled) | -- | -- | -- |
| SWE-bench Verified | Coding and software engineering | 82.00 (Thinking Enabled, Tools) | 80.20 (Thinking Enabled, Tools) | 70.30 (Thinking Enabled, Tools) | 49.00 (Standard Mode) | -- |
| AIME2025 | Math reasoning | 100.00 (Thinking Enabled, Tools) | 85.00 (Deep Thinking Mode, Tools) | 54.80 (Standard Mode) | -- | -- |
| FrontierMath | Math reasoning | 5.20 (Standard Mode) | 4.10 (Standard Mode) | 4.10 (Thinking Enabled) | 2.10 (Standard Mode) | 1.00 (Standard Mode) |
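To read the version table as version-over-version change, the gap between Claude Sonnet 4.5 and Claude Sonnet 4 can be computed directly from the scores above. This is a minimal sketch; `SCORES` and `version_deltas` are illustrative names, not part of any published tool. Each delta is in its own benchmark's unit (CodeClash is a rating, not a percentage), and some rows compare different modes (for example, Sonnet 4's GPQA Diamond score was reported with Deep Thinking Mode and tools), so the figures are indicative rather than strictly like-for-like.

```python
# Best scores copied from the version-history table:
# benchmark -> (Claude Sonnet 4.5, Claude Sonnet 4).
# Evaluation modes differ on some rows, so deltas are indicative only.
SCORES = {
    "ARC-AGI": (63.70, 40.00),
    "ARC-AGI-2": (13.60, 5.90),
    "GPQA Diamond": (83.40, 83.80),
    "HLE": (33.60, 9.60),
    "LiveBench": (68.19, 61.27),
    "MMLU Pro": (88.00, 84.00),
    "CodeClash": (1389.00, 1223.00),  # rating, not a percentage
    "LiveCodeBench": (71.00, 66.00),
    "SWE-Bench Pro - Public": (43.60, 42.70),
    "SWE-bench Verified": (82.00, 80.20),
    "AIME2025": (100.00, 85.00),
    "FrontierMath": (5.20, 4.10),
}


def version_deltas(scores):
    """Return benchmark -> (newer - older), rounded to two decimals."""
    return {name: round(new - old, 2) for name, (new, old) in scores.items()}


deltas = version_deltas(SCORES)
regressions = [name for name, d in deltas.items() if d < 0]

for name, d in deltas.items():
    print(f"{name:<24} {d:+.2f}")
print("Sonnet 4.5 below Sonnet 4 on:", regressions)
```

On these figures, GPQA Diamond (83.40 vs. 83.80) is the only benchmark where Claude Sonnet 4.5 trails Claude Sonnet 4.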
Single-Benchmark Version Trend [chart: ARC-AGI (comprehensive evaluation) scores across Claude Sonnet versions]
Standard API Pricing Across the Claude Sonnet Series
Shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.
Source: DataLearnerAI. Standard text prices shown here use the default supplier.