Sonnet 4.5 is a mid-tier model from Anthropic, yet on many benchmarks it scores no worse than Opus.
Benchmark Results
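General evaluation: 12 evaluations
Coding & software engineering: 5 evaluations
Math reasoning: 8 evaluations
AI Agent - Tool use: 4 evaluations
Agent capability evaluation: 4 evaluations
OpenClaw comprehensive agent capability evaluation: 2 evaluations
Competitor Comparison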
Benchmark scores for Claude Sonnet 4.5 compared against top models in its class
12 benchmarks with comparable scores. Each model shows its best score, and the evaluation mode is noted next to each score.
| Benchmark | Category | Claude Sonnet 4.5 (current) | GPT-5.1 | Gemini 2.5 Pro |
|---|---|---|---|---|
| ARC-AGI | General evaluation | 63.70 (Thinking Enabled) | 72.80 (Thinking Level · High) | 37.00 (Thinking Enabled) |
| ARC-AGI-2 | General evaluation | 13.60 (Thinking Enabled) | 17.60 (Thinking Level · High) | 4.90 (Thinking Enabled) |
| GPQA Diamond | General evaluation | 83.40 (Thinking Enabled) | 88.10 (Thinking Enabled) | 86.40 (Thinking Enabled) |
| HLE | General evaluation | 33.60 (Thinking Enabled, Tools) | 42.70 (Thinking Level · High, Tools) | 21.60 (Thinking Enabled) |
| LiveBench | General evaluation | 78.26 (Thinking Enabled) | -- | 71.92 (Thinking Enabled) |
| MMLU Pro | General evaluation | 88.00 (Thinking Enabled) | -- | 86.00 (Standard Mode) |
| LiveCodeBench | Coding & software engineering | 71.00 (Thinking Enabled) | -- | 77.10 (Standard Mode) |
| SWE-Bench Pro - Public | Coding & software engineering | 43.60 (Thinking Enabled) | 50.80 (Thinking Level · High) | -- |
| SWE-bench Verified | Coding & software engineering | 82.00 (Thinking Enabled, Tools) | 76.30 (Thinking Level · High) | 67.20 (Thinking Enabled) |
| AIME2025 | Math reasoning | 100.00 (Thinking Enabled, Tools) | 94.00 (Thinking Level · High) | 88.00 (Thinking Enabled) |
| FrontierMath | Math reasoning | 5.20 (Standard Mode) | 26.70 (Thinking Level · High, Tools) | 11.00 (Standard Mode) |
| … | … | 4.20 (32K) | 12.50 (Thinking Level · High, Tools) | 2.10 (Standard Mode) |
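One rough way to summarize the comparison is to count head-to-head wins for Claude Sonnet 4.5 against each rival, using only the labeled rows where both models have a score. The sketch below copies the scores from the competitor table. The names `SCORES`, `MODELS`, and `head_to_head` are hypothetical helpers chosen for illustration. It deliberately ignores the differing evaluation modes (Tools, thinking levels), so the counts are indicative only.

```python
# Scores copied from the competitor table (best score per model; modes differ).
# Column order: (Claude Sonnet 4.5, GPT-5.1, Gemini 2.5 Pro). None marks a "--" cell.
# The last table row has no benchmark name and is left out.
SCORES = {
    "ARC-AGI": (63.70, 72.80, 37.00),
    "ARC-AGI-2": (13.60, 17.60, 4.90),
    "GPQA Diamond": (83.40, 88.10, 86.40),
    "HLE": (33.60, 42.70, 21.60),
    "LiveBench": (78.26, None, 71.92),
    "MMLU Pro": (88.00, None, 86.00),
    "LiveCodeBench": (71.00, None, 77.10),
    "SWE-Bench Pro - Public": (43.60, 50.80, None),
    "SWE-bench Verified": (82.00, 76.30, 67.20),
    "AIME2025": (100.00, 94.00, 88.00),
    "FrontierMath": (5.20, 26.70, 11.00),
}
MODELS = ("Claude Sonnet 4.5", "GPT-5.1", "Gemini 2.5 Pro")


def head_to_head(scores, rival_index):
    """Return (wins, losses, ties) for Sonnet 4.5 (column 0) against one rival column."""
    wins = losses = ties = 0
    for row in scores.values():
        ours, theirs = row[0], row[rival_index]
        if ours is None or theirs is None:
            continue  # skip benchmarks where either model has no score
        if ours > theirs:
            wins += 1
        elif ours < theirs:
            losses += 1
        else:
            ties += 1
    return wins, losses, ties


for i in (1, 2):
    w, l, t = head_to_head(SCORES, i)
    print(f"{MODELS[0]} vs {MODELS[i]}: {w} wins, {l} losses, {t} ties")
```

On these numbers Sonnet 4.5 leads Gemini 2.5 Pro on most shared benchmarks (7 of 10) but trails GPT-5.1 on most (it leads only on SWE-bench Verified and AIME2025, 2 of 8).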
Standard API Pricing: Claude Sonnet 4.5 vs. Peer Models
Standard text input and output prices for each model, side by side. Where a model has separate long-context pricing, only the base rate is shown, and the last column gives the prompt-length threshold that rate applies to.
Source: DataLearnerAI. Standard text prices shown here use the default supplier.
| Model | Supplier | Standard input | Standard output | Base price applies to |
|---|---|---|---|---|
| Claude Sonnet 4.5 | — | $3 / 1M tokens | $15 / 1M tokens | ≤ 200K tokens |
| GPT-5.1 | — | $1.25 / 1M tokens | $10 / 1M tokens | — |
| Gemini 2.5 Pro | — | $1.25 / 1M tokens | $10 / 1M tokens | ≤ 200K tokens |
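All three models bill per million tokens, so the cost of one request at the base rate is (input tokens × input price + output tokens × output price) / 10^6. Below is a minimal sketch assuming the base rates in the table. It ignores prompt caching, batch discounts, and the long-context surcharge, which the table does not list. `PRICES` and `request_cost` are illustrative names, not part of any provider SDK.

```python
# Base-rate prices in USD per 1M tokens, taken from the pricing table.
# "threshold" is the largest prompt (input tokens) the base rate covers;
# None means the table lists no threshold for that model.
PRICES = {
    "Claude Sonnet 4.5": {"input": 3.00, "output": 15.00, "threshold": 200_000},
    "GPT-5.1": {"input": 1.25, "output": 10.00, "threshold": None},
    "Gemini 2.5 Pro": {"input": 1.25, "output": 10.00, "threshold": 200_000},
}


def request_cost(model, input_tokens, output_tokens):
    """Estimate the USD cost of one request at the standard (base) text rate."""
    p = PRICES[model]
    if p["threshold"] is not None and input_tokens > p["threshold"]:
        # Only the base rate is known here; refuse rather than guess a long-context rate.
        raise ValueError(f"{model}: prompt exceeds the {p['threshold']}-token base-rate threshold")
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


for name in PRICES:
    print(f"{name}: ${request_cost(name, 10_000, 2_000):.4f}")
```

For a request with 10,000 input and 2,000 output tokens this gives $0.0600 for Claude Sonnet 4.5 and $0.0325 for both GPT-5.1 and Gemini 2.5 Pro. Raising an error above the threshold keeps the sketch from silently under-reporting long-prompt costs.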
Version History
How each version in the Claude Sonnet series compares on benchmarks
12 benchmarks with comparable scores. Each model shows its best score, and the evaluation mode is noted next to each score.
| Benchmark | Category | Claude Sonnet 4.5 (current) | Claude Sonnet 4 | Claude Sonnet 3.7 | Claude 3.5 Sonnet New | Claude 3.5 Sonnet |
|---|---|---|---|---|---|---|
| ARC-AGI | General evaluation | 63.70 (Thinking Enabled) | 40.00 (Thinking Enabled) | -- | -- | -- |
| ARC-AGI-2 | General evaluation | 13.60 (Thinking Enabled) | 5.90 (Thinking Enabled) | -- | -- | -- |
| GPQA Diamond | General evaluation | 83.40 (Thinking Enabled) | 83.80 (Deep Thinking Mode, Tools) | 77.00 (Thinking Enabled) | 65.00 (Standard Mode) | 59.40 (Standard Mode) |
| HLE | General evaluation | 33.60 (Thinking Enabled, Tools) | 9.60 (Thinking Enabled) | 10.30 (Thinking Enabled) | -- | -- |
| LiveBench | General evaluation | 78.26 (Thinking Enabled) | 73.82 (Thinking Enabled) | 68.64 (Thinking Enabled) | -- | -- |
| MMLU Pro | General evaluation | 88.00 (Thinking Enabled) | 84.00 (Thinking Enabled) | -- | 78.00 (Standard Mode) | 77.64 (Standard Mode) |
| LiveCodeBench | Coding & software engineering | 71.00 (Thinking Enabled) | 66.00 (Thinking Enabled) | -- | 38.70 (Standard Mode) | -- |
| SWE-Bench Pro - Public | Coding & software engineering | 43.60 (Thinking Enabled) | 42.70 (Thinking Enabled) | -- | -- | -- |
| SWE-bench Verified | Coding & software engineering | 82.00 (Thinking Enabled, Tools) | 80.20 (Thinking Enabled, Tools) | 70.30 (Thinking Enabled, Tools) | 49.00 (Standard Mode) | -- |
| AIME2025 | Math reasoning | 100.00 (Thinking Enabled, Tools) | 85.00 (Deep Thinking Mode, Tools) | 54.80 (Standard Mode) | -- | -- |
| FrontierMath | Math reasoning | 5.20 (Standard Mode) | 4.10 (Standard Mode) | 4.10 (Thinking Enabled) | 2.10 (Standard Mode) | 1.00 (Standard Mode) |
| IMO-ProofBench | Math reasoning | 27.10 (Thinking Enabled) | 27.10 (Thinking Enabled) | -- | -- | -- |
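To see where Sonnet 4.5 moved relative to Sonnet 4, you can take the point difference on each benchmark where both have a score. The sketch below does that with the values from the version table. `PAIRS` and `deltas` are hypothetical helper names. Cells were measured in different modes (for example, with or without tools), so the differences are indicative rather than like-for-like.

```python
# Scores from the version table: (Claude Sonnet 4.5, Claude Sonnet 4).
PAIRS = {
    "ARC-AGI": (63.70, 40.00),
    "ARC-AGI-2": (13.60, 5.90),
    "GPQA Diamond": (83.40, 83.80),
    "HLE": (33.60, 9.60),
    "LiveBench": (78.26, 73.82),
    "MMLU Pro": (88.00, 84.00),
    "LiveCodeBench": (71.00, 66.00),
    "SWE-Bench Pro - Public": (43.60, 42.70),
    "SWE-bench Verified": (82.00, 80.20),
    "AIME2025": (100.00, 85.00),
    "FrontierMath": (5.20, 4.10),
    "IMO-ProofBench": (27.10, 27.10),
}


def deltas(pairs):
    """Point change from Sonnet 4 to Sonnet 4.5, rounded to 2 decimals to hide float noise."""
    return {name: round(new - old, 2) for name, (new, old) in pairs.items()}


changes = deltas(PAIRS)
# Print benchmarks from largest gain to largest drop.
for name, d in sorted(changes.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:24s} {d:+6.2f}")
regressions = [n for n, d in changes.items() if d < 0]
print("Lower than Sonnet 4 on:", regressions)
```

On these numbers HLE shows the largest gain (+24.00), but Sonnet 4.5's HLE score was measured with tools and Sonnet 4's without, so part of that gap may reflect the setup. GPQA Diamond is the only benchmark where Sonnet 4.5 is slightly lower (-0.40), and IMO-ProofBench is unchanged.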
Single-Benchmark Version Trend (ARC-AGI, General evaluation)
Standard API Pricing Across the Claude Sonnet Series
Source: DataLearnerAI. Standard text prices shown here use the default supplier.
| Model | Supplier | Standard input | Standard output | Base price applies to |
|---|---|---|---|---|
| Claude Sonnet 4.5 | — | $3 / 1M tokens | $15 / 1M tokens | ≤ 200K tokens |
| Claude Sonnet 4 | — | $3 / 1M tokens | $15 / 1M tokens | — |