Benchmark Results
General evaluation: 12 evaluations
Coding and software engineering: 5 evaluations
Math reasoning: 5 evaluations
AI Agent - Tool Use: 2 evaluations
OpenClaw agent capability evaluation: 2 evaluations
Competitor Comparison
Benchmark scores for Haiku 4.5 compared against top models in its class
10 benchmarks with comparable scores. Each model shows its best score, with the evaluation mode given alongside the score.
| Benchmark | Category | Haiku 4.5 (current) | GPT-5.4 mini | Gemini 3.0 Flash |
|---|---|---|---|---|
| ARC-AGI-2 | General evaluation | 4.50 (Extended Thinking) | -- | 33.60 (Thinking Enabled) |
| GPQA Diamond | General evaluation | 73.30 (Extended Thinking) | 88.00 (Thinking Level: Extra High) | 90.40 (Thinking Enabled) |
| HLE | General evaluation | 9.70 (Extended Thinking) | 41.50 (Thinking Level: Extra High, Tools) | 43.50 (Thinking Enabled, Tools) |
| SWE-Bench Pro - Public | Coding and software engineering | 39.45 (Extended Thinking, Tools) | 54.40 (Thinking Level: Extra High, Tools) | 49.60 (Thinking Level: High, Tools) |
| SWE-bench Verified | Coding and software engineering | 73.30 (128K, Tools) | -- | 68.70 (Thinking Enabled) |
| AIME2025 | Math reasoning | 96.30 (128K, Tools) | -- | 99.70 (Thinking Enabled, Tools) |
| (name missing in source) | (missing in source) | 2.10 (32K) | 2.10 (Thinking Level: High) | 4.20 (Standard Mode) |
| τ²-Bench | Agent capability evaluation | 33.00 (Standard Mode, Tools) | -- | 90.20 (Thinking Enabled, Tools) |
| Claw Bench | OpenClaw agent capability evaluation | 89.40 (Thinking Enabled, Tools) | 75.30 (Thinking Enabled, Tools) | 85.70 (Thinking Enabled, Tools) |
| Pinch Bench | OpenClaw agent capability evaluation | 82.00 (Thinking Enabled, Tools) | -- | 85.20 (Thinking Enabled, Tools) |
Standard API Pricing: Haiku 4.5 vs. Peer Models
Shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.
Source: DataLearnerAI. Standard text prices shown here use the default supplier. Prices are in USD per 1M tokens.
| Model | Supplier | Standard input | Standard output | Base price applies to |
|---|---|---|---|---|
| Haiku 4.5 | -- | $1 / 1M tokens | $5 / 1M tokens | -- |
| GPT-5.4 mini | OpenAI | $0.75 / 1M tokens | $4.50 / 1M tokens | -- |
| Gemini 3.0 Flash | -- | $0.50 / 1M tokens | $3 / 1M tokens | -- |
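To make the per-token prices easier to compare, here is a minimal sketch that turns the rates in the table above into a cost for a single workload. The workload size (2M input tokens, 500K output tokens) and the helper name `request_cost` are illustrative assumptions, not figures from the source, and the calculation ignores any extended-context or discounted pricing.

```python
# Standard prices in USD per 1M tokens, copied from the pricing table above:
# (input price, output price).
PRICES = {
    "Haiku 4.5": (1.00, 5.00),
    "GPT-5.4 mini": (0.75, 4.50),
    "Gemini 3.0 Flash": (0.50, 3.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD at the standard (base) rate for one workload."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload: 2M input tokens and 500K output tokens.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 2_000_000, 500_000):.2f}")
```

For this assumed workload the output tokens account for more than half of each model's bill, even though they are only a fifth of the total token volume, because output is priced five to six times higher than input.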
Version History
How each version of the Haiku 4.5 series stacks up on benchmark tests
3 benchmarks with comparable scores. Each model shows its best score, with the evaluation mode given alongside the score.
| Benchmark | Category | Haiku 4.5 (current) | Claude 3.5 Haiku |
|---|---|---|---|
| GPQA Diamond | General evaluation | 73.30 (Extended Thinking) | 41.60 (Standard Mode) |
| MMLU Pro | General evaluation | 80.00 (Extended Thinking) | 65.00 (Standard Mode) |
| FrontierMath | Math reasoning | 4.10 (Standard Mode) | 0.30 (Standard Mode) |
Single-Benchmark Version Trend
Viewing: GPQA Diamond · General evaluation
Standard API Pricing Across the Haiku 4.5 Series
Shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.
Source: DataLearnerAI. Standard text prices shown here use the default supplier.
| Model | Supplier | Standard input | Standard output | Base price applies to |
|---|---|---|---|---|
| Haiku 4.5 | -- | $1 / 1M tokens | $5 / 1M tokens | -- |