Haiku 4.5 Benchmark Details

Haiku 4.5's benchmark results are currently led by AIME2025 (rank 20 / 106, score 96.30), Terminal-Bench (rank 11 / 35, score 41), and Claw Bench (rank 11 / 29, score 89.40). This page also compares it with two competitor models and two earlier models from the same series, including performance and pricing views when available.

Benchmark Results

General Knowledge

12 evaluations

Benchmark	Mode	Score	Rank/total
MMLU Pro	Standard Mode	--	78 / 126
MMLU Pro	Extended Thinking	80.00	60 / 126
GPQA Diamond	Standard Mode	60.50	140 / 180
GPQA Diamond	Extended Thinking	73.30	102 / 180
LiveBench	Standard Mode	45.33	103 / 115
LiveBench	64K	61.32	64 / 115
ARC-AGI	Standard Mode	14.30	56 / 65
ARC-AGI	Extended Thinking	47.70	43 / 65
HLE	Standard Mode	4.30	162 / 164
HLE	Extended Thinking	9.70	140 / 164
ARC-AGI-2	Standard Mode	1.30	52 / 59
ARC-AGI-2	Extended Thinking	4.50	47 / 59

Coding and Software Engineering

5 evaluations

Benchmark	Mode	Score	Rank/total
SWE-bench Verified	Standard Mode | Tools	60.60	77 / 109
SWE-bench Verified	128K | Tools	73.30	45 / 109
LiveCodeBench	Standard Mode	--	91 / 120
LiveCodeBench	Extended Thinking	--	67 / 120
SWE-Bench Pro - Public	Extended Thinking | Tools	39.45	44 / 47

Math and Reasoning

5 evaluations

Benchmark	Mode	Score	Rank/total
AIME2025	Standard Mode	--	94 / 106
AIME2025	128K	80.70	57 / 106
AIME2025	128K | Tools	96.30	20 / 106
FrontierMath	Standard Mode	4.10	41 / 60
FrontierMath - Tier 4	32K	2.10	56 / 80

AI Agent - Tool Usage

3 evaluations

Benchmark	Mode	Score	Rank/total
Terminal-Bench	Standard Mode | Tools	--	26 / 35
Terminal-Bench	32K | Tools	41	11 / 35
MCP-Atlas	Standard Mode | Tools	40.20	25 / 25

Multimodal Understanding

1 evaluation

Benchmark	Mode	Score	Rank/total
MMMU	128K	73.20	19 / 28

Agent Level Benchmark

1 evaluation

Benchmark	Mode	Score	Rank/total
τ²-Bench	Standard Mode | Tools	33.00	40 / 40

Instruction Following

1 evaluation

Benchmark	Mode	Score	Rank/total
IF Bench	Extended Thinking	54.30	25 / 29

Claw-style Agent Evaluation

2 evaluations

Benchmark	Mode	Score	Rank/total
Claw Bench	Thinking Enabled | Tools	89.40	11 / 29
Pinch Bench	Thinking Enabled | Tools	82.00	21 / 37

Compare with other models

Competitor Comparison

Benchmark scores for Haiku 4.5 compared against top models in its class

Models compared: Haiku 4.5, GPT-5.4 mini, Gemini 3.0 Flash

The chart shows each model’s highest score per benchmark within the current filter. Out-of-100 benchmarks use raw heights; out-of-range benchmarks are scaled within that benchmark while labels keep the original scores.

11 benchmarks with comparable scores. Each model shows its best score; mode label is displayed below.
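
The selection and scaling rules described above can be sketched roughly as follows. This is an illustration, not the site's actual charting code: the function names, the `out_of_100` parameter, and the choice to scale other benchmarks against the highest score in that benchmark are all assumptions. The sample rows are copied from the tables on this page.

```python
from collections import defaultdict

# (benchmark, model, mode, score) rows copied from the tables on this page.
ROWS = [
    ("GPQA Diamond", "Haiku 4.5", "Standard Mode", 60.50),
    ("GPQA Diamond", "Haiku 4.5", "Extended Thinking", 73.30),
    ("LiveBench", "Haiku 4.5", "Standard Mode", 45.33),
    ("LiveBench", "Haiku 4.5", "64K", 61.32),
    ("GPQA Diamond", "Gemini 3.0 Flash", "Thinking Enabled", 90.40),
    ("LiveBench", "Gemini 3.0 Flash", "Thinking Level · High", 72.40),
]


def best_per_benchmark(rows):
    """Keep each model's highest score per benchmark, with the mode that produced it."""
    best = {}
    for benchmark, model, mode, score in rows:
        key = (benchmark, model)
        if key not in best or score > best[key][1]:
            best[key] = (mode, score)
    return best


def bar_heights(best, out_of_100):
    """Return a bar height for every (benchmark, model) pair.

    out_of_100 is a hypothetical set of benchmark names scored on a 0-100
    scale; those keep their raw score as the height. Any other benchmark is
    scaled so its top score maps to 100 (an assumed reading of "scaled
    within that benchmark"). Labels would still show the original scores.
    """
    top = defaultdict(float)
    for (benchmark, _model), (_mode, score) in best.items():
        top[benchmark] = max(top[benchmark], score)

    heights = {}
    for (benchmark, model), (_mode, score) in best.items():
        if benchmark in out_of_100:
            heights[(benchmark, model)] = score
        else:
            heights[(benchmark, model)] = 100.0 * score / top[benchmark] if top[benchmark] else 0.0
    return heights


best = best_per_benchmark(ROWS)
# LiveBench is treated as not out-of-100 here purely to exercise the scaling branch.
heights = bar_heights(best, out_of_100={"GPQA Diamond"})
```

With these rows, Haiku 4.5's GPQA Diamond entry resolves to its Extended Thinking score, which matches the mode label shown in the comparison table.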

Benchmark	Category	Haiku 4.5 (current)	GPT-5.4 mini	Gemini 3.0 Flash
ARC-AGI-2	General Knowledge	4.50 (Extended Thinking)	--	33.60 (Thinking Enabled)
GPQA Diamond	General Knowledge	73.30 (Extended Thinking)	88.00 (Thinking Level · Extra High)	90.40 (Thinking Enabled)
HLE	General Knowledge	9.70 (Extended Thinking)	41.50 (Thinking Level · Extra High | Tools)	43.50 (Thinking Enabled | Tools)
LiveBench	General Knowledge	61.32 (64K)	67.54 (Deep Thinking Mode)	72.40 (Thinking Level · High)
SWE-Bench Pro - Public	Coding and Software Engineering	39.45 (Extended Thinking | Tools)	54.40 (Thinking Level · Extra High | Tools)	49.60 (Thinking Level · High | Tools)
SWE-bench Verified	Coding and Software Engineering	73.30 (128K | Tools)	--	68.70 (Thinking Enabled)
AIME2025	Math and Reasoning	96.30 (128K | Tools)	--	99.70 (Thinking Enabled | Tools)
FrontierMath - Tier 4	Math and Reasoning	2.10 (32K)	2.10 (Thinking Level · High)	4.20 (Standard Mode)
τ²-Bench	Agent Level Benchmark	33.00 (Standard Mode | Tools)	--	90.20 (Thinking Enabled | Tools)
Claw Bench	Claw-style Agent Evaluation	89.40 (Thinking Enabled | Tools)	75.30 (Thinking Enabled | Tools)	85.70 (Thinking Enabled | Tools)
Pinch Bench	Claw-style Agent Evaluation	82.00 (Thinking Enabled | Tools)	--	85.20 (Thinking Enabled | Tools)

Standard API Pricing: Haiku 4.5 vs. Peer Models

Shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.

Source: DataLearnerAI. Standard text prices shown here use the default supplier. · USD / 1M tokens

Model	Supplier	Standard input	Standard output	Base price applies to
Haiku 4.5	Anthropic	$1 / 1M tokens	$5 / 1M tokens	—
GPT-5.4 mini	OpenAI	$0.75 / 1M tokens	$4.5 / 1M tokens	—
Gemini 3.0 Flash	Google DeepMind	$0.5 / 1M tokens	$3 / 1M tokens	—
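
To show how the per-1M-token prices translate into a bill, here is a minimal sketch. The prices are taken from the table above; the workload of 2,000,000 input tokens and 500,000 output tokens is a made-up example, and `estimate_cost` is an illustrative helper rather than any supplier's API. It ignores extended-context pricing, caching, and batch discounts.

```python
# Standard (input, output) prices in USD per 1M tokens, from the table above.
PRICES_USD_PER_1M = {
    "Haiku 4.5": (1.00, 5.00),
    "GPT-5.4 mini": (0.75, 4.50),
    "Gemini 3.0 Flash": (0.50, 3.00),
}


def estimate_cost(model, input_tokens, output_tokens):
    """Estimate USD cost at standard text rates for one model."""
    price_in, price_out = PRICES_USD_PER_1M[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Hypothetical workload: 2M input tokens and 0.5M output tokens.
costs = {
    model: estimate_cost(model, 2_000_000, 500_000)
    for model in PRICES_USD_PER_1M
}
# costs == {"Haiku 4.5": 4.5, "GPT-5.4 mini": 3.75, "Gemini 3.0 Flash": 2.5}
```

Because output tokens cost several times more than input tokens for all three models, the ranking can shift for workloads that are heavily output-bound, so it is worth plugging in a realistic token mix.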

Version History

How each version of the Haiku 4.5 series stacks up on benchmark tests

Models compared: Haiku 4.5, Claude 3.5 Haiku, Claude3-Haiku

2 benchmarks with comparable scores. Each model shows its best score; mode label is displayed below.

Benchmark	Category	Haiku 4.5 (current)	Claude 3.5 Haiku
GPQA Diamond	General Knowledge	73.30 (Extended Thinking)	41.60 (Standard Mode)
MMLU Pro	General Knowledge	80.00 (Extended Thinking)	65.00 (Standard Mode)

Single-Benchmark Version Trend

[Chart: GPQA Diamond (General Knowledge) score by model version. X-axis: model and release date. Y-axis: score. Modes plotted: Normal, Normal + Tools, Thinking, Thinking + Tools, Deep, Deep + Tools. Solid lines connect the same mode across versions, while dotted guides align modes within the same generation.]

Standard API Pricing Across the Haiku 4.5 Series

Source: DataLearnerAI. Standard text prices shown here use the default supplier. · USD / 1M tokens

Model	Supplier	Standard input	Standard output	Base price applies to
Haiku 4.5	Anthropic	$1 / 1M tokens	$5 / 1M tokens	—
Claude 3.5 Haiku	Anthropic	$0.8 / 1M tokens	$4 / 1M tokens	—
Claude3-Haiku	Anthropic	$0.25 / 1M tokens	$1.25 / 1M tokens	—
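
One way to read the series pricing is as a multiple of the oldest model's rates. The sketch below assumes Claude3-Haiku as the baseline. That choice and the helper name `price_multiples` are illustrative; the prices come from the table above.

```python
# Standard (input, output) prices in USD per 1M tokens, from the table above.
SERIES_PRICES = {
    "Claude3-Haiku": (0.25, 1.25),
    "Claude 3.5 Haiku": (0.80, 4.00),
    "Haiku 4.5": (1.00, 5.00),
}


def price_multiples(prices, baseline):
    """Express each model's input and output price as a multiple of the baseline's."""
    base_in, base_out = prices[baseline]
    return {
        model: (price_in / base_in, price_out / base_out)
        for model, (price_in, price_out) in prices.items()
    }


multiples = price_multiples(SERIES_PRICES, baseline="Claude3-Haiku")
# Claude 3.5 Haiku comes out at about 3.2x the baseline and Haiku 4.5 at 4x,
# for both input and output.
```

In all three models the output price is five times the input price, so the input and output multiples are the same.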