Kimi K2.5 Benchmark Details

Kimi K2.5's benchmark results are led by LiveCodeBench (rank 16 of 123, score 85.00), HLE with tools (rank 27 of 172, score 50.20), and AIME2025 (rank 21 of 107, score 96.10). This page also compares it with 2 competitor models and 3 predecessor or same-series models, including performance and pricing views where available. One source link is attached for reference.

Benchmark Results

General Knowledge

7 evaluations

Benchmark	Mode	Score	Rank / total
GPQA Diamond	Thinking Enabled	87.60	37 / 187
MMLU Pro	Thinking Enabled	78.50	69 / 132
LiveBench	Thinking Enabled	69.07	42 / 115
ARC-AGI	Thinking Enabled	65.30	34 / 68
HLE	Thinking Enabled	30.10	89 / 172
HLE	Thinking Enabled | Tools	50.20	27 / 172
ARC-AGI-2	Thinking Enabled	11.80	39 / 62

Coding and Software Engineering

4 evaluations

Benchmark	Mode	Score	Rank / total
LiveCodeBench	Thinking Enabled	85.00	16 / 123
SWE-bench Verified	Thinking Enabled | Tools	76.80	30 / 112
SWE-bench Multilingual	Thinking Enabled	--	13 / 23
SWE-Bench Pro - Public	Thinking Enabled | Tools	50.70	41 / 54

Math and Reasoning

4 evaluations

Benchmark	Mode	Score	Rank / total
AIME2025	Thinking Enabled	96.10	21 / 107
AIME 2026	Thinking Enabled	92.50	12 / 18
IMO-AnswerBench	Thinking Enabled	81.80	16 / 21
FrontierMath - Tier 4	Standard Mode	4.20	40 / 80

Common Sense Reasoning

1 evaluation

Benchmark	Mode	Score	Rank / total
Simple Bench	Thinking Enabled	46.80	30 / 63

AI Agent - Information Search

1 evaluation

Benchmark	Mode	Score	Rank / total
BrowseComp	Thinking Enabled | Tools | Internet	60.60	36 / 53

AI Agent - Tool Usage

2 evaluations

Benchmark	Mode	Score	Rank / total
MCP-Atlas	Standard Mode | Tools	64.40	19 / 27
Terminal Bench 2.0	Thinking Enabled | Tools	50.80	34 / 47

Productivity Knowledge

1 evaluation

Benchmark	Mode	Score	Rank / total
GDPval-AA	Thinking Enabled	--	15 / 21

Long Context

2 evaluations

Benchmark	Mode	Score	Rank / total
AA-LCR	Thinking Enabled	--	12 / 15
LongBench v2	Standard Mode	--	5 / 11

Claw-style Agent Evaluation

2 evaluations

Benchmark	Mode	Score	Rank / total
Pinch Bench	Thinking Enabled | Tools	84.80	17 / 37
Claw Bench	Thinking Enabled | Tools	81.70	18 / 29
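The leaderboards above vary in size, from 11 to 187 entries, so a raw rank is hard to compare across benchmarks. One simple normalization divides rank by total to get a "top X%" figure. This is an illustrative sketch and not the site's own metric. It assumes rank 1 is the best result on every leaderboard, and the helper name `top_percent` is hypothetical.

```python
# Illustrative only: `top_percent` is a hypothetical helper, not part of any site API.
# Assumption: rank 1 is the best result on each leaderboard.

def top_percent(rank: int, total: int) -> float:
    """Return rank / total as a percentage (smaller means a better placement)."""
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return 100.0 * rank / total


# A few Rank/total entries copied from the tables above.
entries = {
    "LiveCodeBench": (16, 123),
    "HLE (Thinking Enabled | Tools)": (27, 172),
    "AIME2025": (21, 107),
    "GPQA Diamond": (37, 187),
    "LongBench v2": (5, 11),
}

# Print entries from best to worst relative placement.
for name, (rank, total) in sorted(entries.items(), key=lambda kv: top_percent(*kv[1])):
    print(f"{name:<32} top {top_percent(rank, total):5.1f}%")
```

With these inputs, LiveCodeBench lands near the top 13% and LongBench v2 near the top 45%. So a small absolute rank such as 5th can still be a weaker relative placement on a short leaderboard.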

Compare with other models

Competitor Comparison

Benchmark scores for Kimi K2.5 compared against top models in its class

Models compared: Kimi K2.5, GLM-5, MiniMax M2.5

The chart shows each model's highest score per benchmark. Benchmarks scored out of 100 are plotted at their raw values. Benchmarks on other scales are rescaled within that benchmark, while the labels keep the original scores.
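The bar-height rule can be read as follows. This is a minimal sketch that fills in details the page does not spell out. It assumes a benchmark counts as "out of 100" when all its scores fall in [0, 100], and that any other benchmark is rescaled so its highest score maps to a height of 100. Labels would still display the original scores. The function name `bar_heights` and the non-percentage scores in the second call are hypothetical.

```python
# Hypothetical sketch of the chart's bar-height rule (assumed details, not the site's code).

def bar_heights(scores: dict[str, float]) -> dict[str, float]:
    """Map each model's score on one benchmark to a bar height on a 0-100 axis."""
    values = list(scores.values())
    if all(0.0 <= v <= 100.0 for v in values):
        return dict(scores)  # out-of-100 benchmark: height is the raw score
    top = max(values)
    if top <= 0:
        raise ValueError("cannot rescale a benchmark whose top score is not positive")
    # Other scales: the top score maps to 100 and the rest scale proportionally.
    return {model: 100.0 * v / top for model, v in scores.items()}


print(bar_heights({"Kimi K2.5": 87.60, "GLM-5": 86.00}))    # {'Kimi K2.5': 87.6, 'GLM-5': 86.0}
print(bar_heights({"Model A": 1500.0, "Model B": 1200.0}))  # {'Model A': 100.0, 'Model B': 80.0}
```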

12 benchmarks have comparable scores for these models. Each cell shows the model's best score together with the mode in which it was achieved. "--" marks a missing score.

Benchmark	Category	Kimi K2.5 (current)	GLM-5	MiniMax M2.5
ARC-AGI	General Knowledge	65.30 (Thinking Enabled)	44.70 (Thinking Enabled)	63.70 (Thinking Enabled)
ARC-AGI-2	General Knowledge	11.80 (Thinking Enabled)	4.90 (Thinking Enabled)	4.90 (Thinking Enabled)
GPQA Diamond	General Knowledge	87.60 (Thinking Enabled)	86.00 (Thinking Enabled)	85.20 (Thinking Enabled)
HLE	General Knowledge	50.20 (Thinking Enabled | Tools)	50.40 (Thinking Enabled | Tools)	19.40 (Thinking Enabled)
LiveBench	General Knowledge	69.07 (Thinking Enabled)	68.85 (Standard Mode)	60.14 (Deep Thinking Mode)
SWE-Bench Pro - Public	Coding and Software Engineering	50.70 (Thinking Enabled | Tools)	--	55.40 (Thinking Enabled | Tools)
SWE-bench Verified	Coding and Software Engineering	76.80 (Thinking Enabled | Tools)	77.80 (Thinking Enabled)	80.20 (Thinking Enabled | Tools)
AIME 2026	Math and Reasoning	92.50 (Thinking Enabled)	92.70 (Thinking Enabled)	--
AIME2025	Math and Reasoning	96.10 (Thinking Enabled)	--	86.30 (Thinking Enabled)
FrontierMath - Tier 4	Math and Reasoning	4.20 (Standard Mode)	2.10 (Standard Mode)	--
IMO-AnswerBench	Math and Reasoning	81.80 (Thinking Enabled)	82.50 (Thinking Enabled)	--
Simple Bench	Common Sense Reasoning	46.80 (Thinking Enabled)	53.20 (Standard Mode)	--

7 additional benchmarks are shown only in the chart.
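The "best score per model" rule explains why the HLE cell for Kimi K2.5 shows 50.20. The model has two HLE results, 30.10 (Thinking Enabled) and 50.20 (Thinking Enabled | Tools), and the higher one is kept. Below is a minimal sketch of that reduction. It assumes ties keep the first listed mode, and the names `best_per_benchmark` and `Result` are hypothetical.

```python
# Hypothetical sketch of "each model shows its best score": keep the highest-scoring
# (mode, score) pair per benchmark. With ties, max() keeps the first one listed.

Result = tuple[str, float]  # (mode label, score)


def best_per_benchmark(results: dict[str, list[Result]]) -> dict[str, Result]:
    return {
        bench: max(runs, key=lambda run: run[1])
        for bench, runs in results.items()
        if runs  # skip benchmarks with no results
    }


# Kimi K2.5 entries copied from the General Knowledge table.
kimi_k25 = {
    "HLE": [("Thinking Enabled", 30.10), ("Thinking Enabled | Tools", 50.20)],
    "GPQA Diamond": [("Thinking Enabled", 87.60)],
}

for bench, (mode, score) in best_per_benchmark(kimi_k25).items():
    print(f"{bench}: {score:.2f} ({mode})")
```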

Standard API Pricing: Kimi K2.5 vs. Peer Models

Standard text input and output prices for each model, side by side. If a model also has extended-context pricing, the table shows the base rate and gives the threshold in the "Base price applies to" column.

Source: DataLearnerAI. Prices are standard text rates from each model's default supplier, in USD per 1M tokens.

Model	Supplier	Standard input	Standard output	Base price applies to
Kimi K2.5	Moonshot AI	$0.6 / 1M tokens	$3 / 1M tokens	--
GLM-5	Zhipu AI	$1 / 1M tokens	$3.2 / 1M tokens	--
MiniMax M2.5	MiniMaxAI	$0.3 / 1M tokens	$2.4 / 1M tokens	--
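Per-token prices are easier to compare on a concrete workload. The sketch below applies the standard rates from the table to a hypothetical job of 10M input tokens and 2M output tokens. It assumes flat pricing with no extended-context tier (the table lists none) and ignores any discounts suppliers may offer. `PRICES` and `cost_usd` are illustrative names.

```python
# Estimate the cost of one workload from per-1M-token prices (USD).
# Prices are copied from the table above; the token counts are hypothetical.

PRICES = {  # model: (input USD per 1M tokens, output USD per 1M tokens)
    "Kimi K2.5": (0.6, 3.0),
    "GLM-5": (1.0, 3.2),
    "MiniMax M2.5": (0.3, 2.4),
}


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Flat-rate cost: tokens times price per token, summed over input and output."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Hypothetical workload: 10M input tokens and 2M output tokens.
for model in PRICES:
    print(f"{model:<13} ${cost_usd(model, 10_000_000, 2_000_000):.2f}")
```

For this mix, the estimates are $12.00 for Kimi K2.5, $16.40 for GLM-5, and $7.80 for MiniMax M2.5. MiniMax M2.5 is cheapest and GLM-5 most expensive on both input and output rates, so changing the input/output ratio shifts the gaps but not the ordering.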

Version History

How each version of the Kimi K2.5 series stacks up on benchmark tests

Models compared: Kimi K2.5, Kimi K2 Thinking, Kimi K2 0905, Kimi K2

12 benchmarks have comparable scores across these versions. Each cell shows the model's best score together with the mode in which it was achieved. "--" marks a missing score.

Benchmark	Category	Kimi K2.5 (current)	Kimi K2 Thinking	Kimi K2 0905	Kimi K2
ARC-AGI	General Knowledge	65.30 (Thinking Enabled)	--	--	13.30 (Standard Mode)
GPQA Diamond	General Knowledge	87.60 (Thinking Enabled)	84.50 (Thinking Enabled)	--	75.10 (Standard Mode)
HLE	General Knowledge	50.20 (Thinking Enabled | Tools)	51.00 (Thinking Enabled | Tools)	21.70 (Thinking Enabled | Tools)	4.70 (Standard Mode)
LiveBench	General Knowledge	69.07 (Thinking Enabled)	61.59 (Thinking Enabled)	--	48.10 (Standard Mode)
MMLU Pro	General Knowledge	78.50 (Thinking Enabled)	84.60 (Thinking Enabled)	--	81.10 (Standard Mode)
LiveCodeBench	Coding and Software Engineering	85.00 (Thinking Enabled)	83.10 (Thinking Enabled)	--	53.70 (Standard Mode)
SWE-Bench Pro - Public	Coding and Software Engineering	50.70 (Thinking Enabled | Tools)	--	27.67 (Standard Mode)	--
SWE-bench Verified	Coding and Software Engineering	76.80 (Thinking Enabled | Tools)	71.30 (Thinking Enabled | Tools)	69.20 (Standard Mode)	51.80 (Standard Mode)
AIME2025	Math and Reasoning	96.10 (Thinking Enabled)	100.00 (Thinking Enabled | Tools)	75.20 (Thinking Enabled | Tools)	54.00 (Standard Mode)
FrontierMath - Tier 4	Math and Reasoning	4.20 (Standard Mode)	0.00 (Thinking Enabled)	--	0.01 (Standard Mode)
Simple Bench	Common Sense Reasoning	46.80 (Thinking Enabled)	39.60 (Standard Mode)	--	26.30 (Standard Mode)
BrowseComp	AI Agent - Information Search	60.60 (Thinking Enabled | Tools)	60.20 (Thinking Enabled | Tools)	--	--

1 additional benchmark is shown only in the chart.
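One quick way to read the version table is to subtract scores between generations. The sketch below computes Kimi K2.5 minus Kimi K2 Thinking for a few rows and skips any cell marked "--". Each cell is a model's best score, so the two sides may come from different modes. For AIME2025, for example, Kimi K2 Thinking's 100.00 used tools while Kimi K2.5's 96.10 did not. Treat these differences as rough indicators rather than like-for-like comparisons. The names `rows` and `delta` are hypothetical.

```python
from typing import Optional

# Scores copied from the version table: (Kimi K2.5, Kimi K2 Thinking).
# "--" marks a missing score.
rows = {
    "GPQA Diamond": ("87.60", "84.50"),
    "HLE": ("50.20", "51.00"),
    "LiveBench": ("69.07", "61.59"),
    "MMLU Pro": ("78.50", "84.60"),
    "SWE-Bench Pro - Public": ("50.70", "--"),
    "AIME2025": ("96.10", "100.00"),
}


def delta(new: str, old: str) -> Optional[float]:
    """Return new - old rounded to 2 decimals, or None if either score is missing."""
    if "--" in (new, old):
        return None
    return round(float(new) - float(old), 2)


for bench, (k25, k2_thinking) in rows.items():
    d = delta(k25, k2_thinking)
    shown = "n/a" if d is None else f"{d:+.2f}"
    print(f"{bench:<24} {shown}")
```

With these rows, Kimi K2.5 is ahead on GPQA Diamond (+3.10) and LiveBench (+7.48). It is behind on HLE (-0.80), MMLU Pro (-6.10), and AIME2025 (-3.90).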

Single-Benchmark Version Trend

Chart: ARC-AGI (General Knowledge) score by model version. The x-axis shows each model and its release date, and the y-axis shows the score. Solid lines connect the same mode across versions (Normal, Normal + Tools, Thinking, Thinking + Tools, Deep, Deep + Tools), while dotted guides align modes within the same generation.

Standard API Pricing Across the Kimi K2.5 Series

Source: DataLearnerAI. Prices are standard text rates from each model's default supplier, in USD per 1M tokens.

Model	Supplier	Standard input	Standard output	Base price applies to
Kimi K2.5	Moonshot AI	$0.6 / 1M tokens	$3 / 1M tokens	--
Kimi K2 Thinking	Fireworks AI	$0.6 / 1M tokens	$2.5 / 1M tokens	--
Kimi K2 0905	Fireworks AI	$0.6 / 1M tokens	$2.5 / 1M tokens	--
Kimi K2	Moonshot AI	$0.6 / 1M tokens	$2.5 / 1M tokens	--

Sources

kimi.com