Kimi K2.5 Benchmark Details
Kimi K2.5's benchmark results are currently led by HLE (rank 17 of 149, score 50.20), LiveCodeBench (rank 14 of 118, score 85.00), and GPQA Diamond (rank 31 of 175, score 87.60). This page also compares it with 2 competitor models and 3 predecessor or same-series models, including performance and pricing views when available. 1 source link is attached for reference.
Benchmark Results
General evaluation: 6 evaluations
Coding and software engineering: 4 evaluations
Mathematical reasoning: 3 evaluations
OpenClaw agent capability evaluation: 2 evaluations

Competitor Comparison
Benchmark scores for Kimi K2.5 compared against top models in its class
Benchmark Score Comparison
12 benchmarks with comparable scores. Each model shows its best score, with the mode label given alongside it.
| Benchmark | Category | Kimi K2.5 (current) | GLM-5 | MiniMax M2.5 |
|---|---|---|---|---|
| ARC-AGI | General evaluation | 65.30 (Thinking Enabled) | 44.70 (Thinking Enabled) | 63.70 (Thinking Enabled) |
| ARC-AGI-2 | General evaluation | 11.80 (Thinking Enabled) | 4.90 (Thinking Enabled) | 4.90 (Thinking Enabled) |
| GPQA Diamond | General evaluation | 87.60 (Thinking Enabled) | 86.00 (Thinking Enabled) | 85.20 (Thinking Enabled) |
| HLE | General evaluation | 50.20 (Thinking Enabled, Tools) | 50.40 (Thinking Enabled, Tools) | 19.40 (Thinking Enabled) |
| SWE-Bench Pro - Public | Coding and software engineering | 50.70 (Thinking Enabled, Tools) | -- | 55.40 (Thinking Enabled, Tools) |
| SWE-bench Verified | Coding and software engineering | 76.80 (Thinking Enabled, Tools) | 77.80 (Thinking Enabled) | 80.20 (Thinking Enabled, Tools) |
| AIME 2026 | Mathematical reasoning | 92.50 (Thinking Enabled) | 92.70 (Thinking Enabled) | -- |
| AIME2025 | Mathematical reasoning | 96.10 (Thinking Enabled) | -- | 86.30 (Thinking Enabled) |
| | | 4.20 (Standard Mode) | 2.10 (Standard Mode) | -- |
| IMO-AnswerBench | Mathematical reasoning | 81.80 (Thinking Enabled) | 82.50 (Thinking Enabled) | -- |
| BrowseComp | AI Agent - Information gathering | 60.60 (Thinking Enabled, Tools) | 75.90 (Thinking Enabled, Tools) | 76.30 (Thinking Enabled, Tools) |
| Terminal Bench 2.0 | AI Agent - Tool use | 50.80 (Thinking Enabled, Tools) | 61.10 (Thinking Enabled, Tools) | 51.70 (Thinking Enabled, Tools) |
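One way to read the comparison is to count, per benchmark, which model reports the highest score. The sketch below does that with the values from the table above. The names `MODELS`, `ROWS`, and `leaders` are illustrative, not part of any DataLearnerAI tooling, and the tally ignores mode differences (some scores use tools and others do not), so it is a rough summary rather than a like-for-like ranking.

```python
from collections import Counter

MODELS = ("Kimi K2.5", "GLM-5", "MiniMax M2.5")

# Scores in the same column order as MODELS; None marks a "--" entry.
# The row with no benchmark name is omitted.
ROWS = {
    "ARC-AGI": (65.30, 44.70, 63.70),
    "ARC-AGI-2": (11.80, 4.90, 4.90),
    "GPQA Diamond": (87.60, 86.00, 85.20),
    "HLE": (50.20, 50.40, 19.40),
    "SWE-Bench Pro - Public": (50.70, None, 55.40),
    "SWE-bench Verified": (76.80, 77.80, 80.20),
    "AIME 2026": (92.50, 92.70, None),
    "AIME2025": (96.10, None, 86.30),
    "IMO-AnswerBench": (81.80, 82.50, None),
    "BrowseComp": (60.60, 75.90, 76.30),
    "Terminal Bench 2.0": (50.80, 61.10, 51.70),
}


def leaders(rows):
    """Map each benchmark to the model with the highest reported score."""
    result = {}
    for bench, scores in rows.items():
        reported = [(s, m) for s, m in zip(scores, MODELS) if s is not None]
        result[bench] = max(reported, key=lambda pair: pair[0])[1]
    return result


tally = Counter(leaders(ROWS).values())
for model, wins in tally.most_common():
    print(f"{model}: leads on {wins} of {len(ROWS)} benchmarks")
```

On these figures the lead is split fairly evenly: Kimi K2.5 is ahead on the ARC-AGI variants, GPQA Diamond, and AIME2025, while the agentic and software-engineering rows go to the other two models.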
Standard API Pricing: Kimi K2.5 vs. Peer Models
Shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.
Source: DataLearnerAI. Standard text prices shown here use the default supplier. Prices are in USD per 1M tokens.
| Model | Supplier | Standard input | Standard output | Base price applies to |
|---|---|---|---|---|
| Kimi K2.5 | -- | $0.6 / 1M tokens | $3 / 1M tokens | -- |
| GLM-5 | Zhipu AI (智谱AI) | $1 / 1M tokens | $3.2 / 1M tokens | -- |
| MiniMax M2.5 | MiniMaxAI | $0.3 / 1M tokens | $2.4 / 1M tokens | -- |
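To make the per-token prices concrete, the following sketch estimates the cost of a workload from the standard input and output rates listed above. The function name `workload_cost` and the example token counts (2M input, 0.5M output) are hypothetical, and the calculation assumes flat standard pricing with no caching discounts or extended-context tiers.

```python
# (input, output) prices in USD per 1M tokens, copied from the table above.
PRICES_USD_PER_1M = {
    "Kimi K2.5": (0.6, 3.0),
    "GLM-5": (1.0, 3.2),
    "MiniMax M2.5": (0.3, 2.4),
}


def workload_cost(model, input_tokens, output_tokens):
    """Return the estimated cost in USD at the model's standard rates."""
    input_price, output_price = PRICES_USD_PER_1M[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload: 2M input tokens and 0.5M output tokens.
for name in PRICES_USD_PER_1M:
    cost = workload_cost(name, 2_000_000, 500_000)
    print(f"{name}: ${cost:.2f}")
```

Because output tokens cost several times more than input tokens for all three models, output-heavy workloads (long reasoning traces, for example) shift the comparison more than the input price alone suggests.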
Version History
How each version of the Kimi K2.5 series stacks up on benchmark tests
Benchmark Score Comparison
12 benchmarks with comparable scores. Each model shows its best score, with the mode label given alongside it.
| Benchmark | Category | Kimi K2.5 (current) | Kimi K2 Thinking | Kimi K2 0905 | Kimi K2 |
|---|---|---|---|---|---|
| ARC-AGI | General evaluation | 65.30 (Thinking Enabled) | -- | -- | 13.30 (Standard Mode) |
| GPQA Diamond | General evaluation | 87.60 (Thinking Enabled) | 84.50 (Thinking Enabled) | -- | 75.10 (Standard Mode) |
| HLE | General evaluation | 50.20 (Thinking Enabled, Tools) | 51.00 (Thinking Enabled, Tools) | 21.70 (Thinking Enabled, Tools) | 4.70 (Standard Mode) |
| MMLU Pro | General evaluation | 78.50 (Thinking Enabled) | 84.60 (Thinking Enabled) | -- | 81.10 (Standard Mode) |
| LiveCodeBench | Coding and software engineering | 85.00 (Thinking Enabled) | 83.10 (Thinking Enabled) | -- | 53.70 (Standard Mode) |
| SWE-Bench Pro - Public | Coding and software engineering | 50.70 (Thinking Enabled, Tools) | -- | 27.67 (Standard Mode) | -- |
| SWE-bench Verified | Coding and software engineering | 76.80 (Thinking Enabled, Tools) | 71.30 (Thinking Enabled, Tools) | 69.20 (Standard Mode) | 51.80 (Standard Mode) |
| AIME2025 | Mathematical reasoning | 96.10 (Thinking Enabled) | 100.00 (Thinking Enabled, Tools) | 75.20 (Thinking Enabled, Tools) | 54.00 (Standard Mode) |
| | | 4.20 (Standard Mode) | 0.00 (Thinking Enabled) | -- | 0.01 (Standard Mode) |
| Simple Bench | Commonsense reasoning | 46.80 (Thinking Enabled) | -- | -- | 26.30 (Standard Mode) |
| BrowseComp | AI Agent - Information gathering | 60.60 (Thinking Enabled, Tools) | 60.20 (Thinking Enabled, Tools) | -- | -- |
| Claw Bench | OpenClaw agent capability evaluation | 81.70 (Thinking Enabled, Tools) | 82.50 (Thinking Enabled, Tools) | -- | -- |
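The version table is easier to interpret as score changes between adjacent releases. As a rough sketch, the code below computes the difference between Kimi K2.5 and Kimi K2 Thinking on every benchmark where both report a score. The names `PAIRS` and `deltas` are illustrative. Note that the modes do not always match (for AIME2025, the K2 Thinking score uses tools and the K2.5 score does not), so a negative delta is not necessarily a like-for-like regression.

```python
# (Kimi K2.5, Kimi K2 Thinking) score pairs copied from the table above.
# Only benchmarks where both models report a score are included;
# the row with no benchmark name is left out.
PAIRS = {
    "GPQA Diamond": (87.60, 84.50),
    "HLE": (50.20, 51.00),
    "MMLU Pro": (78.50, 84.60),
    "LiveCodeBench": (85.00, 83.10),
    "SWE-bench Verified": (76.80, 71.30),
    "AIME2025": (96.10, 100.00),
    "BrowseComp": (60.60, 60.20),
    "Claw Bench": (81.70, 82.50),
}


def deltas(pairs):
    """Return new-minus-old score differences, rounded to two decimals."""
    return {bench: round(new - old, 2) for bench, (new, old) in pairs.items()}


changes = deltas(PAIRS)
regressions = sorted(bench for bench, d in changes.items() if d < 0)

# Print from largest drop to largest gain.
for bench, delta in sorted(changes.items(), key=lambda kv: kv[1]):
    print(f"{bench:20s} {delta:+.2f}")
```

On the listed figures, Kimi K2.5 scores higher than Kimi K2 Thinking on four of the eight shared benchmarks and lower on the other four, with the largest gap in either direction on MMLU Pro.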
Single-Benchmark Version Trend
Chart: ARC-AGI (General evaluation) score by version
Standard API Pricing Across the Kimi K2.5 Series
Shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.
Source: DataLearnerAI. Standard text prices shown here use the default supplier.
The source page lists these as raw price values rather than a shared bar chart; all four are in USD per 1M tokens.
| Model | Supplier | Standard input | Standard output | Base price applies to |
|---|---|---|---|---|
| Kimi K2.5 | -- | $0.6 / 1M tokens | $3 / 1M tokens | -- |
| Kimi K2 Thinking | -- | $0.6 / 1M tokens | $2.5 / 1M tokens | -- |
| Kimi K2 0905 | -- | $0.60 / 1M tokens | $2.5 / 1M tokens | -- |
| Kimi K2 | -- | $0.6 / 1M tokens | $2.5 / 1M tokens | -- |