Kimi K2.6 Benchmark Details

Kimi K2.6's leading benchmark results are LiveCodeBench (rank 7 of 123, score 89.60), HLE (rank 15 of 172, score 54.00), and GPQA Diamond (rank 18 of 187, score 90.50). This page also compares it with three competitor models and three earlier models from the same series, including performance and pricing views where available. One source link is attached for reference.

Benchmark Results

General Knowledge

4 evaluations

Benchmark	Mode	Score	Rank / total
GPQA Diamond	Thinking Enabled	90.50	18 / 187
LiveBench	Thinking Enabled	72.17	28 / 115
HLE	Thinking Enabled	34.70	76 / 172
HLE	Thinking Enabled + Tools + Internet	54.00	15 / 172

Coding and Software Engineering

4 evaluations

Benchmark	Mode	Score	Rank / total
LiveCodeBench	Thinking Enabled	89.60	7 / 123
SWE-bench Verified	Thinking Enabled + Tools	80.20	14 / 112
SWE-bench Multilingual	Thinking Enabled + Tools	76.70	5 / 23
SWE-Bench Pro - Public	Thinking Enabled + Tools	58.60	13 / 54

AI Agent - Information Search

1 evaluation

Benchmark	Mode	Score	Rank / total
BrowseComp	Thinking Enabled + Tools + Internet	83.20	14 / 53

AI Agent - Tool Usage

4 evaluations

Benchmark	Mode	Score	Rank / total
OSWorld-Verified	Thinking Enabled + Tools	73.10	14 / 24
Terminal Bench 2.0	Thinking Enabled + Tools	66.70	10 / 47
TerminalBench 2.1	Thinking Enabled	53.56	27 / 27
Tool Decathlon	Thinking Enabled + Tools	50.00	2 / 9

Math and Reasoning

2 evaluations

Benchmark	Mode	Score	Rank / total
AIME 2026	Thinking Enabled	96.40	3 / 18
IMO-AnswerBench	Thinking Enabled	86.00	8 / 21

Claw-style Agent Evaluation

1 evaluation

Benchmark	Mode	Score	Rank / total
Claw Bench	Thinking Enabled + Tools	80.90	19 / 29
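The introduction names LiveCodeBench, HLE, and GPQA Diamond as the leading results. Several benchmarks have a better absolute rank, though; Tool Decathlon, for example, ranks 2 / 9. One ordering that reproduces the introduction's list is relative rank: the rank divided by the number of ranked models. The minimal Python sketch below sorts the rows above that way. It assumes this is the selection rule, which the page does not state. The names `RESULTS`, `relative_rank`, and `strongest` are illustrative only.

```python
# Rank / total pairs copied from the benchmark tables above (mode in brackets
# where a benchmark appears twice).
RESULTS = {
    "GPQA Diamond": (18, 187),
    "LiveBench": (28, 115),
    "HLE [Thinking]": (76, 172),
    "HLE [Thinking + Tools + Internet]": (15, 172),
    "LiveCodeBench": (7, 123),
    "SWE-bench Verified": (14, 112),
    "SWE-bench Multilingual": (5, 23),
    "SWE-Bench Pro - Public": (13, 54),
    "BrowseComp": (14, 53),
    "OSWorld-Verified": (14, 24),
    "Terminal Bench 2.0": (10, 47),
    "TerminalBench 2.1": (27, 27),
    "Tool Decathlon": (2, 9),
    "AIME 2026": (3, 18),
    "IMO-AnswerBench": (8, 21),
    "Claw Bench": (19, 29),
}


def relative_rank(rank: int, total: int) -> float:
    """Position as a fraction of the leaderboard size (lower is better)."""
    return rank / total


def strongest(results: dict, n: int = 3) -> list:
    """Return the n benchmark names with the smallest relative rank (assumed rule)."""
    ordered = sorted(results, key=lambda name: relative_rank(*results[name]))
    return ordered[:n]


if __name__ == "__main__":
    for name in strongest(RESULTS):
        rank, total = RESULTS[name]
        print(f"{name}: {rank} / {total} ({relative_rank(rank, total):.1%})")
```

Running it prints LiveCodeBench (5.7%), HLE with tools and internet (8.7%), and GPQA Diamond (9.6%), in the same order as the introduction. Relative rank favors strong placements on large leaderboards over a top spot on a small one such as Tool Decathlon.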

Compare with other models

Competitor Comparison

Benchmark scores for Kimi K2.6 compared against top models in its class

Models compared: Kimi K2.6, Qwen3.6-Max-Preview, MiniMax-M2.7, GLM 5.1


The chart plots each model's highest score per benchmark within the current filter. Benchmarks scored out of 100 are drawn at their raw values. Benchmarks on other scales are rescaled within that benchmark, while their labels keep the original scores.

Twelve benchmarks have comparable scores. Each cell shows the model's best score, followed by the evaluation mode in parentheses; "--" means no comparable score is listed.

Benchmark	Category	Kimi K2.6 (current)	Qwen3.6-Max-Preview	MiniMax-M2.7	GLM 5.1
GPQA Diamond	General Knowledge	90.50 (Thinking Enabled)	90.40 (Thinking Level: High)	87.00 (Thinking Enabled)	86.20 (Thinking Enabled)
HLE	General Knowledge	54.00 (Thinking Enabled + Tools)	50.20 (Thinking Enabled + Tools)	28.00 (Thinking Enabled)	52.30 (Thinking Enabled + Tools)
LiveBench	General Knowledge	72.17 (Thinking Enabled)	--	63.49 (Deep Thinking Mode)	70.18 (Standard Mode)
LiveCodeBench	Coding and Software Engineering	89.60 (Thinking Enabled)	87.10 (Thinking Level: High)	--	--
SWE-bench Multilingual	Coding and Software Engineering	76.70 (Thinking Enabled + Tools)	73.80 (Thinking Enabled + Tools)	--	--
SWE-Bench Pro - Public	Coding and Software Engineering	58.60 (Thinking Enabled + Tools)	57.30 (Deep Thinking Mode + Tools)	56.20 (Thinking Enabled + Tools)	58.40 (Thinking Enabled + Tools)
SWE-bench Verified	Coding and Software Engineering	80.20 (Thinking Enabled + Tools)	78.80 (Thinking Enabled + Tools)	--	--
BrowseComp	AI Agent - Information Search	83.20 (Thinking Enabled + Tools)	--	--	79.30 (Thinking Enabled + Tools)
Terminal Bench 2.0	AI Agent - Tool Usage	66.70 (Thinking Enabled + Tools)	65.40 (Deep Thinking Mode + Tools)	--	63.50 (Thinking Enabled + Tools)
TerminalBench 2.1	AI Agent - Tool Usage	53.56 (Thinking Enabled)	--	--	58.70 (Thinking Level: High + Tools)
Tool Decathlon	AI Agent - Tool Usage	50.00 (Thinking Enabled + Tools)	--	--	40.70 (Thinking Enabled + Tools)
AIME 2026	Math and Reasoning	96.40 (Thinking Enabled)	--	--	95.30 (Thinking Enabled)

Two additional benchmarks appear only in the chart.
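The comparison keeps only each model's best score per benchmark. This is why HLE appears as 54.00 for Kimi K2.6 rather than the 34.70 it scored without tools. The minimal Python sketch below shows that selection, using a few HLE rows copied from the tables on this page. The function name `best_per_benchmark` is hypothetical. Ties are resolved by keeping the first row seen, which is an assumption, since the page does not say how ties are handled.

```python
# Each row: (model, benchmark, mode, score). The two Kimi K2.6 HLE rows come from
# the General Knowledge table; the GLM 5.1 row comes from the comparison table.
ROWS = [
    ("Kimi K2.6", "HLE", "Thinking Enabled", 34.70),
    ("Kimi K2.6", "HLE", "Thinking Enabled + Tools + Internet", 54.00),
    ("GLM 5.1", "HLE", "Thinking Enabled + Tools", 52.30),
]


def best_per_benchmark(rows):
    """Keep each model's highest score per benchmark and the mode that produced it."""
    best = {}
    for model, benchmark, mode, score in rows:
        key = (model, benchmark)
        # Strict ">" keeps the earlier row on a tie (assumed tie-breaking rule).
        if key not in best or score > best[key][1]:
            best[key] = (mode, score)
    return best


if __name__ == "__main__":
    for (model, benchmark), (mode, score) in best_per_benchmark(ROWS).items():
        print(f"{benchmark} | {model}: {score:.2f} ({mode})")
```

Keeping the mode alongside the score is what lets each cell carry its mode label. Otherwise a tool-assisted result could be mistaken for a plain thinking-mode one.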

Standard API Pricing: Kimi K2.6 vs. Peer Models

Standard text input and output prices for each model, side by side. Where extended-context pricing exists, only the base rate is shown, and its context threshold is noted below.

Source: DataLearnerAI. Prices are standard text rates from each model's default supplier, in USD per 1M tokens.

Where a context threshold exists, the base price applies only within that limit:

Qwen3.6-Max-Preview: base price applies to <= 128

Model	Supplier	Input (USD / 1M tokens)	Output (USD / 1M tokens)	Base price applies to
Kimi K2.6	Facebook AI Research	$0.95	$4.00	--
Qwen3.6-Max-Preview	Alibaba	$1.30	$7.80	<= 128
MiniMax-M2.7	MiniMaxAI	$0.30	$1.20	--
GLM 5.1	Zhipu AI	$1.40	$4.40	--
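Because prices are quoted per 1M tokens, the cost of one request is (input tokens × input price + output tokens × output price) / 1,000,000. The minimal Python sketch below applies the standard rates from the table above. The workload of 10,000 input tokens and 2,000 output tokens is a hypothetical example. The Qwen3.6-Max-Preview rate is valid only within its listed base tier; extended-context pricing is not modeled.

```python
# Standard text rates in USD per 1M tokens: (input, output), copied from the table above.
PRICES = {
    "Kimi K2.6": (0.95, 4.0),
    "Qwen3.6-Max-Preview": (1.3, 7.8),
    "MiniMax-M2.7": (0.3, 1.2),
    "GLM 5.1": (1.4, 4.4),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at base-tier standard rates."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 10,000 input tokens and 2,000 output tokens.
    for model in PRICES:
        print(f"{model}: ${request_cost(model, 10_000, 2_000):.4f}")
```

For this workload the sketch prints $0.0175 for Kimi K2.6, $0.0286 for Qwen3.6-Max-Preview, $0.0054 for MiniMax-M2.7, and $0.0228 for GLM 5.1. Output-heavy workloads shift the comparison toward the output column, where the spread between models is wider.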

Version History

How each version of the Kimi K2.6 series stacks up on benchmark tests

Models compared: Kimi K2.6, Kimi K2.5, Kimi K2 Thinking, Kimi K2


Twelve benchmarks have comparable scores. Each cell shows the model's best score, followed by the evaluation mode in parentheses; "--" means no comparable score is listed.

Benchmark	Category	Kimi K2.6 (current)	Kimi K2.5	Kimi K2 Thinking	Kimi K2
GPQA Diamond	General Knowledge	90.50 (Thinking Enabled)	87.60 (Thinking Enabled)	84.50 (Thinking Enabled)	75.10 (Standard Mode)
HLE	General Knowledge	54.00 (Thinking Enabled + Tools)	50.20 (Thinking Enabled + Tools)	51.00 (Thinking Enabled + Tools)	4.70 (Standard Mode)
LiveBench	General Knowledge	72.17 (Thinking Enabled)	69.07 (Thinking Enabled)	61.59 (Thinking Enabled)	48.10 (Standard Mode)
LiveCodeBench	Coding and Software Engineering	89.60 (Thinking Enabled)	85.00 (Thinking Enabled)	83.10 (Thinking Enabled)	53.70 (Standard Mode)
SWE-bench Multilingual	Coding and Software Engineering	76.70 (Thinking Enabled + Tools)	73.00 (Thinking Enabled)	--	--
SWE-Bench Pro - Public	Coding and Software Engineering	58.60 (Thinking Enabled + Tools)	50.70 (Thinking Enabled + Tools)	--	--
SWE-bench Verified	Coding and Software Engineering	80.20 (Thinking Enabled + Tools)	76.80 (Thinking Enabled + Tools)	71.30 (Thinking Enabled + Tools)	51.80 (Standard Mode)
BrowseComp	AI Agent - Information Search	83.20 (Thinking Enabled + Tools)	60.60 (Thinking Enabled + Tools)	60.20 (Thinking Enabled + Tools)	--
Terminal Bench 2.0	AI Agent - Tool Usage	66.70 (Thinking Enabled + Tools)	50.80 (Thinking Enabled + Tools)	--	--
AIME 2026	Math and Reasoning	96.40 (Thinking Enabled)	92.50 (Thinking Enabled)	--	--
IMO-AnswerBench	Math and Reasoning	86.00 (Thinking Enabled)	81.80 (Thinking Enabled)	--	--
Claw Bench	Claw-style Agent Evaluation (OpenClaw)	80.90 (Thinking Enabled + Tools)	81.70 (Thinking Enabled + Tools)	82.50 (Thinking Enabled + Tools)	--

Single-Benchmark Version Trend

[Chart: GPQA Diamond (General Knowledge) score by model version. X-axis: model and release date; Y-axis: score. Series by mode: Normal, Normal + Tools, Thinking, Thinking + Tools, Deep, Deep + Tools.]

Standard API Pricing Across the Kimi K2.6 Series



Model	Supplier	Input (USD / 1M tokens)	Output (USD / 1M tokens)	Base price applies to
Kimi K2.6	Facebook AI Research	$0.95	$4.00	--
Kimi K2.5	Moonshot AI	$0.60	$3.00	--
Kimi K2 Thinking	Fireworks AI	$0.60	$2.50	--
Kimi K2	Moonshot AI	$0.60	$2.50	--

Sources

kimi.com