Kimi K2.5vsKimi K2

Across 9 shared benchmarks, Kimi K2.5 leads overall: Kimi K2.5 wins 8, Kimi K2 wins 1, with 0 ties and an average score difference of +25.61.

Moonshot AI · 2026-01-27 · Multimodal model

Moonshot AI · 2025-07-11 · Chat model

Kimi K2.58 wins(89%)(11%)1 winKimi K2

Benchmark scores

Grouped by capability, sorted by largest gap within each. 9 shared benchmarks.

Kimi K2.5 3/4

Benchmark	Kimi K2.5	Kimi K2	Diff
ARC-AGI	65.3031 / 65Thinking (No Tools)	13.3057 / 65	+52
HLE	50.2020 / 157Thinking (With Tools)	4.70154 / 157	+45.50
GPQA Diamond	87.6034 / 178Thinking (No Tools)	75.1093 / 178	+12.50
MMLU Pro	78.5066 / 126Thinking (No Tools)	81.1053 / 126	-2.60

Kimi K2.5 3/3

Benchmark	Kimi K2.5	Kimi K2	Diff
AIME2025	96.1021 / 106Thinking (No Tools)	5485 / 106	+42.10
Simple Bench	46.8013 / 27Thinking (No Tools)	26.3024 / 27	+20.50
FrontierMath - Tier 4	4.2040 / 80Normal (No Tools)	0.0171 / 80	+4.19

Kimi K2.5 2/2

Benchmark	Kimi K2.5	Kimi K2	Diff
LiveCodeBench	8516 / 120Thinking (No Tools)	53.7086 / 120	+31.30
SWE-bench Verified	76.8027 / 108Thinking (With Tools)	51.8088 / 108	+25

Kimi K2.5leads in:General Knowledge (3/4), Math and Reasoning (3/3), Coding and Software Engineer (2/2)

On average across the 9 shared benchmarks, Kimi K2.5 scores 25.61 higher.

Largest single-benchmark gap: ARC-AGI — Kimi K2.5 65.30 vs Kimi K2 13.30 (+52).

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.