Kimi K2.5 vs Kimi K2

Across 9 shared benchmarks, Kimi K2.5 leads overall: it wins 8 to Kimi K2's 1, with no ties, and an average score difference of +25.61 in its favor.

Kimi K2.5

Moonshot AI · 2026-01-27 · Multimodal model

Kimi K2

Moonshot AI · 2025-07-11 · Chat model

Kimi K2.5: 8 wins (89%) · Kimi K2: 1 win (11%)
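The win percentages follow directly from the win counts over the 9 shared benchmarks. A minimal sketch of that arithmetic, assuming shares are rounded to the nearest whole percent (the helper name `win_shares` is hypothetical, not something the page defines):

```python
def win_shares(wins_a: int, wins_b: int, ties: int = 0) -> tuple[int, int]:
    """Return each side's share of all compared benchmarks, in whole percent."""
    total = wins_a + wins_b + ties
    if total == 0:
        raise ValueError("no shared benchmarks to compare")
    return round(100 * wins_a / total), round(100 * wins_b / total)


# 8 wins for Kimi K2.5, 1 win for Kimi K2, 0 ties
k25_share, k2_share = win_shares(8, 1)
print(k25_share, k2_share)
```

With 8 wins against 1 this reproduces the 89% and 11% shown on the bar.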

Benchmark scores

Grouped by capability, sorted by largest gap within each. 9 shared benchmarks.

General Knowledge

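Kimi K2.5 wins 3/4

| Benchmark | Kimi K2.5 | Rank (K2.5) | Mode (K2.5) | Kimi K2 | Rank (K2) | Diff |
|---|---|---|---|---|---|---|
| ARC-AGI | 65.30 | 31 / 65 | Thinking (No Tools) | 13.30 | 57 / 65 | +52 |
| HLE | 50.20 | 20 / 157 | Thinking (With Tools) | 4.70 | 154 / 157 | +45.50 |
| GPQA Diamond | 87.60 | 34 / 178 | Thinking (No Tools) | 75.10 | 93 / 178 | +12.50 |
| MMLU Pro | 78.50 | 66 / 126 | Thinking (No Tools) | 81.10 | 53 / 126 | -2.60 |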

Math and Reasoning

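Kimi K2.5 wins 3/3

| Benchmark | Kimi K2.5 | Rank (K2.5) | Mode (K2.5) | Kimi K2 | Rank (K2) | Diff |
|---|---|---|---|---|---|---|
| AIME 2025 | 96.10 | 21 / 106 | Thinking (No Tools) | 54 | 85 / 106 | +42.10 |
| Simple Bench | 46.80 | 13 / 27 | Thinking (No Tools) | 26.30 | 24 / 27 | +20.50 |
| FrontierMath - Tier 4 | 4.20 | 40 / 80 | Normal (No Tools) | 0.01 | 71 / 80 | +4.19 |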

Coding and Software Engineer

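Kimi K2.5 wins 2/2

| Benchmark | Kimi K2.5 | Rank (K2.5) | Mode (K2.5) | Kimi K2 | Rank (K2) | Diff |
|---|---|---|---|---|---|---|
| LiveCodeBench | 85 | 16 / 120 | Thinking (No Tools) | 53.70 | 86 / 120 | +31.30 |
| SWE-bench Verified | 76.80 | 27 / 108 | Thinking (With Tools) | 51.80 | 88 / 108 | +25 |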

Specs

| Field | Kimi K2.5 | Kimi K2 |
|---|---|---|
| Publisher | Moonshot AI | Moonshot AI |
| Release date | 2026-01-27 | 2025-07-11 |
| Model type | Multimodal model | Chat model |
| Architecture | MoE | MoE |
| Parameters | 1T | 1T |
| Context length | 256K | 131K |
| Max output | 16K | 131K |

Summary

  • Kimi K2.5 leads in: General Knowledge (3/4), Math and Reasoning (3/3), Coding and Software Engineer (2/2)

On average across the 9 shared benchmarks, Kimi K2.5 scores 25.61 higher.

Largest single-benchmark gap: ARC-AGI — Kimi K2.5 65.30 vs Kimi K2 13.30 (+52).
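The summary figures can be recomputed from the nine score pairs listed in the benchmark tables. The sketch below assumes the average is a plain unweighted mean of the per-benchmark differences (Kimi K2.5 minus Kimi K2), which matches the reported value. The variable names are illustrative only.

```python
# (Kimi K2.5 score, Kimi K2 score) for each shared benchmark, copied from the page
scores = {
    "ARC-AGI": (65.30, 13.30),
    "HLE": (50.20, 4.70),
    "GPQA Diamond": (87.60, 75.10),
    "MMLU Pro": (78.50, 81.10),
    "AIME 2025": (96.10, 54.0),
    "Simple Bench": (46.80, 26.30),
    "FrontierMath - Tier 4": (4.20, 0.01),
    "LiveCodeBench": (85.0, 53.70),
    "SWE-bench Verified": (76.80, 51.80),
}

# Positive difference means Kimi K2.5 scored higher
diffs = {name: k25 - k2 for name, (k25, k2) in scores.items()}

mean_diff = sum(diffs.values()) / len(diffs)       # unweighted mean (assumption)
wins_k25 = sum(1 for d in diffs.values() if d > 0)
wins_k2 = sum(1 for d in diffs.values() if d < 0)
largest = max(diffs, key=lambda name: abs(diffs[name]))

print(f"mean diff: {mean_diff:+.2f}")
print(f"wins: Kimi K2.5 {wins_k25}, Kimi K2 {wins_k2}")
print(f"largest gap: {largest} ({diffs[largest]:+.2f})")
```

Run as written, this gives a mean difference of +25.61, an 8 to 1 win split, and ARC-AGI as the largest gap, in line with the figures above.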

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.