Kimi K2.5 vs Kimi K2 Thinking

Across 9 shared benchmarks, Kimi K2.5 leads overall: Kimi K2.5 wins 5, Kimi K2 Thinking wins 4, with 0 ties and an average score difference of +0.39.

Kimi K2.5

Moonshot AI · 2026-01-27 · Multimodal model

Kimi K2 Thinking

Moonshot AI · 2025-11-06 · Reasoning model

Kimi K2.5: 5 wins (56%) · Kimi K2 Thinking: 4 wins (44%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 9 shared benchmarks.

General Knowledge

Kimi K2 Thinking 2/3
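| Benchmark | Kimi K2.5 | Rank | Mode | Kimi K2 Thinking | Rank | Diff |
|---|---|---|---|---|---|---|
| MMLU Pro | 78.50 | 66 / 126 | Thinking (No Tools) | 84.60 | 32 / 126 | -6.10 |
| GPQA Diamond | 87.60 | 34 / 178 | Thinking (No Tools) | 84.50 | 52 / 178 | +3.10 |
| HLE | 50.20 | 20 / 157 | Thinking (With Tools) | 51 | 16 / 157 | -0.80 |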

Coding and Software Engineer

Kimi K2.5 2/2
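| Benchmark | Kimi K2.5 | Rank | Mode | Kimi K2 Thinking | Rank | Diff |
|---|---|---|---|---|---|---|
| SWE-bench Verified | 76.80 | 27 / 108 | Thinking (With Tools) | 71.30 | 51 / 108 | +5.50 |
| LiveCodeBench | 85 | 16 / 120 | Thinking (No Tools) | 83.10 | 22 / 120 | +1.90 |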

Math and Reasoning

Even (1 win each across 2 benchmarks)
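| Benchmark | Kimi K2.5 | Rank | Mode | Kimi K2 Thinking | Rank | Mode | Diff |
|---|---|---|---|---|---|---|---|
| FrontierMath - Tier 4 | 4.20 | 40 / 80 | Normal (No Tools) | 0 | 72 / 80 | Thinking (No Tools) | +4.20 |
| AIME 2025 | 96.10 | 21 / 106 | Thinking (No Tools) | 100 | 1 / 106 | | -3.90 |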

AI Agent - Information Search

Kimi K2.5 1/1
| Benchmark | Kimi K2.5 | Rank | Mode | Kimi K2 Thinking | Rank | Diff |
|---|---|---|---|---|---|---|
| BrowseComp | 60.60 | 29 / 45 | Thinking (With Tools + Internet) | 60.20 | 30 / 45 | +0.40 |

Claw-style Agent Evaluation

Kimi K2 Thinking 1/1
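| Benchmark | Kimi K2.5 | Rank | Mode | Kimi K2 Thinking | Rank | Mode | Diff |
|---|---|---|---|---|---|---|---|
| Claw Bench | 81.70 | 18 / 29 | Thinking (With Tools) | 82.50 | 17 / 29 | Thinking (With Tools) | -0.80 |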

Specs

| Field | Kimi K2.5 | Kimi K2 Thinking |
|---|---|---|
| Publisher | Moonshot AI | Moonshot AI |
| Release date | 2026-01-27 | 2025-11-06 |
| Model type | Multimodal model | Reasoning model |
| Architecture | MoE | MoE |
| Parameters | 1T | 1T |
| Context length | 256K | 256K |
| Max output | 16K | Not available |

Summary

  • Kimi K2.5 leads in: Coding and Software Engineer (2/2), AI Agent - Information Search (1/1)
  • Kimi K2 Thinking leads in: General Knowledge (2/3), Claw-style Agent Evaluation (1/1)
  • Tied in: Math and Reasoning
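
The per-category leaders listed above can be reproduced from the benchmark scores. The following is a minimal Python sketch, not the page's actual generator: the `ROWS` list and the `category_leaders` helper are hypothetical names introduced for illustration, and the rule that a category goes to whichever model wins more of its benchmarks is an assumption inferred from the tallies shown.

```python
from collections import defaultdict

# (category, benchmark, Kimi K2.5 score, Kimi K2 Thinking score),
# copied from the benchmark tables on this page.
ROWS = [
    ("General Knowledge", "MMLU Pro", 78.50, 84.60),
    ("General Knowledge", "GPQA Diamond", 87.60, 84.50),
    ("General Knowledge", "HLE", 50.20, 51.00),
    ("Coding and Software Engineer", "SWE-bench Verified", 76.80, 71.30),
    ("Coding and Software Engineer", "LiveCodeBench", 85.00, 83.10),
    ("Math and Reasoning", "FrontierMath - Tier 4", 4.20, 0.00),
    ("Math and Reasoning", "AIME 2025", 96.10, 100.00),
    ("AI Agent - Information Search", "BrowseComp", 60.60, 60.20),
    ("Claw-style Agent Evaluation", "Claw Bench", 81.70, 82.50),
]


def category_leaders(rows):
    """Return {category: leader}, assuming the leader is the model
    that wins more benchmarks in that category ("Tied" if equal)."""
    wins = defaultdict(lambda: [0, 0])  # [Kimi K2.5 wins, Kimi K2 Thinking wins]
    for category, _benchmark, k25, k2t in rows:
        counts = wins[category]  # touch the key so all-tie categories still appear
        if k25 > k2t:
            counts[0] += 1
        elif k2t > k25:
            counts[1] += 1

    leaders = {}
    for category, (k25_wins, k2t_wins) in wins.items():
        if k25_wins > k2t_wins:
            leaders[category] = "Kimi K2.5"
        elif k2t_wins > k25_wins:
            leaders[category] = "Kimi K2 Thinking"
        else:
            leaders[category] = "Tied"
    return leaders


LEADERS = category_leaders(ROWS)
for category, leader in LEADERS.items():
    print(f"{category}: {leader}")
```

Under that assumed rule, the sketch yields the same split as the summary: two categories for each model and a tie in Math and Reasoning.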

On average across the 9 shared benchmarks, Kimi K2.5 scores 0.39 higher.

Largest single-benchmark gap: MMLU Pro, where Kimi K2.5 scores 78.50 against Kimi K2 Thinking's 84.60 (-6.10, in Kimi K2 Thinking's favor).
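
The headline figures (win counts, average difference, largest gap) follow directly from the nine score pairs. The sketch below recomputes them in Python as a sanity check. It assumes that "Diff" means the Kimi K2.5 score minus the Kimi K2 Thinking score and that the average is an unweighted mean over the nine benchmarks; the variable names are illustrative, not taken from the page's data model.

```python
# (benchmark, Kimi K2.5 score, Kimi K2 Thinking score), copied from the tables on this page.
SCORES = [
    ("MMLU Pro", 78.50, 84.60),
    ("GPQA Diamond", 87.60, 84.50),
    ("HLE", 50.20, 51.00),
    ("SWE-bench Verified", 76.80, 71.30),
    ("LiveCodeBench", 85.00, 83.10),
    ("FrontierMath - Tier 4", 4.20, 0.00),
    ("AIME 2025", 96.10, 100.00),
    ("BrowseComp", 60.60, 60.20),
    ("Claw Bench", 81.70, 82.50),
]

# Assumed definition: Diff = Kimi K2.5 score - Kimi K2 Thinking score,
# rounded to two decimals to avoid floating-point noise.
diffs = {name: round(k25 - k2t, 2) for name, k25, k2t in SCORES}

k25_wins = sum(d > 0 for d in diffs.values())
k2t_wins = sum(d < 0 for d in diffs.values())
ties = sum(d == 0 for d in diffs.values())

# Unweighted mean of the per-benchmark differences.
avg_diff = round(sum(diffs.values()) / len(diffs), 2)

# Benchmark with the largest absolute difference.
largest_gap = max(diffs, key=lambda name: abs(diffs[name]))

print(f"Kimi K2.5 wins: {k25_wins}, Kimi K2 Thinking wins: {k2t_wins}, ties: {ties}")
print(f"Average difference: {avg_diff:+.2f}")
print(f"Largest gap: {largest_gap} ({diffs[largest_gap]:+.2f})")
```

With these inputs the sketch gives 5 wins to 4 with no ties, an average difference of +0.39, and MMLU Pro as the largest gap at -6.10, matching the figures reported on this page.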

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.