Kimi K2.5 vs Kimi K2 Thinking

Across 9 shared benchmarks, Kimi K2.5 leads overall: Kimi K2.5 leads on 5, Kimi K2 Thinking leads on 4, and 0 are tied. The mean score difference (Kimi K2.5 minus Kimi K2 Thinking) is +0.39.

Kimi K2.5

Moonshot AI · 2026-01-27 · Multimodal large model

Kimi K2 Thinking

Moonshot AI · 2025-11-06 · Reasoning large model

Benchmarks led: 5 (56%) for Kimi K2.5, 4 (44%) for Kimi K2 Thinking

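Benchmark scores

Grouped by capability category; within each group, rows are sorted by absolute score difference, largest first. 9 benchmarks in total.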

General Knowledge

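Kimi K2 Thinking leads 2/3

| Benchmark | Kimi K2.5 | Kimi K2 Thinking | Difference |
| --- | --- | --- | --- |
| MMLU Pro | 78.50 · rank 66 / 126 · Thinking (No Tools) | 84.60 · rank 32 / 126 | -6.10 |
| GPQA Diamond | 87.60 · rank 34 / 178 · Thinking (No Tools) | 84.50 · rank 52 / 178 | +3.10 |
| HLE | 50.20 · rank 20 / 157 · Thinking (With Tools) | 51 · rank 16 / 157 | -0.80 |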

Coding and Software Engineer

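Kimi K2.5 leads 2/2

| Benchmark | Kimi K2.5 | Kimi K2 Thinking | Difference |
| --- | --- | --- | --- |
| SWE-bench Verified | 76.80 · rank 27 / 108 · Thinking (With Tools) | 71.30 · rank 51 / 108 | +5.50 |
| LiveCodeBench | 85 · rank 16 / 120 · Thinking (No Tools) | 83.10 · rank 22 / 120 | +1.90 |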

Math and Reasoning

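Evenly split across 2 benchmarks (each model leads 1)

| Benchmark | Kimi K2.5 | Kimi K2 Thinking | Difference |
| --- | --- | --- | --- |
| FrontierMath - Tier 4 | 4.20 · rank 40 / 80 · Normal (No Tools) | 0 · rank 72 / 80 · Thinking (No Tools) | +4.20 |
| AIME2025 | 96.10 · rank 21 / 106 · Thinking (No Tools) | 100 · rank 1 / 106 | -3.90 |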

AI Agent - Information Search

Kimi K2.5 leads 1/1

| Benchmark | Kimi K2.5 | Kimi K2 Thinking | Difference |
| --- | --- | --- | --- |
| BrowseComp | 60.60 · rank 29 / 45 · Thinking (With Tools + Internet) | 60.20 · rank 30 / 45 | +0.40 |

Claw-style Agent Evaluation

Kimi K2 Thinking leads 1/1

| Benchmark | Kimi K2.5 | Kimi K2 Thinking | Difference |
| --- | --- | --- | --- |
| Claw Bench | 81.70 · rank 18 / 29 · Thinking (With Tools) | 82.50 · rank 17 / 29 · Thinking (With Tools) | -0.80 |
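
The grouping, row order, and per-category verdicts in the tables above can be reproduced from the score pairs alone. The following is a minimal sketch, not the site's actual generator: `ROWS`, `group_rows`, and `category_verdict` are hypothetical names, the sort key is assumed to be the absolute difference (which matches the row order shown), and the "evenly split" label for equal lead counts is an assumption based on the Math and Reasoning group.

```python
from collections import defaultdict

# (category, benchmark, Kimi K2.5 score, Kimi K2 Thinking score), copied from the tables above.
ROWS = [
    ("General Knowledge", "MMLU Pro", 78.50, 84.60),
    ("General Knowledge", "GPQA Diamond", 87.60, 84.50),
    ("General Knowledge", "HLE", 50.20, 51.00),
    ("Coding and Software Engineer", "SWE-bench Verified", 76.80, 71.30),
    ("Coding and Software Engineer", "LiveCodeBench", 85.00, 83.10),
    ("Math and Reasoning", "FrontierMath - Tier 4", 4.20, 0.00),
    ("Math and Reasoning", "AIME2025", 96.10, 100.00),
    ("AI Agent - Information Search", "BrowseComp", 60.60, 60.20),
    ("Claw-style Agent Evaluation", "Claw Bench", 81.70, 82.50),
]


def group_rows(rows):
    """Group by category; sort each group by absolute difference, largest first."""
    groups = defaultdict(list)
    for category, name, k25, k2t in rows:
        # Difference is Kimi K2.5 minus Kimi K2 Thinking, rounded to two decimals.
        groups[category].append((name, round(k25 - k2t, 2)))
    return {c: sorted(items, key=lambda item: -abs(item[1])) for c, items in groups.items()}


def category_verdict(items):
    """Label a category by which model leads more of its benchmarks (assumed rule)."""
    k25_leads = sum(1 for _, diff in items if diff > 0)
    k2t_leads = sum(1 for _, diff in items if diff < 0)
    if k25_leads > k2t_leads:
        return f"Kimi K2.5 leads {k25_leads}/{len(items)}"
    if k2t_leads > k25_leads:
        return f"Kimi K2 Thinking leads {k2t_leads}/{len(items)}"
    return "evenly split"


GROUPS = group_rows(ROWS)
for category, items in GROUPS.items():
    print(category, "|", category_verdict(items), "|", items)
```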

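Specification comparison

| Field | Kimi K2.5 | Kimi K2 Thinking |
| --- | --- | --- |
| Publisher | Moonshot AI | Moonshot AI |
| Release date | 2026-01-27 | 2025-11-06 |
| Model type | Multimodal large model | Reasoning large model |
| Architecture | MoE | MoE |
| Parameters | 1 trillion | 1 trillion |
| Context length | 256K | 256K |
| Max output | 16K | No data available |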

Summary

  • Kimi K2.5 leads in these categories: Coding and Software Engineer (2/2), AI Agent - Information Search (1/1)
  • Kimi K2 Thinking leads in these categories: General Knowledge (2/3), Claw-style Agent Evaluation (1/1)
  • Evenly split category: Math and Reasoning

Across the 9 shared benchmarks, Kimi K2.5 scores 0.39 points higher on average.

Benchmark with the largest single gap: MMLU Pro, where Kimi K2.5 scores 78.50 and Kimi K2 Thinking scores 84.60 (difference -6.10).
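
The headline figures in this summary follow from the nine per-benchmark differences by simple arithmetic. As a minimal sketch (the `DIFFS` dictionary and the variable names are illustrative, and each difference is taken as Kimi K2.5 minus Kimi K2 Thinking, as in the tables):

```python
# Score differences (Kimi K2.5 minus Kimi K2 Thinking), copied from the tables above.
DIFFS = {
    "MMLU Pro": -6.10,
    "GPQA Diamond": 3.10,
    "HLE": -0.80,
    "SWE-bench Verified": 5.50,
    "LiveCodeBench": 1.90,
    "FrontierMath - Tier 4": 4.20,
    "AIME2025": -3.90,
    "BrowseComp": 0.40,
    "Claw Bench": -0.80,
}

# Count how many benchmarks each model leads, and how many are tied.
k25_leads = sum(1 for d in DIFFS.values() if d > 0)
k2t_leads = sum(1 for d in DIFFS.values() if d < 0)
ties = sum(1 for d in DIFFS.values() if d == 0)

# Mean difference across all shared benchmarks.
mean_diff = sum(DIFFS.values()) / len(DIFFS)

# Benchmark with the largest gap in absolute terms.
largest_gap = max(DIFFS, key=lambda name: abs(DIFFS[name]))

print(k25_leads, k2t_leads, ties)        # 5 4 0
print(f"{mean_diff:+.2f}")               # +0.39
print(f"{k25_leads / len(DIFFS):.0%}")   # 56%
print(largest_gap, DIFFS[largest_gap])   # MMLU Pro -6.1
```

Ties are detected with an exact `== 0` comparison, which is adequate here because the differences are entered as published two-decimal values rather than recomputed from floating-point scores.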

The body of this page is generated from structured model, pricing, and benchmark data; it is not written by a live LLM.