MiniMax M2.5 vs Kimi K2.5

Across 13 shared benchmarks, MiniMax M2.5 leads on win count: MiniMax M2.5 wins 7, Kimi K2.5 wins 6, with 0 ties. The average score difference (MiniMax M2.5 minus Kimi K2.5) is -0.99, so Kimi K2.5 scores slightly higher on average.

MiniMax M2.5

MiniMaxAI · 2026-02-12 · Reasoning model

Kimi K2.5

Moonshot AI · 2026-01-27 · Multimodal model

MiniMax M2.5: 7 wins (54%) · Kimi K2.5: 6 wins (46%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 13 shared benchmarks. Diff is the MiniMax M2.5 score minus the Kimi K2.5 score.
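The grouping and ordering described above can be sketched in a few lines. This is a minimal illustration, not the page's actual generator: it assumes "largest gap" means the largest absolute score difference, and the names `ROWS` and `group_and_sort` are hypothetical. Only two capability groups are included to keep it short.

```python
from collections import defaultdict

# (capability, benchmark, MiniMax M2.5 score, Kimi K2.5 score), in arbitrary order.
ROWS = [
    ("General Knowledge", "ARC-AGI", 63.70, 65.30),
    ("Coding and Software Engineer", "SWE-bench Verified", 80.20, 76.80),
    ("General Knowledge", "HLE", 19.40, 50.20),
    ("General Knowledge", "GPQA Diamond", 85.20, 87.60),
    ("Coding and Software Engineer", "SWE-Bench Pro - Public", 55.40, 50.70),
    ("General Knowledge", "ARC-AGI-2", 4.90, 11.80),
]


def group_and_sort(rows):
    """Group rows by capability, then sort each group by absolute gap, largest first."""
    groups = defaultdict(list)
    for capability, benchmark, minimax, kimi in rows:
        # Diff is MiniMax M2.5 minus Kimi K2.5, rounded to two decimals.
        groups[capability].append((benchmark, round(minimax - kimi, 2)))
    return {
        capability: sorted(items, key=lambda item: abs(item[1]), reverse=True)
        for capability, items in groups.items()
    }


GROUPED = group_and_sort(ROWS)
for capability, items in GROUPED.items():
    print(capability)
    for benchmark, diff in items:
        print(f"  {benchmark}: {diff:+.2f}")
```

Under that assumption, the resulting order within each group matches the order of the rows in the tables on this page.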

General Knowledge

Kimi K2.5 4/4
| Benchmark | MiniMax M2.5 | Kimi K2.5 | Diff |
|---|---|---|---|
| HLE | 19.40 · rank 106 / 157 · Thinking (No Tools) | 50.20 · rank 20 / 157 · Thinking (With Tools) | -30.80 |
| ARC-AGI-2 | 4.90 · rank 44 / 59 · Thinking (No Tools) | 11.80 · rank 36 / 59 · Thinking (No Tools) | -6.90 |
| GPQA Diamond | 85.20 · rank 48 / 178 · Thinking (No Tools) | 87.60 · rank 34 / 178 · Thinking (No Tools) | -2.40 |
| ARC-AGI | 63.70 · rank 32 / 65 · Thinking (No Tools) | 65.30 · rank 31 / 65 · Thinking (No Tools) | -1.60 |

Claw-style Agent Evaluation

MiniMax M2.5 2/2
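| Benchmark | MiniMax M2.5 | Kimi K2.5 | Diff |
|---|---|---|---|
| Claw Bench | 92.10 · rank 4 / 29 · Thinking (With Tools) | 81.70 · rank 18 / 29 · Thinking (With Tools) | +10.40 |
| Pinch Bench | 87.80 · rank 6 / 37 · Thinking (With Tools) | 84.80 · rank 17 / 37 · Thinking (With Tools) | +3 |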

Coding and Software Engineer

MiniMax M2.5 2/2
| Benchmark | MiniMax M2.5 | Kimi K2.5 | Diff |
|---|---|---|---|
| SWE-Bench Pro - Public | 55.40 · rank 18 / 43 | 50.70 · rank 31 / 43 · Thinking (With Tools) | +4.70 |
| SWE-bench Verified | 80.20 · rank 13 / 108 | 76.80 · rank 27 / 108 · Thinking (With Tools) | +3.40 |

AI Agent - Information Search

MiniMax M2.5 1/1
| Benchmark | MiniMax M2.5 | Kimi K2.5 | Diff |
|---|---|---|---|
| BrowseComp | 76.30 · rank 18 / 45 | 60.60 · rank 29 / 45 · Thinking (With Tools + Internet) | +15.70 |

AI Agent - Tool Usage

MiniMax M2.5 1/1
| Benchmark | MiniMax M2.5 | Kimi K2.5 | Diff |
|---|---|---|---|
| Terminal Bench 2.0 | 51.70 · rank 30 / 46 | 50.80 · rank 33 / 46 · Thinking (With Tools) | +0.90 |

Long Context

MiniMax M2.5 1/1
| Benchmark | MiniMax M2.5 | Kimi K2.5 | Diff |
|---|---|---|---|
| AA-LCR | 69.50 · rank 3 / 13 · Thinking (No Tools) | 65 · rank 10 / 13 · Thinking (No Tools) | +4.50 |

Math and Reasoning

Kimi K2.5 1/1
| Benchmark | MiniMax M2.5 | Kimi K2.5 | Diff |
|---|---|---|---|
| AIME2025 | 86.30 · rank 48 / 106 · Thinking (No Tools) | 96.10 · rank 21 / 106 · Thinking (No Tools) | -9.80 |

Productivity Knowledge

Kimi K2.5 1/1
| Benchmark | MiniMax M2.5 | Kimi K2.5 | Diff |
|---|---|---|---|
| GDPval-AA | 36 · rank 17 / 21 · Thinking (No Tools) | 40 · rank 15 / 21 · Thinking (No Tools) | -4 |

Specs

| Field | MiniMax M2.5 | Kimi K2.5 |
|---|---|---|
| Publisher | MiniMaxAI | Moonshot AI |
| Release date | 2026-02-12 | 2026-01-27 |
| Model type | Reasoning model | Multimodal model |
| Architecture | MoE | MoE |
| Parameters | 229B | 1T |
| Context length | 128K | 256K |
| Max output | Not available | 16K |

API pricing

Prices use DataLearner records when available; missing fields are not inferred.

| Item | MiniMax M2.5 | Kimi K2.5 |
|---|---|---|
| Text input | $0.3 / 1M tokens | Not public |
| Text output | $2.4 / 1M tokens | Not public |

Kimi K2.5 has no public pricing in these records, so the two models cannot be compared on cost.
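Because the listed MiniMax M2.5 prices are per 1M tokens, the cost of a workload is a simple weighted sum. The sketch below is illustrative only: the function name `estimate_cost` and the token counts in the example are hypothetical, and it assumes the listed prices apply uniformly with no tiering, caching discounts or minimum charges. No equivalent can be written for Kimi K2.5, since its prices are not public here.

```python
from decimal import Decimal

# MiniMax M2.5 prices from the table above, in USD per 1M tokens.
INPUT_PRICE_PER_M = Decimal("0.3")
OUTPUT_PRICE_PER_M = Decimal("2.4")
TOKENS_PER_UNIT = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated USD cost for the given token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        Decimal(input_tokens) * INPUT_PRICE_PER_M
        + Decimal(output_tokens) * OUTPUT_PRICE_PER_M
    )
    return total / TOKENS_PER_UNIT


# Hypothetical workload: 2M input tokens and 500K output tokens.
print(estimate_cost(2_000_000, 500_000))
```

`Decimal` is used instead of `float` so that small prices do not pick up binary rounding noise when summed.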

Summary

  • MiniMax M2.5 leads in: Claw-style Agent Evaluation (2/2), Coding and Software Engineer (2/2), AI Agent - Information Search (1/1), AI Agent - Tool Usage (1/1), Long Context (1/1)
  • Kimi K2.5 leads in: General Knowledge (4/4), Math and Reasoning (1/1), Productivity Knowledge (1/1)

On average across the 13 shared benchmarks, Kimi K2.5 scores 0.99 higher.

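Largest single-benchmark gap: HLE, where MiniMax M2.5 scores 19.40 and Kimi K2.5 scores 50.20 (-30.80). The two HLE scores were recorded in different modes: MiniMax M2.5 without tools, Kimi K2.5 with tools.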
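The win counts, the average difference and the largest gap can all be reproduced from the per-benchmark scores. The following is a minimal sketch rather than the page's actual generator: it assumes Diff is the MiniMax M2.5 score minus the Kimi K2.5 score and that the average is an unweighted mean over all 13 benchmarks. The names `SCORES` and `summarize` are hypothetical.

```python
# Scores copied from the benchmark tables: (MiniMax M2.5, Kimi K2.5).
SCORES = {
    "HLE": (19.40, 50.20),
    "ARC-AGI-2": (4.90, 11.80),
    "GPQA Diamond": (85.20, 87.60),
    "ARC-AGI": (63.70, 65.30),
    "Claw Bench": (92.10, 81.70),
    "Pinch Bench": (87.80, 84.80),
    "SWE-Bench Pro - Public": (55.40, 50.70),
    "SWE-bench Verified": (80.20, 76.80),
    "BrowseComp": (76.30, 60.60),
    "Terminal Bench 2.0": (51.70, 50.80),
    "AA-LCR": (69.50, 65.0),
    "AIME2025": (86.30, 96.10),
    "GDPval-AA": (36.0, 40.0),
}


def summarize(scores):
    """Compute wins, ties, mean difference and the largest absolute gap."""
    # Round each difference to two decimals to remove float noise.
    diffs = {name: round(a - b, 2) for name, (a, b) in scores.items()}
    return {
        "diffs": diffs,
        "minimax_wins": sum(d > 0 for d in diffs.values()),
        "kimi_wins": sum(d < 0 for d in diffs.values()),
        "ties": sum(d == 0 for d in diffs.values()),
        "avg_diff": round(sum(diffs.values()) / len(diffs), 2),
        "largest_gap": max(diffs, key=lambda name: abs(diffs[name])),
    }


SUMMARY = summarize(SCORES)
print(SUMMARY["minimax_wins"], SUMMARY["kimi_wins"], SUMMARY["ties"])
print(SUMMARY["avg_diff"], SUMMARY["largest_gap"])
```

The sketch also shows why the two headline figures point in opposite directions: MiniMax M2.5 wins more benchmarks, but Kimi K2.5's wins include the single largest margin (HLE), which pulls the mean difference below zero.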

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.