MiniMax M2.5 vs M2.1

Across 10 shared benchmarks, MiniMax M2.5 leads overall: it wins 8, M2.1 wins 1, and 1 is a tie, for an average score difference of +8.21 points in M2.5's favor.

MiniMax M2.5: MiniMaxAI · 2026-02-12 · Reasoning model

M2.1: MiniMaxAI · 2025-12-23 · Chat model

MiniMax M2.5: 8 wins (80%) · Ties: 1 (10%) · M2.1: 1 win (10%)

Benchmark scores

Grouped by capability and sorted by largest gap within each group (10 shared benchmarks).

MiniMax M2.5 leads (2/2)

Benchmark	MiniMax M2.5	M2.1	Diff
SWE-Bench Pro - Public	55.40 (rank 18/43)	32.60 (rank 42/43)	+22.80
SWE-bench Verified	80.20 (rank 13/108)	74.80 (rank 35/108)	+5.40

Even (1 win each, 2 benchmarks)

Benchmark	MiniMax M2.5	M2.1	Diff
GPQA Diamond	85.20 (rank 48/178; Thinking, no tools)	81 (rank 69/178)	+4.20
HLE	19.40 (rank 106/157; Thinking, no tools)	22 (rank 94/157)	-2.60

MiniMax M2.5 leads (1/1)

Benchmark	MiniMax M2.5	M2.1	Diff
τ²-Bench - Telecom	97.80 (rank 10/35)	87 (rank 22/35)	+10.80

MiniMax M2.5 leads (1/1)

Benchmark	MiniMax M2.5	M2.1	Diff
BrowseComp	76.30 (rank 18/45)	47.40 (rank 37/45)	+28.90

MiniMax M2.5 leads (1/1)

Benchmark	MiniMax M2.5	M2.1	Diff
Terminal Bench 2.0	51.70 (rank 30/46)	47.90 (rank 35/46)	+3.80

MiniMax M2.5 leads (1/1)

Benchmark	MiniMax M2.5	M2.1	Diff
Pinch Bench	87.80 (rank 6/37; Thinking, with tools)	84.30 (rank 18/37; Thinking, with tools)	+3.50

Even (tie)

Benchmark	MiniMax M2.5	M2.1	Diff
IF Bench	70 (rank 12/29)	70 (rank 12/29)	0.00

MiniMax M2.5 leads (1/1)

Benchmark	MiniMax M2.5	M2.1	Diff
AIME2025	86.30 (rank 48/106; Thinking, no tools)	81 (rank 56/106)	+5.30

Pricing

Prices use DataLearner records where available; missing fields are not inferred.

Item	MiniMax M2.5	M2.1
Text input	$0.3 / 1M tokens	Not public
Text output	$2.4 / 1M tokens	Not public

One or both models have incomplete public pricing.
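Because only MiniMax M2.5 has public per-token prices, a workload cost can be estimated for M2.5 alone. The sketch below assumes flat list pricing (no caching discounts, volume tiers, or non-text inputs) and uses a hypothetical workload of 2M input and 500k output tokens; `estimate_cost` is an illustrative helper, not part of any MiniMax API.

```python
# MiniMax M2.5 text list prices from the pricing table, in USD per 1M tokens.
# Assumption: flat pricing with no caching discounts or volume tiers.
INPUT_PRICE_PER_M = 0.3
OUTPUT_PRICE_PER_M = 2.4


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: estimated text-only cost in USD at per-1M-token list prices."""
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# Hypothetical workload: 2M input tokens and 500k output tokens.
print(f"${estimate_cost(2_000_000, 500_000):.2f}")  # $1.80
```

Output tokens cost 8x as much as input tokens here, so output-heavy workloads dominate the bill. No equivalent figure can be computed for M2.1 until its pricing is public.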

MiniMax M2.5 leads in: Coding and Software Engineering (2/2), Agent Level Benchmark (1/1), AI Agent - Information Search (1/1), AI Agent - Tool Usage (1/1), Claw-style Agent Evaluation (1/1), Math and Reasoning (1/1)
Tied in: General Knowledge, Instruction Following

On average across the 10 shared benchmarks, MiniMax M2.5 scores 8.21 points higher than M2.1.

Largest single-benchmark gap: BrowseComp, where MiniMax M2.5 scores 76.30 vs 47.40 for M2.1 (+28.90).
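The headline figures (8 wins, 1 loss, 1 tie, +8.21 average, largest gap on BrowseComp) can be reproduced from the per-benchmark scores. This is a minimal sketch assuming the summary is a simple unweighted mean of raw score differences, with every benchmark counted equally regardless of its scale; the page does not state its exact method, and `summarize` is a hypothetical helper.

```python
# Scores transcribed from the benchmark tables: (MiniMax M2.5, M2.1).
SCORES = {
    "SWE-Bench Pro - Public": (55.40, 32.60),
    "SWE-bench Verified": (80.20, 74.80),
    "GPQA Diamond": (85.20, 81.00),
    "HLE": (19.40, 22.00),
    "τ²-Bench - Telecom": (97.80, 87.00),
    "BrowseComp": (76.30, 47.40),
    "Terminal Bench 2.0": (51.70, 47.90),
    "Pinch Bench": (87.80, 84.30),
    "IF Bench": (70.00, 70.00),
    "AIME2025": (86.30, 81.00),
}


def summarize(scores):
    """Return (M2.5 wins, M2.1 wins, ties, mean diff, largest-gap benchmark, its diff)."""
    # Round each difference to 2 decimals to suppress floating-point noise.
    diffs = {name: round(a - b, 2) for name, (a, b) in scores.items()}
    wins_a = sum(1 for d in diffs.values() if d > 0)
    wins_b = sum(1 for d in diffs.values() if d < 0)
    ties = len(diffs) - wins_a - wins_b
    # Assumption: unweighted mean across benchmarks.
    mean_diff = round(sum(diffs.values()) / len(diffs), 2)
    # Largest gap by absolute value, whichever model it favors.
    top = max(diffs, key=lambda name: abs(diffs[name]))
    return wins_a, wins_b, ties, mean_diff, top, diffs[top]


print(summarize(SCORES))  # (8, 1, 1, 8.21, 'BrowseComp', 28.9)
```

Because the mean is unweighted, a single large gap such as BrowseComp (+28.90) moves it substantially; the win/tie count is a useful complement that is not sensitive to gap size.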

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.