MiniMax-M2.7 vs M2.1

Across 6 shared benchmarks, MiniMax-M2.7 leads overall: it wins 5 to M2.1's 1, with 0 ties and an average score difference of +7.07 in its favor.

MiniMax-M2.7: MiniMaxAI · 2026-03-18 · Reasoning model

M2.1: MiniMaxAI · 2025-12-23 · Chat model

MiniMax-M2.7: 5 wins (83%) · M2.1: 1 win (17%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 6 shared benchmarks.

General Knowledge: MiniMax-M2.7 leads 2/2

Benchmark	MiniMax-M2.7	M2.1	Diff
GPQA Diamond	87 · 38 / 178 · Thinking (No Tools)	81 · 69 / 178	+6
HLE	28 · 82 / 157 · Thinking (No Tools)	22 · 94 / 157	+6

Agent Level Benchmark: M2.1 leads 1/1

Benchmark	MiniMax-M2.7	M2.1	Diff
τ²-Bench - Telecom	85 · 24 / 35 · Thinking (With Tools)	87 · 22 / 35	-2

Claw-style Agent Evaluation: MiniMax-M2.7 leads 1/1

Benchmark	MiniMax-M2.7	M2.1	Diff
Pinch Bench	87.10 · 9 / 37 · Thinking (With Tools)	84.30 · 18 / 37 · Thinking (With Tools)	+2.80

Coding and Software Engineer: MiniMax-M2.7 leads 1/1

Benchmark	MiniMax-M2.7	M2.1	Diff
SWE-Bench Pro - Public	56.20 · 16 / 43 · Thinking (With Tools)	32.60 · 42 / 43	+23.60

Instruction Following: MiniMax-M2.7 leads 1/1

Benchmark	MiniMax-M2.7	M2.1	Diff
IF Bench	76 · 5 / 29 · Thinking (With Tools)	70 · 12 / 29	+6

Prices use DataLearner records when available; missing fields are not inferred.

One or both models have incomplete public pricing.

MiniMax-M2.7 leads in: General Knowledge (2/2), Claw-style Agent Evaluation (1/1), Coding and Software Engineer (1/1), Instruction Following (1/1)
M2.1 leads in: Agent Level Benchmark (1/1)

On average across the 6 shared benchmarks, MiniMax-M2.7 scores 7.07 higher.

Largest single-benchmark gap: SWE-Bench Pro - Public — MiniMax-M2.7 56.20 vs M2.1 32.60 (+23.60).
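
The summary figures can be rechecked from the per-benchmark scores. The sketch below is not the site's own code: it assumes the average is a simple unweighted mean of (MiniMax-M2.7 score minus M2.1 score), and that the leading value in each table cell is the score. The names `scores` and `summarize` are hypothetical.

```python
# Scores as read from the benchmark tables on this page: (benchmark, M2.7, M2.1).
# Assumption: the leading value in each table cell is the score.
scores = [
    ("GPQA Diamond", 87.0, 81.0),
    ("HLE", 28.0, 22.0),
    ("τ²-Bench - Telecom", 85.0, 87.0),
    ("Pinch Bench", 87.10, 84.30),
    ("SWE-Bench Pro - Public", 56.20, 32.60),
    ("IF Bench", 76.0, 70.0),
]


def summarize(rows):
    """Return win counts, mean difference, and the largest gap (M2.7 minus M2.1)."""
    # Round each difference to 2 decimals to avoid floating-point noise.
    diffs = [(name, round(a - b, 2)) for name, a, b in rows]
    return {
        "m27_wins": sum(1 for _, d in diffs if d > 0),
        "m21_wins": sum(1 for _, d in diffs if d < 0),
        "ties": sum(1 for _, d in diffs if d == 0),
        # Assumption: unweighted mean over all shared benchmarks.
        "mean_diff": round(sum(d for _, d in diffs) / len(diffs), 2),
        "largest_gap": max(diffs, key=lambda item: abs(item[1])),
    }


summary = summarize(scores)
print(summary)
```

Under these assumptions the result matches the figures quoted on this page: 5 wins to 1, no ties, a mean difference of 7.07, and SWE-Bench Pro - Public as the largest gap at 23.6.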

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.