MiniMax M2.5 vs MiniMax M2

Across 7 shared benchmarks, MiniMax M2.5 leads overall: MiniMax M2.5 wins 6, MiniMax M2 wins 1, with 0 ties and an average score difference of +10.57.

MiniMax M2.5

MiniMaxAI · 2026-02-12 · Reasoning model

MiniMax M2

MiniMaxAI · 2025-10-27 · Chat model

MiniMax M2.5: 6 wins (86%) · MiniMax M2: 1 win (14%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 7 shared benchmarks.

General Knowledge

MiniMax M2.5 2/2
Benchmark | MiniMax M2.5 | MiniMax M2 | Diff
GPQA Diamond | 85.20 · 48 / 178 · Thinking (No Tools) | 78 · 83 / 178 | +7.20
HLE | 19.40 · 106 / 157 · Thinking (No Tools) | 12.50 · 125 / 157 | +6.90

Agent Level Benchmark

MiniMax M2.5 1/1
Benchmark | MiniMax M2.5 | MiniMax M2 | Diff
τ²-Bench - Telecom | 97.80 · 10 / 35 | 87 · 22 / 35 | +10.80

AI Agent - Information Search

MiniMax M2.5 1/1
Benchmark | MiniMax M2.5 | MiniMax M2 | Diff
BrowseComp | 76.30 · 18 / 45 | 44 · 39 / 45 | +32.30

Coding and Software Engineer

MiniMax M2.5 1/1
Benchmark | MiniMax M2.5 | MiniMax M2 | Diff
SWE-bench Verified | 80.20 · 13 / 108 | 69.40 · 58 / 108 | +10.80

Instruction Following

MiniMax M2 1/1
Benchmark | MiniMax M2.5 | MiniMax M2 | Diff
IF Bench | 70 · 12 / 29 | 72.30 · 9 / 29 | -2.30

Math and Reasoning

MiniMax M2.5 1/1
Benchmark | MiniMax M2.5 | MiniMax M2 | Diff
AIME2025 | 86.30 · 48 / 106 · Thinking (No Tools) | 78 · 60 / 106 | +8.30

Specs

Field | MiniMax M2.5 | MiniMax M2
Publisher | MiniMaxAI | MiniMaxAI
Release date | 2026-02-12 | 2025-10-27
Model type | Reasoning model | Chat model
Architecture | MoE | MoE
Parameters | 229B | 230B
Context length | 128K | 205K
Max output | Not available | Not available

API pricing

Prices use DataLearner records when available; missing fields are not inferred.

Item | MiniMax M2.5 | MiniMax M2
Text input | $0.3 / 1M tokens | Not public
Text output | $2.4 / 1M tokens | Not public

One or both models have incomplete public pricing.
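
As a rough illustration of how the per-million-token prices listed for MiniMax M2.5 translate into a bill, the sketch below multiplies token counts by those rates. The function name and the workload figures (2M input tokens, 500K output tokens) are hypothetical and chosen only for the arithmetic. MiniMax M2 is left out because its prices are not public.

```python
# Listed MiniMax M2.5 prices from the pricing table, in USD per 1M tokens.
INPUT_PRICE_PER_M = 0.3
OUTPUT_PRICE_PER_M = 2.4


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for the given token counts."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Hypothetical workload: 2M input tokens and 500K output tokens.
cost = estimate_cost(2_000_000, 500_000)
print(f"Estimated cost: ${cost:.2f}")  # 0.60 for input + 1.20 for output
```

At these rates an output token costs eight times as much as an input token, so output-heavy workloads dominate the total.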

Summary

  • MiniMax M2.5 leads in: General Knowledge (2/2), Agent Level Benchmark (1/1), AI Agent - Information Search (1/1), Coding and Software Engineer (1/1), Math and Reasoning (1/1)
  • MiniMax M2 leads in: Instruction Following (1/1)

On average across the 7 shared benchmarks, MiniMax M2.5 scores 10.57 higher.

Largest single-benchmark gap: BrowseComp — MiniMax M2.5 76.30 vs MiniMax M2 44 (+32.30).
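
The summary figures can be reproduced from the benchmark scores on this page. The sketch below assumes each difference is computed as MiniMax M2.5 minus MiniMax M2 and that the average is a plain mean over the seven differences. The `summarize` function and its field names are illustrative and are not taken from the site.

```python
# Scores copied from the benchmark tables on this page: (MiniMax M2.5, MiniMax M2).
scores = {
    "GPQA Diamond": (85.20, 78.00),
    "HLE": (19.40, 12.50),
    "τ²-Bench - Telecom": (97.80, 87.00),
    "BrowseComp": (76.30, 44.00),
    "SWE-bench Verified": (80.20, 69.40),
    "IF Bench": (70.00, 72.30),
    "AIME2025": (86.30, 78.00),
}


def summarize(scores):
    """Compute win counts, win shares, mean difference, and the largest gap."""
    # Assumption: difference = first model minus second model, rounded to 2 places.
    diffs = {name: round(a - b, 2) for name, (a, b) in scores.items()}
    wins_a = sum(d > 0 for d in diffs.values())
    wins_b = sum(d < 0 for d in diffs.values())
    ties = sum(d == 0 for d in diffs.values())
    total = len(diffs)
    largest = max(diffs, key=lambda name: abs(diffs[name]))
    return {
        "wins_a": wins_a,
        "wins_b": wins_b,
        "ties": ties,
        "share_a": round(100 * wins_a / total),
        "share_b": round(100 * wins_b / total),
        "mean_diff": round(sum(diffs.values()) / total, 2),
        "largest_gap": (largest, diffs[largest]),
    }


summary = summarize(scores)
print(summary)
```

Because the average is unweighted, the single +32.30 gap on BrowseComp pulls it up noticeably. The other six differences fall between -2.30 and +10.80.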

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.