MiniMax M2.5 vs M2.1

Across 10 shared benchmarks, MiniMax M2.5 leads overall: it wins 8, M2.1 wins 1, and 1 is tied, with an average score difference of +8.21 in favor of MiniMax M2.5.

MiniMax M2.5

MiniMaxAI · 2026-02-12 · Reasoning model

M2.1

MiniMaxAI · 2025-12-23 · Chat model

MiniMax M2.5: 8 wins (80%) · Ties: 1 (10%) · M2.1: 1 win

Benchmark scores

Grouped by capability, sorted by largest gap within each. 10 shared benchmarks.

Coding and Software Engineer

MiniMax M2.5 2/2
Benchmark | MiniMax M2.5 | M2.1 | Diff
SWE-Bench Pro - Public | 55.40 · 18 / 43 | 32.60 · 42 / 43 | +22.80
SWE-bench Verified | 80.20 · 13 / 108 | 74.80 · 35 / 108 | +5.40

General Knowledge

Even 2/2
Benchmark | MiniMax M2.5 | M2.1 | Diff
GPQA Diamond | 85.20 · 48 / 178 · Thinking (No Tools) | 81 · 69 / 178 | +4.20
HLE | 19.40 · 106 / 157 · Thinking (No Tools) | 22 · 94 / 157 | -2.60

Agent Level Benchmark

MiniMax M2.5 1/1
Benchmark | MiniMax M2.5 | M2.1 | Diff
τ²-Bench - Telecom | 97.80 · 10 / 35 | 87 · 22 / 35 | +10.80

AI Agent - Information Search

MiniMax M2.5 1/1
Benchmark | MiniMax M2.5 | M2.1 | Diff
BrowseComp | 76.30 · 18 / 45 | 47.40 · 37 / 45 | +28.90

AI Agent - Tool Usage

MiniMax M2.5 1/1
Benchmark | MiniMax M2.5 | M2.1 | Diff
Terminal Bench 2.0 | 51.70 · 30 / 46 | 47.90 · 35 / 46 | +3.80

Claw-style Agent Evaluation

MiniMax M2.5 1/1
Benchmark | MiniMax M2.5 | M2.1 | Diff
Pinch Bench | 87.80 · 6 / 37 · Thinking (With Tools) | 84.30 · 18 / 37 · Thinking (With Tools) | +3.50

Instruction Following

Even 1/1
Benchmark | MiniMax M2.5 | M2.1 | Diff
IF Bench | 70 · 12 / 29 | 70 · 12 / 29 | 0.00

Math and Reasoning

MiniMax M2.5 1/1
Benchmark | MiniMax M2.5 | M2.1 | Diff
AIME2025 | 86.30 · 48 / 106 · Thinking (No Tools) | 81 · 56 / 106 | +5.30

Specs

Field | MiniMax M2.5 | M2.1
Publisher | MiniMaxAI | MiniMaxAI
Release date | 2026-02-12 | 2025-12-23
Model type | Reasoning model | Chat model
Architecture | MoE | MoE
Parameters | 229B | 230B
Context length | 128K | 200K
Max output | Not available | 128K

API pricing

Prices use DataLearner records when available; missing fields are not inferred.

Item | MiniMax M2.5 | M2.1
Text input | $0.3 / 1M tokens | Not public
Text output | $2.4 / 1M tokens | Not public

One or both models have incomplete public pricing.
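
As a rough illustration of how the listed per-token prices translate into a bill, the sketch below estimates the cost of a MiniMax M2.5 workload. Only the two prices come from this page; the function name `estimate_m25_cost_usd` and the token counts in the usage note are hypothetical, and the sketch assumes flat per-token pricing with no caching or volume discounts. M2.1 is left out because its prices are not public.

```python
# Listed MiniMax M2.5 prices from this page, in USD per 1M tokens.
M25_INPUT_USD_PER_M = 0.3
M25_OUTPUT_USD_PER_M = 2.4


def estimate_m25_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate cost in USD, assuming flat per-token pricing (an assumption)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * M25_INPUT_USD_PER_M
             + output_tokens * M25_OUTPUT_USD_PER_M)
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 2M input tokens and 500K output tokens.
    print(f"${estimate_m25_cost_usd(2_000_000, 500_000):.2f}")
```

For the hypothetical workload above, input costs about $0.60 and output about $1.20, so output tokens dominate the bill even at a quarter of the input volume.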

Summary

  • MiniMax M2.5 leads in: Coding and Software Engineer (2/2), Agent Level Benchmark (1/1), AI Agent - Information Search (1/1), AI Agent - Tool Usage (1/1), Claw-style Agent Evaluation (1/1), Math and Reasoning (1/1)
  • Tied in: General Knowledge, Instruction Following

On average across the 10 shared benchmarks, MiniMax M2.5 scores 8.21 higher.

Largest single-benchmark gap: BrowseComp — MiniMax M2.5 76.30 vs M2.1 47.40 (+28.90).
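
The headline figures can be re-derived from the ten score pairs listed on this page. The following is a minimal sketch, not the site's actual aggregation code: the `SCORES` list is transcribed from the benchmark tables, the `summarize` helper is a hypothetical name, and it assumes the average is a plain unweighted mean of per-benchmark differences (MiniMax M2.5 minus M2.1).

```python
from statistics import mean

# (benchmark, MiniMax M2.5 score, M2.1 score), transcribed from this page.
SCORES = [
    ("SWE-Bench Pro - Public", 55.40, 32.60),
    ("SWE-bench Verified", 80.20, 74.80),
    ("GPQA Diamond", 85.20, 81.00),
    ("HLE", 19.40, 22.00),
    ("τ²-Bench - Telecom", 97.80, 87.00),
    ("BrowseComp", 76.30, 47.40),
    ("Terminal Bench 2.0", 51.70, 47.90),
    ("Pinch Bench", 87.80, 84.30),
    ("IF Bench", 70.00, 70.00),
    ("AIME2025", 86.30, 81.00),
]


def summarize(scores):
    """Count wins and ties, and compute the unweighted mean difference."""
    # Round each difference to 2 decimals to suppress float noise.
    diffs = {name: round(m25 - m21, 2) for name, m25, m21 in scores}
    largest = max(diffs, key=lambda name: abs(diffs[name]))
    return {
        "m25_wins": sum(d > 0 for d in diffs.values()),
        "m21_wins": sum(d < 0 for d in diffs.values()),
        "ties": sum(d == 0 for d in diffs.values()),
        "avg_diff": round(mean(diffs.values()), 2),
        "largest_gap": (largest, diffs[largest]),
    }


if __name__ == "__main__":
    for key, value in summarize(SCORES).items():
        print(f"{key}: {value}")
```

Under that assumption the sketch reproduces the page's numbers: 8 wins for MiniMax M2.5, 1 for M2.1, 1 tie, an average difference of 8.21, and BrowseComp as the largest gap at 28.90. Because the mean is unweighted across benchmarks with different scales and difficulty, the 8.21 figure is best read as a rough indicator rather than a calibrated measure.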

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.