
Comparing MiniMax M2, Qwen3-235B-A22B-Thinking, DeepSeek V3.2-Exp (+1 more) - LLM benchmark results | DataLearnerAI

LLM benchmark comparison results

See key specs and per-benchmark scores for each model/mode. This comparison covers benchmark data and core parameters for 4 models.

MiniMax M2 · Qwen3-235B-A22B-Thinking · DeepSeek V3.2-Exp · Kimi K2 0905

Spec comparison

MiniMaxAI

MiniMax M2 (MiniMax-M2)

Release: 2025-10-27
Context length: 205K
Parameters: 2300
Supported modes: Non-Thinking Mode, Thinking Mode

Alibaba

Qwen3-235B-A22B-Thinking (Qwen3-235B-A22B-Thinking-2507)

Release: 2025-07-30
Context length: 256K
Parameters: 305
Supported modes: Thinking Mode

DeepSeek-AI

DeepSeek V3.2-Exp (DeepSeek-V3.2-Exp)

Release: 2025-09-29
Context length: 128K
Parameters: 6710
Supported modes: Non-Thinking Mode, Thinking Mode

Moonshot AI

Kimi K2 0905 (Kimi K2-Instruct-0905)

Release: 2025-09-05
Context length: 256K
Parameters: 10000
Supported modes: Non-Thinking Mode

Performance benchmarks

Compare benchmark results across thinking modes and tool usage.

Data is sourced primarily from official releases (GitHub, Hugging Face, papers), then benchmark leaderboards, then third-party evaluators.

Best Overall: Qwen3-235B-A22B-Thinking · 68.92

Best Single: Qwen3-235B-A22B-Thinking · AIME2025 92.30

Benchmark scores

Higher is usually better; “—” means no score.

Filter: All Modes · Exclude Parallel (4 models, 12 benchmarks)

Benchmark score table

Complete scores for each model/mode across selected benchmarks.

Benchmark	MiniMax M2 (MiniMaxAI)	Qwen3-235B-A22B-Thinking (Alibaba)	DeepSeek V3.2-Exp (DeepSeek-AI)	Kimi K2 0905 (Moonshot AI)
General evaluation
GPQA Diamond	78.00	81.10	79.90	—
HLE	12.50	18.20	20.30	21.70
LiveBench	64.26	63.42	71.64	—
MMLU Pro	82.00	84.40	85.00	—
Coding and software engineering
LiveCodeBench	83.00	74.10	74.10	—
SWE-bench Verified	69.40	—	67.80	69.20
Mathematical reasoning
AIME2025	78.00	92.30	89.30	75.20
AI Agent - Tool use
Terminal-Bench	24.00	—	37.70	44.50
Agent capability evaluation
τ²-Bench	77.20	—	66.70	—
τ²-Bench - Telecom	87.00	—	34.00	—
Instruction following
IF Bench	72.30	—	54.10	—
AI Agent - Information gathering
BrowseComp	44.00	—	40.10	—
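
The "Best Overall" value of 68.92 is consistent with an unweighted mean over the scores each model actually reports, and "Best Single" with the highest individual score in the table. The page does not state either formula, so the sketch below is an assumption checked only against those two figures; the names `BENCHMARKS`, `ROWS`, `mean_available`, `best_overall`, and `best_single` are illustrative.

```python
from statistics import mean

# Benchmark order matches the score table; None marks a cell with no score.
BENCHMARKS = [
    "GPQA Diamond", "HLE", "LiveBench", "MMLU Pro",
    "LiveCodeBench", "SWE-bench Verified", "AIME2025", "Terminal-Bench",
    "τ²-Bench", "τ²-Bench - Telecom", "IF Bench", "BrowseComp",
]

ROWS = {
    "MiniMax M2": [78.00, 12.50, 64.26, 82.00, 83.00, 69.40,
                   78.00, 24.00, 77.20, 87.00, 72.30, 44.00],
    "Qwen3-235B-A22B-Thinking": [81.10, 18.20, 63.42, 84.40, 74.10, None,
                                 92.30, None, None, None, None, None],
    "DeepSeek V3.2-Exp": [79.90, 20.30, 71.64, 85.00, 74.10, 67.80,
                          89.30, 37.70, 66.70, 34.00, 54.10, 40.10],
    "Kimi K2 0905": [None, 21.70, None, None, None, 69.20,
                     75.20, 44.50, None, None, None, None],
}


def mean_available(scores):
    """Unweighted mean over the scores that are present, rounded to 2 places."""
    present = [s for s in scores if s is not None]
    return round(mean(present), 2)


def best_overall(rows):
    """Model with the highest mean over its own available scores (assumed formula)."""
    name = max(rows, key=lambda model: mean_available(rows[model]))
    return name, mean_available(rows[name])


def best_single(rows, benchmarks):
    """Model and benchmark holding the highest individual score."""
    candidates = [
        (score, model, bench)
        for model, scores in rows.items()
        for bench, score in zip(benchmarks, scores)
        if score is not None
    ]
    score, model, bench = max(candidates)
    return model, bench, score


print(best_overall(ROWS))
print(best_single(ROWS, BENCHMARKS))
```

Under this assumed formula each mean covers only the benchmarks a model reports (6 for Qwen3-235B-A22B-Thinking, 12 for MiniMax M2 and DeepSeek V3.2-Exp, 4 for Kimi K2 0905), so the averages are not taken over the same benchmark set.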

Feature compare

Detailed feature breakdown

Licensing, MoE architecture, and multi-modality support.

Features & specs	MiniMax M2 (MiniMaxAI)	Qwen3-235B-A22B-Thinking (Alibaba)	DeepSeek V3.2-Exp (DeepSeek-AI)	Kimi K2 0905 (Moonshot AI)
Model snapshots
Organization	MiniMaxAI	Alibaba	DeepSeek-AI	Moonshot AI
Full model name	MiniMax-M2	Qwen3-235B-A22B-Thinking-2507	DeepSeek-V3.2-Exp	Kimi K2-Instruct-0905
Model description	Not provided	Not provided	Not provided	Not provided
Model type	Chat LLM	Reasoning LLM	Reasoning LLM	Chat LLM
Model ID	minimax-m2	Qwen3-235B-A22B-Thinking-2507	deepseek-v3-2-exp	kimi-k2-0905
Release	2025-10-27	2025-07-30	2025-09-29	2025-09-05
MoE	Yes	Yes	Yes	Yes
Specs and performance
Context length	205K	256K	128K	256K
Parameters	2300	305	6710	10000
Activated parameters	100	33	370	320
Model scale	100b	34b	100b	100b
Model size	239.99 GB	31.17 GB	1342 GB	1.01 TB
Inference speed
Reasoning level
Max output	Not provided	16384	64000	4096
Supported modes	Non-Thinking Mode, Thinking Mode	Thinking Mode	Non-Thinking Mode, Thinking Mode	Non-Thinking Mode
Open source and licensing
Code Open Source	Closed Source	Not provided	Closed Source	Closed Source
Weights Open Source	Closed Source	Not provided	Closed Source	Closed Source
Commercial use	Free for commercial use	Free for commercial use	Free for commercial use	Free for commercial use
Modality support
Text Input/Output	/	/	/	/
Image Input/Output	/	/	/	/
Audio Input/Output	/	/	/	/
Video Input/Output	/	/	/	/
Embedding Input/Output	/	/	/	/
API details
Text pricing	Input: 0.3 USD / 1M tokens; Output: 1.2 USD / 1M tokens	Input: 0.2 USD / 1M tokens; Output: 2.4 USD / 1M tokens	Input: 0.28 USD / 1M tokens; Output: 0.42 USD / 1M tokens; Cache: 0.028 USD / 1M tokens	Input: 0.60 USD / 1M tokens; Output: 2.5 USD / 1M tokens
Image API pricing	Not provided	Not provided	Not provided	Not provided
Audio API pricing	Not provided	Not provided	Not provided	Not provided
Video API pricing	Not provided	Not provided	Not provided	Not provided
Embedding API pricing	Not provided	Not provided	Not provided	Not provided
Resources
GitHub	Repo	Repo	Repo	Not provided
Hugging Face	Model Page	Model Page	Model Page	Model Page
Official Page	Not provided	Not provided	Not provided	Not provided
Guides	Not provided	Not provided	Not provided	Not provided
Papers		Qwen3: Think Deeper, Act Faster	DeepSeek-V3.2-Exp: Boosting Long-Context Efficiency with DeepSeek Sparse Attention
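DataLearnerAI	MiniMaxAI open-sources the MiniMax M2 model: Artificial Analysis evaluation shows an overall intelligence score above Claude Opus 4.1, first among open-source models and fifth globally.	Not provided	Not provided	Moonshot AI releases Kimi K2-Instruct-0905: a fully upgraded open agentic model with a 256K context length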

API pricing

API price comparison

Side-by-side input/output token pricing
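
The per-token prices in the text pricing row of the feature table translate into a request cost by simple proportion. The following is a minimal sketch, assuming the prices are in USD per 1M tokens as listed and ignoring DeepSeek's separate cache price; `PRICES` and `request_cost` are hypothetical names, and the 2M-input / 1M-output workload is an illustrative example rather than a figure from the page.

```python
# USD per 1M tokens, copied from the text pricing row of the feature table.
PRICES = {
    "MiniMax M2": {"input": 0.30, "output": 1.20},
    "Qwen3-235B-A22B-Thinking": {"input": 0.20, "output": 2.40},
    "DeepSeek V3.2-Exp": {"input": 0.28, "output": 0.42},
    "Kimi K2 0905": {"input": 0.60, "output": 2.50},
}

TOKENS_PER_PRICE_UNIT = 1_000_000


def request_cost(model, input_tokens, output_tokens):
    """Estimated cost in USD for one workload, without cache discounts."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / TOKENS_PER_PRICE_UNIT


# Hypothetical workload: 2M input tokens and 1M output tokens.
for model in sorted(PRICES, key=lambda m: request_cost(m, 2_000_000, 1_000_000)):
    print(f"{model}: {request_cost(model, 2_000_000, 1_000_000):.2f} USD")
```

Output-heavy workloads shift the ranking toward models with a low output price, so it is worth rerunning the estimate with a token mix that matches the intended use.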
