See key specs and per-benchmark scores for each model/mode. Scroll horizontally for all columns. This page currently compares benchmark results and core specifications for 4 models.
Compare benchmark results across thinking modes and tool usage.
Data sourced primarily from official releases (GitHub, Hugging Face, papers), then benchmark leaderboards, then third-party evaluators. Learn about our data methodology
Performance benchmarks
Best Overall
Qwen3-235B-A22B-Thinking · 68.92
Best Single
Qwen3-235B-A22B-Thinking · AIME2025 92.30
Thinking Mode (Default)
MiniMax M2 · 1 Modality support
Higher is usually better; “—” means no score.
Complete scores for each model/mode across selected benchmarks.
| Benchmark | MiniMax M2 (MiniMaxAI) | Qwen3-235B-A22B-Thinking (Alibaba) | DeepSeek V3.2-Exp (DeepSeek-AI) | Kimi K2 0905 (Moonshot AI) |
|---|---|---|---|---|
| **General evaluation** | | | | |
| GPQA Diamond | 78.00 | 81.10 | 79.90 | — |
| HLE | 12.50 | 18.20 | 20.30 | 21.70 |
| LiveBench | 64.26 | 63.42 | 71.64 | — |
| MMLU Pro | 82.00 | 84.40 | 85.00 | — |
| **Coding and software engineering** | | | | |
| LiveCodeBench | 83.00 | 74.10 | 74.10 | — |
| SWE-bench Verified | 69.40 | — | 67.80 | 69.20 |
| **Mathematical reasoning** | | | | |
| AIME2025 | 78.00 | 92.30 | 89.30 | 75.20 |
| **AI Agent - Tool use** | | | | |
| Terminal-Bench | 24.00 | — | 37.70 | 44.50 |
| **Agent capability evaluation** | | | | |
| τ²-Bench | 77.20 | — | 66.70 | — |
| τ²-Bench - Telecom | 87.00 | — | 34.00 | — |
| **Instruction following** | | | | |
| IF Bench | 72.30 | — | 54.10 | — |
| **AI Agent - Information gathering** | | | | |
| BrowseComp | 44.00 | — | 40.10 | — |
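The page does not say how the "Best Overall" and "Best Single" figures are derived. The sketch below assumes "Best Overall" is a plain mean over each model's available scores (missing "—" entries skipped) and "Best Single" is the highest individual score in the table. Under that assumption the numbers reproduce the values shown above, but the function and variable names are illustrative only.

```python
from statistics import mean

# Benchmarks in table order; scores copied from the table, None marks "—".
BENCHMARKS = [
    "GPQA Diamond", "HLE", "LiveBench", "MMLU Pro", "LiveCodeBench",
    "SWE-bench Verified", "AIME2025", "Terminal-Bench", "τ²-Bench",
    "τ²-Bench - Telecom", "IF Bench", "BrowseComp",
]
SCORES = {
    "MiniMax M2": [78.00, 12.50, 64.26, 82.00, 83.00, 69.40,
                   78.00, 24.00, 77.20, 87.00, 72.30, 44.00],
    "Qwen3-235B-A22B-Thinking": [81.10, 18.20, 63.42, 84.40, 74.10, None,
                                 92.30, None, None, None, None, None],
    "DeepSeek V3.2-Exp": [79.90, 20.30, 71.64, 85.00, 74.10, 67.80,
                          89.30, 37.70, 66.70, 34.00, 54.10, 40.10],
    "Kimi K2 0905": [None, 21.70, None, None, None, 69.20,
                     75.20, 44.50, None, None, None, None],
}


def overall_mean(scores):
    """Mean over available scores only (assumed aggregation, not documented)."""
    available = [s for s in scores if s is not None]
    return round(mean(available), 2)


# Model with the highest mean over its available scores.
best_overall = max(SCORES, key=lambda model: overall_mean(SCORES[model]))

# Highest individual (score, model, benchmark) triple in the table.
best_single = max(
    (score, model, bench)
    for model, scores in SCORES.items()
    for bench, score in zip(BENCHMARKS, scores)
    if score is not None
)

print(best_overall, overall_mean(SCORES[best_overall]))
print(best_single)
```

Note that a mean over available scores is sensitive to coverage: a model reporting only its stronger benchmarks can rank above one that reports all twelve, so the overall figure is best read alongside the per-benchmark rows.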
Feature compare
Licensing, MoE architecture, and multi-modality support.
| Features & specs | MiniMax M2 (MiniMaxAI) | Qwen3-235B-A22B-Thinking (Alibaba) | DeepSeek V3.2-Exp (DeepSeek-AI) | Kimi K2 0905 (Moonshot AI) |
|---|---|---|---|---|
| **Model snapshots** | | | | |
| Organization | MiniMaxAI | Alibaba | DeepSeek-AI | Moonshot AI |
| Full model name | MiniMax-M2 | Qwen3-235B-A22B-Thinking-2507 | DeepSeek-V3.2-Exp | Kimi K2-Instruct-0905 |
| Model description | Not provided | Not provided | Not provided | Not provided |
| Model type | Chat LLM | Reasoning LLM | Reasoning LLM | Chat LLM |
| Model identifier | minimax-m2 | Qwen3-235B-A22B-Thinking-2507 | deepseek-v3-2-exp | kimi-k2-0905 |
| Release | 2025-10-27 | 2025-07-30 | 2025-09-29 | 2025-09-05 |
| MoE | Yes | Yes | Yes | Yes |
| **Specs and performance** | | | | |
| Context length | 205K | 256K | 128K | 256K |
| Parameters (×100M) | 2300 | 305 | 6710 | 10000 |
| Activated parameters (×100M) | 100 | 33 | 370 | 320 |
| Model scale | 100b | 34b | 100b | 100b |
| Model size | 239.99 GB | 31.17 GB | 1342 GB | 1.01 TB |
| Inference speed | | | | |
| Reasoning level | | | | |
| Max output (tokens) | Not provided | 16384 | 64000 | 4096 |
| Supported modes | Non-Thinking Mode, Thinking Mode | Thinking Mode | Non-Thinking Mode, Thinking Mode | Non-Thinking Mode |
| **Open source and licensing** | | | | |
| Code Open Source | Closed Source | Not provided | Closed Source | Closed Source |
| Weights Open Source | Closed Source | Not provided | Closed Source | Closed Source |
| Commercial use | Free for commercial use | Free for commercial use | Free for commercial use | Free for commercial use |
| **Modality support** | | | | |
| Text Input/Output | / | / | / | / |
| Image Input/Output | / | / | / | / |
| Audio Input/Output | / | / | / | / |
| Video Input/Output | / | / | / | / |
| Embedding Input/Output | / | / | / | / |
| **API details** | | | | |
| Text pricing | Input: $0.3 / 1M tokens; Output: $1.2 / 1M tokens | Input: $0.2 / 1M tokens; Output: $2.4 / 1M tokens | Input: $0.28 / 1M tokens; Output: $0.42 / 1M tokens; Cache: $0.028 / 1M tokens | Input: $0.60 / 1M tokens; Output: $2.5 / 1M tokens |
| Image API pricing | Not provided | Not provided | Not provided | Not provided |
| Audio API pricing | Not provided | Not provided | Not provided | Not provided |
| Video API pricing | Not provided | Not provided | Not provided | Not provided |
| Embedding API pricing | Not provided | Not provided | Not provided | Not provided |
| **Resources** | | | | |
| GitHub | Repo | Repo | Repo | Not provided |
| Hugging Face | Model Page | Model Page | Model Page | Model Page |
| Official Page | Not provided | Not provided | Not provided | Not provided |
| Guides | Not provided | Not provided | Not provided | Not provided |
| Papers | | Qwen3: Think Deeper, Act Faster | DeepSeek-V3.2-Exp: Boosting Long-Context Efficiency with DeepSeek Sparse Attention | |
| DataLearnerAI | MiniMaxAI open-sources the MiniMax M2 model: Artificial Analysis evaluation shows an overall intelligence score above Claude Opus 4.1, first among open-source models and fifth globally. | Not provided | Not provided | Moonshot AI releases Kimi K2-Instruct-0905: a fully upgraded open agentic model with a 256K context length |
API pricing
Side-by-side input/output token pricing
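To make the per-million-token prices easier to compare, the following minimal sketch estimates the cost of a single request for each model. The prices are the text input/output figures listed in the feature table. The `request_cost` helper and the example workload (2,000 input tokens, 500 output tokens) are hypothetical, and cache pricing is ignored.

```python
# (input, output) prices in USD per 1M tokens, taken from the text pricing row.
PRICES = {
    "MiniMax M2": (0.30, 1.20),
    "Qwen3-235B-A22B-Thinking": (0.20, 2.40),
    "DeepSeek V3.2-Exp": (0.28, 0.42),
    "Kimi K2 0905": (0.60, 2.50),
}


def request_cost(model, input_tokens, output_tokens):
    """Estimated USD cost of one request, ignoring cached-token discounts."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload: 2,000 prompt tokens and 500 generated tokens.
for model in sorted(PRICES, key=lambda m: request_cost(m, 2_000, 500)):
    print(f"{model}: ${request_cost(model, 2_000, 500):.6f}")
```

Because thinking-mode models typically generate many more output tokens per answer, the output price tends to dominate real costs, so the ranking can change with the input/output ratio of the workload.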