Comparison of MiniMax-M1 with other models across benchmarks

| Category | Task | MiniMax-M1-80K | MiniMax-M1-40K | Qwen3-235B-A22B | DeepSeek-R1-0528 | DeepSeek-R1 | Seed-Thinking-v1.5 | Claude 4 Opus | Gemini 2.5 Pro (06-05) | OpenAI-o3 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Extended Thinking | | 80K | 40K | 32K | 64K | 32K | 32K | 64K | 64K | 100K |
| Mathematics | AIME 2024 | 86.0 | 83.3 | 85.7 | 91.4 | 79.8 | 86.7 | 76.0 | 92.0 | 91.6 |
| | AIME 2025 | 76.9 | 74.6 | 81.5 | 87.5 | 70.0 | 74.0 | 75.5 | 88.0 | 88.9 |
| | MATH-500 | 96.8 | 96.0 | 96.2 | 98.0 | 97.3 | 96.7 | 98.2 | 98.8 | 98.1 |
| General Coding | LiveCodeBench (24/8~25/5) | 65.0 | 62.3 | 65.9 | 73.1 | 55.9 | 67.5 | 56.6 | 77.1 | 75.8 |
| | FullStackBench | 68.3 | 67.6 | 62.9 | 69.4 | 70.1 | 69.9 | 70.3 | -- | 69.3 |
| Reasoning & Knowledge | GPQA Diamond | 70.0 | 69.2 | 71.1 | 81.0 | 71.5 | 77.3 | 79.6 | 86.4 | 83.3 |
| | HLE (no tools) | 8.4* | 7.2* | 7.6* | 17.7* | 8.6* | 8.2 | 10.7 | 21.6 | 20.3 |
| | ZebraLogic | 86.8 | 80.1 | 80.3 | 95.1 | 78.7 | 84.4 | 95.1 | 91.6 | 95.8 |
| | MMLU-Pro | 81.1 | 80.6 | 83.0 | 85.0 | 84.0 | 87.0 | 85.0 | 86.0 | 85.0 |
| Software Engineering | SWE-bench Verified | 56.0 | 55.6 | 34.4 | 57.6 | 49.2 | 47.0 | 72.5 | 67.2 | 69.1 |
| Long Context | OpenAI-MRCR (128k) | 73.4 | 76.1 | 27.7 | 51.5 | 35.8 | 54.3 | 48.9 | 76.8 | 56.5 |
| | OpenAI-MRCR (1M) | 56.2 | 58.6 | -- | -- | -- | -- | -- | 58.8 | -- |
| | LongBench-v2 | 61.5 | 61.0 | 50.1 | 52.1 | 58.3 | 52.5 | 55.6 | 65.0 | 58.8 |
| Agentic Tool Use | TAU-bench (airline) | 62.0 | 60.0 | 34.7 | 53.5 | -- | 44.0 | 59.6 | 50.0 | 52.0 |
| | TAU-bench (retail) | 63.5 | 67.8 | 58.6 | 63.9 | -- | 55.7 | 81.4 | 67.0 | 73.9 |
| Factuality | SimpleQA | 18.5 | 17.9 | 11.0 | 27.8 | 30.1 | 12.9 | -- | 54.0 | 49.4 |
| General Assistant | MultiChallenge | 44.7 | 44.7 | 40.0 | 45.0 | 40.7 | 43.0 | 45.8 | 51.8 | 56.5 |