Compare large language models on benchmarks such as MMLU Pro, HLE, and SWE-Bench; select a benchmark to view its ranking.
For detailed descriptions of each benchmark, see: LLM Benchmark List and Introduction.
| # | Model | MMLU Pro | GPQA Diamond | SWE-Bench Verified | MATH-500 | AIME 2024 | LiveCodeBench |
|---|-------|----------|--------------|--------------------|----------|-----------|---------------|
| 3 | GPT-4.5 | 86.10 | 71.40 | 38.00 | 90.70 | 36.70 | 46.40 |
| 4 | DeepSeek-V3.1 | 85.00 | 80.10 | 66.00 | – | 93.10 | 74.80 |
| 5 | DeepSeek-V3.1 Terminus | 85.00 | 80.70 | 68.40 | – | – | 80.00 |
| 6 | GLM-4.7 | 84.30 | 85.70 | 73.80 | – | – | 84.90 |
| 7 | Qwen3 Max (Preview) | 84.00 | 76.00 | 69.60 | – | – | 57.50 |
| 8 | Qwen3-235B-A22B-2507 | 83.00 | 77.50 | – | – | – | 51.80 |
| 9 | GLM-4.6 | 83.00 | 82.90 | 68.00 | – | – | 84.50 |
| 10 | Pangu Pro MoE | 82.60 | 73.70 | – | 96.80 | 79.20 | 59.60 |
| 11 | MiniMax M2 | 82.00 | 78.00 | 69.40 | – | – | 83.00 |
| 12 | DeepSeek-V3-0324 | 81.20 | 68.40 | 38.80 | 94.00 | 59.40 | 49.20 |
| 13 | Kimi K2 | 81.10 | 75.10 | 51.80 | 97.40 | 69.60 | 53.70 |
| 14 | GPT-4.1 | 80.50 | 66.30 | 54.60 | 92.80 | 48.10 | 40.50 |
| 15 | GPT-4o (2025-03-27) | 79.80 | 66.90 | – | – | – | 35.80 |
| 16 | Gemini 2.0 Pro Experimental | 79.10 | 64.70 | – | – | 36.00 | – |
| 17 | Pangu Embedded | 79.00 | – | – | 92.40 | 81.90 | 67.10 |
| 18 | ERNIE-4.5-300B-A47B | 78.40 | – | – | 96.40 | 54.80 | 38.80 |
| 19 | Qwen3-30B-A3B-2507 | 78.40 | 70.40 | 22.00 | – | – | 43.20 |
| 20 | Claude 3.5 Sonnet New | 78.00 | 65.00 | 49.00 | 78.00 | 16.00 | 38.70 |
| 21 | GPT-4o (2024-11-20) | 77.90 | – | – | – | – | – |
| 22 | Qwen2.5-Max | 76.10 | – | – | – | – | – |
| 23 | DeepSeek-V3 | 75.90 | 59.10 | – | 87.80 | 39.00 | 34.60 |
| 24 | Grok 2 | 75.50 | 56.00 | – | – | – | – |
| 25 | GLM-4-9B-Chat | 72.40 | – | – | – | 76.40 | 51.80 |
| 26 | Gemini 2.0 Flash-Lite | 71.60 | 51.50 | – | – | – | 28.90 |
| 27 | Mistral-Small-3.2 | 69.06 | 46.13 | – | – | – | – |
| 28 | Llama3.3-70B-Instruct | 68.90 | 50.50 | – | – | – | 33.30 |
| 29 | Gemma 3 - 27B (IT) | 67.50 | 42.40 | – | – | 25.30 | 29.70 |
| 30 | Qwen3-Next | 66.05 | – | – | – | – | 56.60 |
| 31 | Mixtral-8x22B-Instruct-v0.1 | 56.33 | – | – | – | – | – |
| 32 | Llama3-70B-Instruct | 56.20 | – | – | – | – | – |
| 33 | Phi-4-mini-instruct (3.8B) | 52.80 | 36.00 | – | 71.80 | 10.00 | – |
| 34 | Llama3-70B | 52.78 | – | – | – | – | – |
| 35 | Grok-1.5 | 51.00 | 35.90 | – | – | – | – |
| 36 | Llama3.1-8B-Instruct | 44.00 | 26.30 | – | – | – | – |
| 37 | Moonlight-16B-A3B-Instruct | 42.40 | – | – | – | – | – |
| 38 | Mistral-7B-Instruct-v0.3 | 30.90 | 24.70 | – | – | – | – |
| 39 | Gemini 2.5 Deep Think | – | – | – | – | – | 87.60 |
| 40 | Gemini 2.5 Flash-Preview-09-2025 | – | – | 54.00 | – | – | – |
| 41 | Kimi K2 0905 | – | – | 69.20 | – | – | – |
| 42 | Step 3.5 Flash | – | – | 74.40 | – | – | 86.40 |
| 43 | GPT-4.1 nano | – | 50.30 | – | – | 29.40 | – |
| 44 | Hunyuan-7B | – | 60.10 | – | 93.70 | 81.10 | 57.00 |
| 45 | Qwen3-4B-2507 | – | 62.00 | – | – | – | 35.10 |
| 46 | GPT-4.1 mini | – | 65.00 | 23.60 | – | 49.60 | – |
| 47 | Qwen3-4B-Thinking-2507 | – | 65.80 | – | – | – | 55.20 |
| 48 | Claude Sonnet 3.7 | – | 68.00 | 70.30 | 82.20 | 23.30 | – |
| 49 | Grok 3 | – | 80.40 | – | – | 84.20 | 70.60 |
| 50 | Grok 4 Fast | – | 85.70 | – | – | – | 80.00 |
| 51 | Grok 4 Heavy | – | 88.90 | 73.50 | – | – | – |
| 52 | Gemini 3.0 Flash | – | 90.40 | 68.70 | – | – | – |
| 53 | GPT-5.2 | – | 92.40 | 80.00 | – | – | – |
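The ranking behaviour described above (pick a benchmark, re-sort the table, and treat missing entries as absent rather than as zero scores) can be sketched in a few lines of Python. This is a minimal illustration, not the site's actual code: the scores are copied from a handful of table rows, the benchmark labels are an assumed column mapping, and the `rank` helper is hypothetical.

```python
# Minimal sketch of per-benchmark ranking with missing data.
# Scores copied from a few rows of the table above; models with no
# reported score on a benchmark simply omit that key (shown as "–").
scores = {
    "GPT-4.5": {"MMLU Pro": 86.10, "SWE-Bench": 38.00},
    "DeepSeek-V3.1": {"MMLU Pro": 85.00, "SWE-Bench": 66.00},
    "Kimi K2": {"MMLU Pro": 81.10, "SWE-Bench": 51.80},
    "Gemini 2.5 Deep Think": {},  # no score on either of these benchmarks
}

def rank(benchmark: str) -> list[tuple[str, float]]:
    """Return (model, score) pairs sorted descending; models without
    a reported score on this benchmark are excluded, not ranked as 0."""
    rows = [(m, s[benchmark]) for m, s in scores.items() if benchmark in s]
    return sorted(rows, key=lambda r: r[1], reverse=True)

print(rank("SWE-Bench"))
```

Excluding missing entries matters: ranking them as 0.00 would push a model with no published result below every model that was actually evaluated, which is not the same claim.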