Quickly view LLM performance across benchmarks such as MMLU Pro, HLE, and SWE-bench. Compare models on general knowledge, coding, and reasoning, and customize the comparison by selecting specific models and benchmarks.
Detailed benchmark descriptions available at: LLM Benchmark List & Guide
Data source: DataLearnerAI. A score of 0.00 indicates that no result has been reported for that model on that benchmark.
| # | Model | MMLU Pro | GPQA Diamond | SWE-bench Verified | MATH-500 | AIME 2024 | LiveCodeBench |
|---|-------|----------|--------------|--------------------|----------|-----------|---------------|
| 3 | GPT-4.5 | 86.10 | 71.40 | 38.00 | 90.70 | 36.70 | 46.40 |
| 4 | Qwen3-Max-Thinking | 85.70 | 87.40 | 75.30 | 0.00 | 0.00 | 85.90 |
| 5 | DeepSeek-R1-0528 | 85.00 | 81.00 | 57.60 | 98.00 | 91.40 | 73.30 |
| 6 | DeepSeek-V3.1 | 85.00 | 80.10 | 66.00 | 0.00 | 93.10 | 74.80 |
| 7 | DeepSeek-V3.1 Terminus | 85.00 | 80.70 | 68.40 | 0.00 | 0.00 | 80.00 |
| 8 | DeepSeek V3.2-Exp | 85.00 | 79.90 | 67.80 | 0.00 | 0.00 | 74.10 |
| 9 | Claude Opus 4 | 85.00 | 79.60 | 72.50 | 98.20 | 76.00 | 56.60 |
| 10 | GLM-4.5 | 84.60 | 79.10 | 64.20 | 98.20 | 91.00 | 72.90 |
| 11 | Kimi K2 Thinking | 84.60 | 84.50 | 71.30 | 0.00 | 0.00 | 83.10 |
| 12 | Qwen3-235B-A22B-Thinking-2507 | 84.40 | 81.10 | 0.00 | 0.00 | 0.00 | 74.10 |
| 13 | GLM-4.7 | 84.30 | 85.70 | 73.80 | 0.00 | 0.00 | 84.90 |
| 14 | DeepSeek-R1 | 84.00 | 71.50 | 49.20 | 97.30 | 79.80 | 65.90 |
| 15 | Intern-S1 | 83.50 | 77.30 | 0.00 | 0.00 | 0.00 | 0.00 |
| 16 | Qwen3-235B-A22B-2507 | 83.00 | 77.50 | 0.00 | 0.00 | 0.00 | 51.80 |
| 17 | GLM-4.6 | 83.00 | 82.90 | 68.00 | 0.00 | 0.00 | 84.50 |
| 18 | Llama 4 Behemoth Instruct | 82.20 | 73.70 | 0.00 | 95.00 | 0.00 | 49.40 |
| 19 | MiniMax M2 | 82.00 | 78.00 | 69.40 | 0.00 | 0.00 | 83.00 |
| 20 | GLM-4.5-Air | 81.40 | 75.00 | 57.60 | 98.10 | 89.40 | 70.70 |
| 21 | DeepSeek-V3-0324 | 81.20 | 68.40 | 38.80 | 94.00 | 59.40 | 49.20 |
| 22 | Kimi K2 | 81.10 | 75.10 | 51.80 | 97.40 | 69.60 | 53.70 |
| 23 | MiniMax-M1-80k | 81.10 | 70.00 | 56.00 | 96.80 | 86.00 | 65.00 |
| 24 | OpenAI o4-mini | 80.60 | 81.40 | 68.10 | 0.00 | 98.70 | 0.00 |
| 25 | MiniMax-M1-40k | 80.60 | 69.20 | 55.60 | 96.00 | 83.30 | 62.30 |
| 26 | Llama 4 Maverick Instruct | 80.50 | 69.80 | 0.00 | 0.00 | 0.00 | 43.40 |
| 27 | GPT-4.1 | 80.50 | 66.30 | 54.60 | 92.80 | 48.10 | 40.50 |
| 28 | OpenAI o1-mini | 80.30 | 60.00 | 0.00 | 90.00 | 63.60 | 52.00 |
| 29 | Gemini 2.0 Pro Experimental | 79.10 | 64.70 | 0.00 | 0.00 | 36.00 | 0.00 |
| 30 | Hunyuan-TurboS | 79.00 | 57.50 | 0.00 | 0.00 | 0.00 | 32.00 |
| 31 | Kimi K2.5 | 78.50 | 87.60 | 76.80 | 0.00 | 0.00 | 85.00 |
| 32 | ERNIE-4.5-300B-A47B | 78.40 | 0.00 | 0.00 | 96.40 | 54.80 | 38.80 |
| 33 | GPT-4o (2024-11-20) | 77.90 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 34 | Claude 3.5 Sonnet | 77.64 | 59.40 | 0.00 | 0.00 | 0.00 | 0.00 |
| 35 | Gemini 2.0 Flash Experimental | 76.24 | 65.20 | 21.40 | 0.00 | 0.00 | 29.10 |
| 36 | Qwen2.5-Max | 76.10 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 37 | DeepSeek-V3 | 75.90 | 59.10 | 0.00 | 87.80 | 39.00 | 34.60 |
| 38 | Grok 2 | 75.50 | 56.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 39 | Llama 4 Scout Instruct | 74.30 | 57.20 | 0.00 | 0.00 | 0.00 | 32.80 |
| 40 | Llama 3.1-405B Instruct | 73.40 | 49.00 | 0.00 | 0.00 | 0.00 | 30.20 |
| 41 | Qwen3-235B-A22B | 72.90 | 71.10 | 34.40 | 98.00 | 85.70 | 70.70 |
| 42 | Gemini 2.0 Flash-Lite | 71.60 | 51.50 | 0.00 | 0.00 | 0.00 | 28.90 |
| 43 | Llama 4 Maverick | 62.90 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 44 | Llama 3.1-405B | 61.60 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 45 | Llama 4 Scout | 58.20 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 46 | Mixtral-8x22B-Instruct-v0.1 | 56.33 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 47 | Grok-1.5 | 51.00 | 35.90 | 0.00 | 0.00 | 0.00 | 0.00 |
| 48 | Grok 3 mini | 0.00 | 65.00 | 0.00 | 0.00 | 40.00 | 0.00 |
| 49 | Codestral 25.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 37.90 |
| 50 | GPT-4.1 mini | 0.00 | 65.00 | 23.60 | 0.00 | 49.60 | 0.00 |
| 51 | GPT-4.1 nano | 0.00 | 50.30 | 0.00 | 0.00 | 29.40 | 0.00 |
| 52 | Grok 3.5 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 53 | Step 3.5 Flash | 0.00 | 0.00 | 74.40 | 0.00 | 0.00 | 86.40 |
| 54 | Kimi K2 0905 | 0.00 | 0.00 | 69.20 | 0.00 | 0.00 | 0.00 |
| 55 | Qwen3-Coder-480B-A35B | 0.00 | 0.00 | 67.00 | 0.00 | 0.00 | 0.00 |
| 56 | Kimi k1.5 (Long-CoT) | 0.00 | 0.00 | 0.00 | 96.20 | 0.00 | 0.00 |
| 57 | Kimi k1.5 (Short-CoT) | 0.00 | 0.00 | 0.00 | 94.60 | 0.00 | 0.00 |
| 58 | Gemini 2.5 Pro Deep Think | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 80.40 |
| 59 | Kimi-k1.6-IOI-high | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 73.80 |
| 60 | OpenAI o3-mini (medium) | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 67.40 |
| 61 | Kimi-k1.6-IOI | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 65.90 |
| 62 | QwQ-Max-Preview | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 65.60 |
| 63 | Gemini 2.5 Flash-Lite | 0.00 | 66.70 | 27.60 | 0.00 | 0.00 | 34.30 |
| 64 | Claude Sonnet 3.7 | 0.00 | 68.00 | 70.30 | 82.20 | 23.30 | 0.00 |
| 65 | Magistral-Medium-2506 | 0.00 | 70.83 | 0.00 | 0.00 | 73.59 | 59.36 |
| 66 | Step3 | 0.00 | 73.00 | 0.00 | 0.00 | 0.00 | 67.10 |
| 67 | ERNIE-4.5-VL-424B-A47B-Base | 0.00 | 76.80 | 0.00 | 0.00 | 0.00 | 38.80 |
| 68 | OpenAI o3-mini (high) | 0.00 | 79.70 | 49.30 | 97.90 | 87.00 | 69.50 |
| 69 | Grok 3 | 0.00 | 80.40 | 0.00 | 0.00 | 84.20 | 70.60 |
| 70 | DeepSeek V3.2 | 0.00 | 82.40 | 73.10 | 0.00 | 0.00 | 83.30 |
| 71 | Gemini 2.5 Flash | 0.00 | 82.80 | 50.00 | 0.00 | 88.00 | 55.40 |
| 72 | Gemini-2.5-Pro-Preview-05-06 | 0.00 | 83.00 | 63.20 | 98.80 | 92.00 | 77.10 |
| 73 | o3-pro | 0.00 | 84.00 | 75.00 | 0.00 | 93.00 | 0.00 |
| 74 | Grok-3 mini - Reasoning | 0.00 | 84.00 | 0.00 | 0.00 | 96.00 | 0.00 |
| 75 | Grok-3 - Reasoning Beta | 0.00 | 84.60 | 0.00 | 0.00 | 93.30 | 79.40 |
| 76 | Claude Sonnet 3.7-64K Extended Thinking | 0.00 | 84.80 | 0.00 | 96.20 | 80.00 | 0.00 |
| 77 | Amazon Nova Pro | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
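For programmatic comparison, the table reads naturally as a small data frame in which 0.00 marks a missing result. Below is a minimal sketch, not a DataLearnerAI API: the rows are transcribed from the table above, the column names are the benchmark identities from the table header, and the model/benchmark selection is an arbitrary example.

```python
import numpy as np
import pandas as pd

# A few rows transcribed from the leaderboard above; 0.00 means "no reported score".
cols = ["Model", "MMLU Pro", "GPQA Diamond", "SWE-bench Verified",
        "MATH-500", "AIME 2024", "LiveCodeBench"]
rows = [
    ("GPT-4.5",     86.10, 71.40, 38.00, 90.70, 36.70, 46.40),
    ("DeepSeek-R1", 84.00, 71.50, 49.20, 97.30, 79.80, 65.90),
    ("GLM-4.5",     84.60, 79.10, 64.20, 98.20, 91.00, 72.90),
    ("Grok 3",       0.00, 80.40,  0.00,  0.00, 84.20, 70.60),
]
df = pd.DataFrame(rows, columns=cols).set_index("Model")

# Treat 0.00 as missing data rather than a genuine zero score.
df = df.replace(0.00, np.nan)

# "Customize your comparison": pick models and benchmarks, then rank.
subset = df.loc[["DeepSeek-R1", "GLM-4.5", "Grok 3"],
                ["GPQA Diamond", "AIME 2024"]]
print(subset.sort_values("GPQA Diamond", ascending=False))
```

Masking zeros before ranking matters: otherwise a model with no reported score sorts below one that genuinely scored poorly, which misrepresents the leaderboard.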