Quickly view LLM performance across benchmarks like MMLU Pro, HLE, SWE-Bench, and more. Compare models across general knowledge, coding, and reasoning capabilities. Customize your comparison by selecting specific models and benchmarks.
Detailed benchmark descriptions are available at: LLM Benchmark List & Guide
Data source: DataLearnerAI
Benchmark column labels are not preserved in the source table below; a score of 0.00 indicates no reported result for that benchmark.

| Rank | Model | Score 1 | Score 2 | Score 3 | Score 4 | Score 5 | Score 6 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 3 | GLM-4-9B-Chat | 72.40 | 0.00 | 0.00 | 0.00 | 76.40 | 51.80 |
| 4 | Qwen2.5-7B | 45.00 | 36.40 | 0.00 | 0.00 | 0.00 | 0.00 |
| 5 | Gemma 2 - 9B | 44.70 | 32.80 | 0.00 | 0.00 | 0.00 | 0.00 |
| 6 | Llama3.1-8B-Instruct | 44.00 | 26.30 | 0.00 | 0.00 | 0.00 | 0.00 |
| 7 | Llama3.1-8B | 35.40 | 25.80 | 0.00 | 0.00 | 0.00 | 0.00 |
| 8 | Mistral-7B-Instruct-v0.3 | 30.90 | 24.70 | 0.00 | 0.00 | 0.00 | 0.00 |
| 9 | Qwen3-4B-Thinking-2507 | 0.00 | 65.80 | 0.00 | 0.00 | 0.00 | 55.20 |
| 10 | Qwen3-4B-2507 | 0.00 | 62.00 | 0.00 | 0.00 | 0.00 | 35.10 |
| 11 | Hunyuan-7B | 0.00 | 60.10 | 0.00 | 93.70 | 81.10 | 57.00 |
| 12 | DeepSeek-R1-Distill-Qwen-7B | 0.00 | 49.50 | 0.00 | 91.40 | 53.30 | 0.00 |
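Comparing models on scores like those above takes some care, because a 0.00 entry means the model was simply not evaluated on that benchmark, not that it scored zero. A minimal sketch of how such rows might be loaded and compared (the parsing format, function names, and the treatment of 0.00 as "missing" are all illustrative assumptions, not part of DataLearnerAI's tooling):

```python
# Illustrative sketch: parse leaderboard rows and find the top model
# per benchmark column, treating 0.00 as "not reported" rather than zero.

# A few rows from the table above, as rank,model,score1..score6
ROWS = """\
3,GLM-4-9B-Chat,72.40,0.00,0.00,0.00,76.40,51.80
4,Qwen2.5-7B,45.00,36.40,0.00,0.00,0.00,0.00
11,Hunyuan-7B,0.00,60.10,0.00,93.70,81.10,57.00
"""

def parse(rows: str) -> dict:
    """Map model name -> list of scores, with 0.00 mapped to None (missing)."""
    table = {}
    for line in rows.strip().splitlines():
        parts = line.split(",")
        name = parts[1]
        table[name] = [float(s) if float(s) > 0 else None for s in parts[2:]]
    return table

def best_on(table: dict, col: int) -> str:
    """Return the highest-scoring model on benchmark column `col`,
    skipping models with no reported score there."""
    scored = {m: s[col] for m, s in table.items() if s[col] is not None}
    return max(scored, key=scored.get)

table = parse(ROWS)
print(best_on(table, 0))  # GLM-4-9B-Chat: only it and Qwen2.5-7B report column 0
print(best_on(table, 3))  # Hunyuan-7B: the only model reporting column 3 here
```

Filtering out the missing entries before taking the maximum is the key step; comparing raw values would let a real low score lose to an unevaluated 0.00, or vice versa.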