ARC
Updated Jul 22, 2025
- Problem Count: 7787
- Institution: Allen Institute for AI
- Category: Commonsense reasoning
- Metrics: Accuracy
- Language: English
- Difficulty: Advanced
Overview
A benchmark of 7787 multiple-choice questions for evaluating a model's commonsense reasoning ability.
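Since the benchmark is scored by accuracy over multiple-choice questions, the metric can be sketched as the share of questions where the predicted option matches the answer key. The snippet below is a minimal illustration, not the official scoring script: the function name `accuracy`, the toy answer lists, and the choice to report the result as a percentage are all assumptions made for this example.

```python
def accuracy(predictions, answers):
    """Return the percentage of questions answered correctly.

    Hypothetical helper for illustration; not part of any official ARC tooling.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("at least one question is required")
    # Count positions where the predicted option label equals the answer key.
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


# Toy data (made up): four questions, three answered correctly.
answers = ["A", "C", "B", "D"]
predictions = ["A", "C", "D", "D"]
score = accuracy(predictions, answers)
print(f"{score:.2f}")  # prints 75.00
```

Under this reading, a score such as 68.20 would mean that roughly 68 of every 100 questions were answered correctly.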
Related resources
Latest ARC model rankings and full benchmark leaderboard
Browse the latest scores, model modes, release dates, and parameter sizes for ARC.
Source: DataLearnerAI
Data sourced primarily from official releases (GitHub, Hugging Face, papers), then benchmark leaderboards, then third-party evaluators.
ARC Rank
| Rank | Model | Score | Release date | Parameters | License |
|---|---|---|---|---|---|
| 1 | Gemma 2 - 9B (Standard Mode) | 68.20 | 2024-06-27 | 9B | Free Commercial |
| 2 | Qwen2.5-7B (Standard Mode) | 63.70 | 2024-09-18 | 7B | Free Commercial |
| 3 | Mistral-7B-Instruct-v0.3 (Standard Mode) | 60.00 | 2024-05-22 | 7B | Free Commercial |
| 4 | Llama3.1-8B (Standard Mode) | 59.30 | 2024-07-23 | 8B | Free Commercial |



