Artificial Analysis Intelligence Index
The Artificial Analysis Intelligence Index aggregates multiple rigorous benchmarks to compare AI model intelligence across coding, reasoning, science, tool use, and agentic tasks.
Top Model: Kimi K2.6
Top Score: 54
Model Count: 212
Data version: 2026-05-10
Data source: Artificial Analysis
Ranking Table
| Rank | Model | Intelligence Index | Organization |
|---|---|---|---|
| 6 | Kimi K2.6 | 54 | Moonshot AI |
| 14 | DeepSeek-V4-Pro (max) | 52 | DeepSeek-AI |
| 18 | DeepSeek-V4-Pro (high) | 50 | DeepSeek-AI |
| 20 | | 50 | MiniMaxAI |
| 25 | DeepSeek-V4-Flash (max) | 47 | DeepSeek-AI |
| 30 | DeepSeek-V4-Flash (high) | 45 | DeepSeek-AI |
| 36 | Kimi K2.6 | 43 | Moonshot AI |
| 39 | Hy3-preview | 42 | Tencent |
| 46 | DeepSeek-V4-Pro | 39 | DeepSeek-AI |
| 51 | Step 3.5 Flash | 38 | StepFunAI |
| 55 | Kimi K2.5 | 37 | Moonshot AI |
| 58 | DeepSeek-V4-Flash | 36 | DeepSeek-AI |
| 67 | Hy3-preview | 34 | Tencent |
| 69 | Doubao Seed Code | 34 | ByteDance Seed |
| 97 | Qwen3.5 4B | 27 | Alibaba |
| 98 | DeepSeek-R1-0528 | 27 | DeepSeek-AI |
| 118 | Qwen3.5 4B | 23 | Alibaba |
| 142 | Qwen3.5 2B | 16 | Alibaba |
| 145 | DeepSeek-R1-Distill-Llama-70B | 16 | DeepSeek-AI |
| 149 | Step3 VL 10B | 15 | StepFun |
| 160 | Qwen3.5 2B | 15 | Alibaba |
| 164 | Kimi Linear 48B A3B Instruct | 14 | Kimi |
| 183 | Qwen3.5 0.8B | 11 | Alibaba |
| 189 | Qwen3.5 0.8B | 10 | Alibaba |
Data is for reference only. Official sources are authoritative.
Benchmark Components (Intelligence Index v4.0)
The Intelligence Index aggregates 10 rigorous benchmarks into a holistic measure of AI capabilities, so that a model cannot score well through narrow specialization alone.
| Benchmark | Focus |
|---|---|
| GDPval-AA | Agentic real-world tasks |
| τ²-Bench | Agentic tool use |
| Terminal-Bench Hard | Agentic coding |
| SciCode | Coding proficiency |
| AA-LCR | Long context reasoning |
| AA-Omniscience | Knowledge & hallucination |
| IFBench | Instruction following |
| Humanity's Last Exam | Reasoning & knowledge |
| GPQA Diamond | Scientific reasoning |
| CritPt | Physics reasoning |
FAQ
What is the Artificial Analysis Intelligence Index?
The Artificial Analysis Intelligence Index v4.0 is a composite benchmark that aggregates performance across 10 challenging evaluations, spanning mathematics, science, coding, agentic tasks, and reasoning, to measure AI capabilities holistically. It is designed to avoid rewarding narrow specialization and to provide a single score for tracking progress.
How is the Intelligence Index calculated?
The index aggregates scores from 10 benchmarks: GDPval-AA (agentic real-world tasks), τ²-Bench (tool use), Terminal-Bench Hard (agentic coding), SciCode (coding), AA-LCR (long context reasoning), AA-Omniscience (knowledge & hallucination), IFBench (instruction following), Humanity's Last Exam (reasoning), GPQA Diamond (scientific reasoning), and CritPt (physics). All tests are independently run by Artificial Analysis on standardized hardware.
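As a rough illustration of what "aggregating" ten benchmark scores can look like, here is a minimal Python sketch. The page does not state how the benchmarks are weighted, so the unweighted mean below is an assumption, as are the function name `composite_index` and the placeholder scores; see the official methodology for the actual formula.

```python
from statistics import mean

# Benchmark names are taken from the list above.
BENCHMARKS = [
    "GDPval-AA", "τ²-Bench", "Terminal-Bench Hard", "SciCode", "AA-LCR",
    "AA-Omniscience", "IFBench", "Humanity's Last Exam", "GPQA Diamond", "CritPt",
]


def composite_index(scores: dict[str, float]) -> float:
    """Return the unweighted mean of per-benchmark scores (0-100 scale).

    Equal weighting is an assumption made for this sketch, not the
    documented Artificial Analysis method.
    """
    missing = [name for name in BENCHMARKS if name not in scores]
    if missing:
        raise ValueError(f"missing benchmark scores: {missing}")
    return mean(scores[name] for name in BENCHMARKS)


# Placeholder scores for a hypothetical model, not real results.
example_scores = {name: 50.0 for name in BENCHMARKS}
example_scores["SciCode"] = 40.0
example_scores["CritPt"] = 10.0

print(composite_index(example_scores))  # 45.0
```

Requiring all ten scores, rather than averaging whatever is available, keeps models comparable: a model that skipped a hard benchmark would otherwise look stronger than it is.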
How does this differ from LMArena?
LMArena rankings are based on crowdsourced user votes (Elo ratings from blind A/B tests), reflecting subjective human preferences. The Artificial Analysis Intelligence Index uses standardized automated benchmarks with objective scoring, measuring technical capabilities across specific domains. Both perspectives are valuable: LMArena captures real-world user experience, while the AA Intelligence Index provides reproducible technical measurements.
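To make the contrast concrete, the following is a minimal sketch of a classical Elo update after one blind A/B vote. The K-factor of 32 and the 400-point scale are the textbook defaults and are assumptions here; LMArena's actual rating procedure and parameters may differ, and the function names are hypothetical.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A is preferred over model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, a_won: bool,
               k: float = 32.0) -> tuple[float, float]:
    """Return the new (rating_a, rating_b) after a single pairwise vote."""
    actual_a = 1.0 if a_won else 0.0
    delta = k * (actual_a - expected_score(rating_a, rating_b))
    # Elo is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


# Two equally rated models; a user prefers model A's answer.
print(elo_update(1000.0, 1000.0, a_won=True))  # (1016.0, 984.0)
```

A rating of this kind depends on who votes and on which prompts they choose, whereas a benchmark score depends only on the fixed test set and its grading rules.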
Where can I find the original data?
The original leaderboard and detailed methodology are available at artificialanalysis.ai. The Intelligence Index methodology is documented on the site's Intelligence Index page.





