Artificial Analysis Intelligence Index

The Artificial Analysis Intelligence Index aggregates multiple rigorous benchmarks to compare AI model intelligence across coding, reasoning, science, tool use, and agentic tasks.

Top Model: MiniMax-M3

Top Score: 44

Model Count: 216

Data version: 2026-06-28

Data source: Artificial Analysis

Origin: All / China

Ranking Table

Rank  Model                         Intelligence Index  Organization
13    MiniMax-M3                    44                  MiniMax
14    DeepSeek-V4-Pro (max)         44                  DeepSeek-AI
18    Kimi K2.6                     43                  Moonshot AI
21    Kimi K2.7 Code                42                  Kimi
23    DeepSeek-V4-Pro (high)        41                  DeepSeek-AI
24    DeepSeek-V4-Flash (max)       40                  DeepSeek-AI
30    Qwen3.7 Plus                  39                  Alibaba
32    MiniMax-M2.7                  38                  MiniMaxAI
36    DeepSeek-V4-Flash (high)      37                  DeepSeek-AI
46    Kimi K2.6                     35                  Moonshot AI
56    DeepSeek-V4-Pro               31                  DeepSeek-AI
63    Step 3.7 Flash                30                  StepFun
68    DeepSeek-V4-Flash             29                  DeepSeek-AI
74    Step 3.5 Flash                26                  StepFunAI
75    Doubao Seed Code              26                  ByteDance Seed
103   Qwen3.5 4B                    20                  Alibaba
121   Qwen3.5 4B                    16                  Alibaba
148   Qwen3.5 2B                    10                  Alibaba
153   Step3 VL 10B                  9                   StepFun
164   Qwen3.5 2B                    9                   Alibaba
167   Kimi Linear 48B A3B Instruct  9                   Kimi
186   Qwen3.5 0.8B                  5                   Alibaba
192   Qwen3.5 0.8B                  4                   Alibaba

Data is for reference only. Official sources are authoritative.

Benchmark Components (Intelligence Index v4.0)

The Intelligence Index aggregates 10 rigorous benchmarks into a holistic measure of AI capabilities, so that a model cannot score well through narrow specialization alone.

GDPval-AA: Agentic real-world tasks
τ²-Bench: Agentic tool use
Terminal-Bench Hard: Agentic coding
SciCode: Coding proficiency
AA-LCR: Long context reasoning
AA-Omniscience: Knowledge & hallucination
IFBench: Instruction following
Humanity's Last Exam: Reasoning & knowledge
GPQA Diamond: Scientific reasoning
CritPt: Physics reasoning

FAQ

What is the Artificial Analysis Intelligence Index?
The Artificial Analysis Intelligence Index v4.0 is a composite benchmark that aggregates performance across 10 challenging evaluations (spanning science, coding, tool use, agentic tasks, and reasoning) to measure AI capabilities holistically. It is designed so that narrow specialization is not rewarded, and it provides a single score for tracking progress.
How is the Intelligence Index calculated?
The index aggregates scores from 10 benchmarks: GDPval-AA (agentic real-world tasks), τ²-Bench (tool use), Terminal-Bench Hard (agentic coding), SciCode (coding), AA-LCR (long context reasoning), AA-Omniscience (knowledge & hallucination), IFBench (instruction following), Humanity's Last Exam (reasoning), GPQA Diamond (scientific reasoning), and CritPt (physics). All tests are independently run by Artificial Analysis on standardized hardware.
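The answer above does not say how the ten scores are weighted, so the following is only a minimal sketch that assumes a weighted average with equal weights by default. The function name `aggregate_index`, the `weights` parameter, and the example scores are hypothetical and are not taken from Artificial Analysis.

```python
from typing import Dict, Optional

# Benchmark names as listed in the answer above.
BENCHMARKS = [
    "GDPval-AA", "τ²-Bench", "Terminal-Bench Hard", "SciCode", "AA-LCR",
    "AA-Omniscience", "IFBench", "Humanity's Last Exam", "GPQA Diamond", "CritPt",
]


def aggregate_index(scores: Dict[str, float],
                    weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted average of per-benchmark scores.

    Equal weighting is an assumption made for illustration; the real
    index may weight benchmarks or categories differently.
    """
    if weights is None:
        weights = {name: 1.0 for name in scores}
    if set(weights) != set(scores):
        raise ValueError("scores and weights must cover the same benchmarks")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# Made-up scores (10, 20, ..., 100), not real leaderboard data.
example_scores = {name: 10.0 * (i + 1) for i, name in enumerate(BENCHMARKS)}
print(aggregate_index(example_scores))  # 55.0
```

Passing an explicit `weights` dictionary shows how a different weighting scheme would shift the composite score without changing any individual benchmark result.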
How does this differ from LMArena?
LMArena rankings are based on crowdsourced user votes (Elo ratings from blind A/B tests), reflecting subjective human preferences. The Artificial Analysis Intelligence Index uses standardized automated benchmarks with objective scoring, measuring technical capabilities across specific domains. Both perspectives are valuable: LMArena captures real-world user experience, while the Artificial Analysis Intelligence Index provides reproducible technical measurements.
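To make the contrast concrete, here is a rough sketch of the textbook Elo update applied to a single pairwise vote. It illustrates the general idea of preference-based ratings only; LMArena's actual rating pipeline may differ, and the function names and the K-factor of 32 are assumptions chosen for illustration.

```python
from typing import Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float,
               outcome_a: float, k: float = 32.0) -> Tuple[float, float]:
    """Return updated (rating_a, rating_b) after one comparison.

    outcome_a is 1.0 if A was preferred, 0.0 if B was preferred,
    and 0.5 for a tie. k (assumed value) controls the step size.
    """
    delta = k * (outcome_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Two models start level; a voter prefers model A.
print(elo_update(1000.0, 1000.0, 1.0))  # (1016.0, 984.0)
```

A rating of this kind moves with every vote and depends on who votes and on what prompts, whereas a benchmark-based index changes only when the benchmarks are rerun.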
Where can I find the original data?
The original leaderboard and detailed methodology are available at artificialanalysis.ai. The Intelligence Index methodology is documented on the site's Intelligence Index page.