Artificial Analysis Intelligence Index

Name: Artificial Analysis Intelligence Index
Creator: DataLearner
License: https://creativecommons.org/licenses/by/4.0/

Artificial Analysis Intelligence Index aggregates multiple rigorous benchmarks to compare AI model intelligence across coding, reasoning, science, tool use, and agentic tasks.

Top Model: MiniMax-M3
Top Score: 44
Model Count: 216
Data version: June 28, 2026

Data source: Artificial Analysis

Origin filter: China (ranks refer to positions on the full leaderboard)

Ranking Table

Rank	Model	Intelligence Index	Organization
13	MiniMax-M3	44	MiniMax
14	DeepSeek-V4-Pro (max)	44	DeepSeek-AI
18	Kimi K2.6	43	Moonshot AI
21	Kimi K2.7 Code	42	Kimi
23	DeepSeek-V4-Pro (high)	41	DeepSeek-AI
24	DeepSeek-V4-Flash (max)	40	DeepSeek-AI
30	Qwen3.7 Plus	39	Alibaba
32	MiniMax-M2.7	38	MiniMaxAI
36	DeepSeek-V4-Flash (high)	37	DeepSeek-AI
46	Kimi K2.6	35	Moonshot AI
56	DeepSeek-V4-Pro	31	DeepSeek-AI
63	Step 3.7 Flash	30	StepFun
68	DeepSeek-V4-Flash	29	DeepSeek-AI
74	Step 3.5 Flash	26	StepFunAI
75	Doubao Seed Code	26	ByteDance Seed
103	Qwen3.5 4B	20	Alibaba
121	Qwen3.5 4B	16	Alibaba
148	Qwen3.5 2B	10	Alibaba
153	Step3 VL 10B	9	StepFun
164	Qwen3.5 2B	9	Alibaba
167	Kimi Linear 48B A3B Instruct	9	Kimi
186	Qwen3.5 0.8B	5	Alibaba
192	Qwen3.5 0.8B	4	Alibaba

Data is for reference only. Official sources are authoritative.

Benchmark Components (Intelligence Index v4.0)

The Intelligence Index aggregates 10 rigorous benchmarks into one holistic measure of AI capability. Because the benchmarks span many domains, a model is unlikely to score highly by specializing in just one of them.

GDPval-AA: Agentic real-world tasks
τ²-Bench: Agentic tool use
Terminal-Bench Hard: Agentic coding
SciCode: Scientific coding
AA-LCR: Long-context reasoning
AA-Omniscience: Knowledge and hallucination
IFBench: Instruction following
Humanity's Last Exam: Reasoning and knowledge
GPQA Diamond: Scientific reasoning
CritPt: Physics reasoning

FAQ

What is the Artificial Analysis Intelligence Index?

The Artificial Analysis Intelligence Index v4.0 is a composite benchmark that aggregates performance across 10 challenging evaluations, spanning agentic tasks, coding, science, knowledge, instruction following, and general reasoning, into a single score for tracking progress. Covering this many domains is meant to reward broad capability rather than narrow specialization.

How is the Intelligence Index calculated?

The index aggregates scores from 10 benchmarks: GDPval-AA (agentic real-world tasks), τ²-Bench (agentic tool use), Terminal-Bench Hard (agentic coding), SciCode (scientific coding), AA-LCR (long-context reasoning), AA-Omniscience (knowledge and hallucination), IFBench (instruction following), Humanity's Last Exam (reasoning and knowledge), GPQA Diamond (scientific reasoning), and CritPt (physics reasoning). All evaluations are run independently by Artificial Analysis under standardized conditions.
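The answer above names the benchmarks but not how their scores are combined. The sketch below shows one way a composite score of this kind can be computed, assuming (hypothetically) that every benchmark score is already on a 0 to 100 scale, that the 10 benchmarks are grouped into four equally weighted categories as shown, and that scores are averaged within each category first. The grouping, the weights, and the `intelligence_index` function are illustrative assumptions, not Artificial Analysis's published formula.

```python
from statistics import mean

# Hypothetical grouping of the 10 v4.0 benchmarks into four equally weighted
# categories. Both the grouping and the equal weighting are assumptions.
CATEGORIES: dict[str, list[str]] = {
    "agents": ["GDPval-AA", "τ²-Bench"],
    "coding": ["Terminal-Bench Hard", "SciCode"],
    "scientific_reasoning": ["Humanity's Last Exam", "GPQA Diamond", "CritPt"],
    "general": ["AA-LCR", "AA-Omniscience", "IFBench"],
}


def intelligence_index(scores: dict[str, float]) -> float:
    """Average benchmark scores within each category, then average the categories.

    `scores` maps benchmark name to a score assumed to be on a 0-100 scale.
    """
    missing = [b for names in CATEGORIES.values() for b in names if b not in scores]
    if missing:
        raise ValueError(f"missing benchmark scores: {missing}")
    # One mean per category, so categories with more benchmarks do not dominate.
    category_means = [mean(scores[b] for b in names) for names in CATEGORIES.values()]
    return mean(category_means)


if __name__ == "__main__":
    # Made-up scores for illustration only, not real model results.
    specialist = {
        b: (100.0 if cat == "coding" else 0.0)
        for cat, names in CATEGORIES.items()
        for b in names
    }
    generalist = {b: 40.0 for names in CATEGORIES.values() for b in names}
    print(intelligence_index(specialist))  # 25.0
    print(intelligence_index(generalist))  # 40.0
```

With these made-up numbers, a model that aces only the coding benchmarks scores 25, while one scoring a uniform 40 scores 40. This is how averaging across categories discourages narrow specialization.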

How does this differ from LMArena?

LMArena rankings are based on crowdsourced user votes: users compare two anonymous model responses side by side, and the votes are converted into Elo-style ratings that reflect subjective human preference. The Intelligence Index instead uses standardized automated benchmarks with objective scoring, measuring technical capability in specific domains. The two are complementary: LMArena captures real-world user experience, while the Intelligence Index provides reproducible technical measurements.
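To make the contrast with fixed benchmarks concrete, the sketch below shows the classic online Elo update, in which each pairwise vote moves two ratings toward the observed outcome. It is a simplification: the starting rating of 1000, the K-factor of 32, and the function names are assumptions for illustration, and LMArena's actual fitting procedure may differ (it has described fitting a Bradley-Terry model over all collected votes).

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the standard Elo curve (base 10, scale 400)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(
    rating_a: float, rating_b: float, outcome_a: float, k: float = 32.0
) -> tuple[float, float]:
    """Return updated (rating_a, rating_b) after one vote.

    outcome_a is 1.0 if A is preferred, 0.0 if B is preferred, 0.5 for a tie.
    k (assumed value) controls how far a single vote moves the ratings.
    """
    delta = k * (outcome_a - expected_score(rating_a, rating_b))
    # Zero-sum update: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Two hypothetical models start at 1000; a voter prefers model A.
    print(elo_update(1000.0, 1000.0, 1.0))  # (1016.0, 984.0)
```

Unlike a benchmark score, a rating produced this way depends on which opponents a model happened to face and on voter taste, which is why the two kinds of ranking can disagree.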

Where can I find the original data?

The original leaderboard and detailed methodology are available at artificialanalysis.ai; the Intelligence Index methodology is documented on the site's Intelligence Index page.