Grok 4 Benchmark Details

Grok 4's strongest benchmark placements are currently IMO 2024 (rank 1 / 10, score 23.20), IMO 2025 (rank 1 / 9, score 29.20), and MMLU Pro (rank 14 / 126, score 87).

Benchmark Results

General Knowledge

8 evaluations

Benchmark / mode | Score | Rank/total
MMLU Pro | 87 | 14 / 126
GPQA Diamond | - | 39 / 179
ARC-AGI | 66.70 | 29 / 65
LiveBench (Standard Mode) | 62.02 | 59 / 115
HLE | 38.60 | 55 / 159
HLE | 38.60 | 55 / 159
HLE | 25.40 | 88 / 159
ARC-AGI-2 | 15.90 | 34 / 59

Coding and Software Engineering

2 evaluations

Benchmark / mode | Score | Rank/total
LiveCodeBench | - | 25 / 120
SWE-bench Verified | 58.60 | 79 / 108

Math and Reasoning

9 evaluations

Benchmark / mode | Score | Rank/total
AIME2025 | 98.80 | 13 / 106
AIME2025 | 91.70 | 36 / 106
IMO-ProofBench | 46.70 | 4 / 16
IMO-ProofBench | 23.30 | 10 / 16
IMO 2025 | 29.20 | 1 / 9
IMO 2024 | 23.20 | 1 / 10
IMO-ProofBench Advanced | 18.60 | 3 / 8
FrontierMath | 12.10 | 22 / 60
FrontierMath - Tier 4 (Standard Mode) | 2.10 | 56 / 80

AI Agent - Tool Usage

1 evaluation

Benchmark / mode | Score | Rank/total
Terminal-Bench | - | 13 / 35

Commonsense Reasoning

1 evaluation

Benchmark / mode | Score | Rank/total
Simple Bench (Thinking Enabled) | 60.50 | 15 / 63

Agent Level Benchmark

2 evaluations

Benchmark / mode | Score | Rank/total
Aider-Polyglot (Thinking Level · High) | 79.60 | 7 / 59
τ²-Bench - Telecom | - | 26 / 35
