Grok 3 Benchmark Details

Grok 3's leading benchmark results are AIME 2024 (score 84.20, rank 22 of 62), SimpleQA (score 43.40, rank 16 of 45), and GPQA Diamond (score 80.40, rank 75 of 179).

Benchmark Results

Thinking

General Knowledge

1 evaluation
Benchmark / mode    Score    Rank/total
GPQA Diamond        80.40    75 / 179

Common Sense

1 evaluation
Benchmark / mode    Score    Rank/total
SimpleQA            43.40    16 / 45

Math and Reasoning

4 evaluations
Benchmark / mode    Score    Rank/total
AIME 2024           84.20    22 / 62
(unnamed)           77.10    62 / 106
(unnamed)           3.80     45 / 60
(unnamed)           0        72 / 80

Coding and Software Engineering

1 evaluation
Benchmark / mode    Score    Rank/total
(unnamed)           70.60    52 / 120

Commonsense Reasoning

1 evaluation
Benchmark / mode                Score    Rank/total
Simple Bench / Standard Mode    36.10    44 / 63

Agent Level Benchmark

1 evaluation
Benchmark / mode                  Score    Rank/total
Aider-Polyglot / Standard Mode    53.30    30 / 59