Grok-3 - Reasoning Beta Benchmark Details

Grok-3 - Reasoning Beta currently has three listed benchmark results: AIME 2024 (score 93.30, rank 6 of 62), LiveCodeBench (score 79.40, rank 33 of 120), and GPQA Diamond (score 84.60, rank 52 of 179).

Benchmark Results

General Knowledge (1 evaluation)

Benchmark / mode    Score    Rank / total
GPQA Diamond        84.60    52 / 179

Math and Reasoning (1 evaluation)

Benchmark / mode    Score    Rank / total
AIME 2024           93.30    6 / 62

Coding and Software Engineering (1 evaluation)

Benchmark / mode    Score    Rank / total
LiveCodeBench       79.40    33 / 120
