Claude Opus 4 Benchmark Details

Claude Opus 4's strongest benchmark placements are currently MATH-500 (rank 3 of 44, score 98.20), MMLU Pro (rank 25 of 126, score 85), and Aider-Polyglot (rank 13 of 59, score 72).
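The page does not say how these three results were chosen as the leading ones. One reading that matches the order given is relative rank, meaning rank divided by the number of models evaluated. The sketch below works through that arithmetic under this assumption; the `relative_rank` helper and the `results` dictionary are illustrative names, not part of the leaderboard.

```python
# Rank and field size for the three benchmarks named in the summary.
results = {
    "MATH-500": (3, 44),
    "MMLU Pro": (25, 126),
    "Aider-Polyglot": (13, 59),
}


def relative_rank(rank: int, total: int) -> float:
    """Return rank as a fraction of the field (smaller is better).

    Assumption: this is one plausible way to compare placements across
    leaderboards of different sizes; the source does not confirm it.
    """
    return rank / total


# Sort benchmark names from best to worst relative placement.
ordered = sorted(results, key=lambda name: relative_rank(*results[name]))

for name in ordered:
    rank, total = results[name]
    print(f"{name}: {rank} / {total} -> {relative_rank(rank, total):.1%}")
```

Under this reading, a rank of 3 out of 44 sits far closer to the top of its field than 25 out of 126, even though the raw scores are on different scales and cannot be compared directly.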

Benchmark Results

Thinking

General Knowledge

5 evaluations
Benchmark / mode   Score   Rank/total
MMLU Pro           85      25 / 126
-                  79.60   80 / 179
-                  35.70   48 / 65
-                  10.70   131 / 159
-                  8.60    39 / 59

Coding and Software Engineering

2 evaluations
Benchmark / mode   Score   Rank/total
-                  72.50   48 / 108
-                  56.60   76 / 120

Math and Reasoning

9 evaluations
Benchmark / mode   Score   Rank/total
MATH-500           98.20   3 / 44
-                  76      35 / 62
-                  75.50   65 / 106
-                  4.50    39 / 60
-                  4.10    41 / 60
-                  0       72 / 80
-                  4.20    40 / 80
-                  2.90    16 / 16

Writing and Creative Capabilities

1 evaluation
Benchmark / mode   Score   Rank/total
-                  83.75   13 / 23

Common-Sense Reasoning

1 evaluation
Benchmark / mode                  Score   Rank/total
Simple Bench / Thinking Enabled   58.80   17 / 63

Agent Level Benchmark

3 evaluations
Benchmark / mode                 Score   Rank/total
-                                72.50   22 / 40
Aider-Polyglot / Standard Mode   70.70   16 / 59
Aider-Polyglot                   72      13 / 59