Claude Sonnet 3.7 Benchmark Details

Claude Sonnet 3.7's headline benchmark results are Aider-Polyglot (rank 18 / 59, score 64.90), Simple Bench (rank 31 / 63, score 46.40), and GPQA Diamond (rank 89 / 179, score 77). One source link is attached for reference.
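
Because the leaderboards differ in size, a Rank/total value is easier to compare once it is expressed as a fraction of the field. The sketch below is illustrative only: the helper name top_fraction is hypothetical, and it assumes that rank 1 is the best result, which the page does not state explicitly.

```python
def top_fraction(rank_total: str) -> float:
    """Convert a 'rank / total' string such as '18 / 59' into rank / total.

    Assumption (not stated on the page): rank 1 is the best result,
    so a smaller fraction means a stronger placement.
    """
    rank_text, total_text = rank_total.split("/")
    rank, total = int(rank_text), int(total_text)  # int() ignores surrounding spaces
    if not 1 <= rank <= total:
        raise ValueError(f"rank must be between 1 and total, got {rank_total!r}")
    return rank / total


# The three headline results quoted in the summary above.
headline = {
    "Aider-Polyglot": "18 / 59",
    "Simple Bench": "31 / 63",
    "GPQA Diamond": "89 / 179",
}

for name, rank_total in headline.items():
    print(f"{name}: rank {rank_total} is within the top {top_fraction(rank_total):.1%}")
```

The same helper can be applied to any Rank/total pair in the tables that follow.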

Benchmark Results

General Knowledge

3 evaluations
Score   Rank/total   Benchmark / mode
77      89 / 179     GPQA Diamond
68      123 / 179
10.30   135 / 161

Coding and Software Engineering

2 evaluations
Score   Rank/total   Benchmark / mode
70.30   55 / 108
62.30   74 / 108

Math and Reasoning

5 evaluations
Score   Rank/total   Benchmark / mode
82.20   41 / 44
54.80   84 / 106
23.30   58 / 62
4.10    41 / 60
3.10    46 / 60

Commonsense Reasoning

2 evaluations
Score   Rank/total   Benchmark / mode
44.90   35 / 63      Simple Bench / Standard Mode
46.40   31 / 63      Simple Bench / Thinking Enabled

Agent Level Benchmark

5 evaluations
Score   Rank/total   Benchmark / mode
60.40   21 / 59      Aider-Polyglot / Standard Mode
64.90   18 / 59      Aider-Polyglot
61.80   29 / 40

Productivity Knowledge

1 evaluation
Score   Rank/total   Benchmark / mode
28      20 / 21

Long Context

1 evaluation
Score   Rank/total   Benchmark / mode
61      13 / 13

AI Agent - Tool Usage

1 evaluation
Score   Rank/total   Benchmark / mode

Sources