Claude Sonnet 3.7 Benchmark Details

Claude Sonnet 3.7 currently shows benchmark results led by LiveBench (rank 24 / 52, score 68.64), GPQA Diamond (rank 89 / 179, score 77), and SWE-bench Verified (rank 55 / 108, score 70.30). One source link is attached for reference.
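
Because each benchmark ranks a different number of models, the raw ranks are easier to compare once divided by the size of the field. The following is a minimal sketch of that arithmetic in Python, using only the three rank/total pairs quoted in the summary. The `rank_fraction` helper and the `results` dictionary are illustrative names, not part of the page, and the page does not state how it orders the "led by" list.

```python
# Rank and field size exactly as quoted in the summary (rank, total).
results = {
    "LiveBench": (24, 52),
    "GPQA Diamond": (89, 179),
    "SWE-bench Verified": (55, 108),
}


def rank_fraction(rank, total):
    """Return rank / total; a smaller value means closer to the top of the field."""
    return rank / total


# Sort benchmark names from best relative placement to worst.
ordered = sorted(results, key=lambda name: rank_fraction(*results[name]))

for name in ordered:
    rank, total = results[name]
    print(f"{name}: {rank} / {total} = {rank_fraction(rank, total):.3f}")
```

Under this assumed measure, the three benchmarks sort in the same order in which the summary lists them, with all three placements falling near the middle of their respective fields.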

Benchmark Results

Thinking

General Knowledge

5 evaluations
Benchmark / mode | Score | Rank/total
GPQA Diamond | 77 | 89 / 179
(unlabeled) | 68 | 123 / 179
LiveBench | 68.64 | 24 / 52
(unlabeled) | 60.40 | 43 / 52
(unlabeled) | 10.30 | 133 / 159

Coding and Software Engineering

2 evaluations
Benchmark / mode | Score | Rank/total
SWE-bench Verified | 70.30 | 55 / 108
(unlabeled) | 62.30 | 74 / 108

Math and Reasoning

5 evaluations
Benchmark / mode | Score | Rank/total
(unlabeled) | 82.20 | 41 / 44
(unlabeled) | 54.80 | 84 / 106
(unlabeled) | 23.30 | 58 / 62
(unlabeled) | 4.10 | 41 / 60
(unlabeled) | 3.10 | 46 / 60

Commonsense Reasoning

2 evaluations
Benchmark / mode | Score | Rank/total
(unlabeled) | 46.40 | 14 / 27
(unlabeled) | 44.90 | 16 / 27

Agent Level Benchmark

5 evaluations
Benchmark / mode | Score | Rank/total
(unlabeled) | 64.90 | 15 / 26
(unlabeled) | 60.40 | 18 / 26
(unlabeled) | 61.80 | 29 / 40

Productivity Knowledge

1 evaluation
Benchmark / mode | Score | Rank/total
(unlabeled) | 28 | 20 / 21

Long Context

1 evaluation
Benchmark / mode | Score | Rank/total
(unlabeled) | 61 | 13 / 13

AI Agent - Tool Usage

1 evaluation
Benchmark / mode | Score | Rank/total

Sources