Claude 3.5 Sonnet New Benchmark Details

Claude 3.5 Sonnet New's highest benchmark scores are currently on HumanEval (score 93.70, rank 3 of 39), BBH (score 92.60, rank 2 of 20), and MMLU (score 88.30, rank 18 of 65).

Benchmark Results

General Knowledge

4 evaluations
Benchmark / mode                Score     Rank / total
BBH                             92.60     2 / 20
MMLU                            88.30     18 / 65
(name not shown)                78        69 / 126
(name not shown)                65        132 / 179

Coding and Software Engineering

3 evaluations
Benchmark / mode                Score     Rank / total
HumanEval                       93.70     3 / 39
(name not shown)                38.70     102 / 120

Math and Reasoning

5 evaluations
Benchmark / mode                Score     Rank / total
(name not shown)                78.30     12 / 42
(name not shown)                78        42 / 44
(name not shown)                16        59 / 62
(name not shown)                2.10      47 / 60
(name not shown)                0         72 / 80

Common Sense

1 evaluation
Benchmark / mode                Score     Rank / total
(name not shown)                28.40     24 / 45

Writing and Creative Capabilities

1 evaluation
Benchmark / mode                Score     Rank / total
(name not shown)                78.15     20 / 23

Commonsense Reasoning

1 evaluation
Benchmark / mode                Score     Rank / total
Simple Bench / Standard Mode    41.40     36 / 63

Agent Level Benchmark

1 evaluation
Benchmark / mode                Score     Rank / total
Aider-Polyglot / Standard Mode  51.60     32 / 59