Claude 3.5 Sonnet New Benchmark Details

Claude 3.5 Sonnet New's strongest benchmark results are HumanEval (score 93.70, rank 3 of 39), BBH (score 92.60, rank 2 of 20), and MMLU (score 88.30, rank 18 of 65).

Benchmark Results

General Knowledge

4 evaluations

| Benchmark / mode | Score | Rank/total |
| --- | --- | --- |
| BBH | 92.60 | 2 / 20 |
| MMLU | 88.30 | 18 / 65 |
| MMLU Pro | | 69 / 126 |
| GPQA Diamond | | 132 / 179 |

Coding and Software Engineering

3 evaluations

| Benchmark / mode | Score | Rank/total |
| --- | --- | --- |
| HumanEval | 93.70 | 3 / 39 |
| SWE-bench Verified | | 93 / 108 |
| LiveCodeBench | 38.70 | 102 / 120 |

Math and Reasoning

5 evaluations

| Benchmark / mode | Score | Rank/total |
| --- | --- | --- |
| MATH | 78.30 | 12 / 42 |
| MATH-500 | | 42 / 44 |
| AIME 2024 | | 59 / 62 |
| FrontierMath | 2.10 | 47 / 60 |
| FrontierMath - Tier 4 (Standard Mode) | | 72 / 80 |

Common Sense

1 evaluation

| Benchmark / mode | Score | Rank/total |
| --- | --- | --- |
| SimpleQA | 28.40 | 24 / 45 |

Writing and Creative Capabilities

1 evaluation

| Benchmark / mode | Score | Rank/total |
| --- | --- | --- |
| Creative Writing | 78.15 | 20 / 23 |

Commonsense Reasoning

1 evaluation

| Benchmark / mode | Score | Rank/total |
| --- | --- | --- |
| Simple Bench (Standard Mode) | 41.40 | 36 / 63 |

Agent Level Benchmark

1 evaluation

| Benchmark / mode | Score | Rank/total |
| --- | --- | --- |
| Aider-Polyglot (Standard Mode) | 51.60 | 32 / 59 |
