Qwen2.5-Max Benchmark Details

Qwen2.5-Max's benchmark results are currently led by MMLU (rank 21 of 66, score 87.90), GSM8K (rank 9 of 26, score 94.50), and MBPP (rank 10 of 28, score 80.60).
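The page does not say how the three leading benchmarks are chosen. One reading that fits the numbers is that benchmarks are ordered by relative rank (rank divided by the number of ranked models), not by raw score. The sketch below checks that reading against the rank/total pairs listed on this page. The `relative_rank` helper and the sorting rule are assumptions made for illustration, not something the page documents.

```python
# Rank/total pairs copied from the benchmark tables on this page.
results = {
    "MMLU": (21, 66),
    "MMLU Pro": (79, 132),
    "GSM8K": (9, 26),
    "MATH": (24, 42),
    "FrontierMath": (52, 60),
    "MBPP": (10, 28),
    "HumanEval": (26, 39),
    "Aider-Polyglot": (48, 59),
}


def relative_rank(rank: int, total: int) -> float:
    """Hypothetical helper: fraction of the field at or above this rank (lower is better)."""
    return rank / total


# Sort benchmarks from best to worst relative placement.
ordered = sorted(results, key=lambda name: relative_rank(*results[name]))
top_three = ordered[:3]

for name in ordered:
    rank, total = results[name]
    print(f"{name:15s} {rank:3d} / {total:<3d} {relative_rank(rank, total):.3f}")
```

Under this assumption the first three entries match the benchmarks named in the summary, in the same order. Sorting by raw score would instead put GSM8K first, so relative rank looks like the more plausible rule, though that remains an inference.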

Benchmark Results

General Knowledge

2 evaluations

| Benchmark / mode | Score | Rank / total |
| --- | --- | --- |
| MMLU | 87.90 | 21 / 66 |
| MMLU Pro | 76.10 | 79 / 132 |

Math and Reasoning

3 evaluations

| Benchmark / mode | Score | Rank / total |
| --- | --- | --- |
| GSM8K | 94.50 | 9 / 26 |
| MATH | 68.50 | 24 / 42 |
| FrontierMath | (not listed) | 52 / 60 |

Coding and Software Engineering

2 evaluations

| Benchmark / mode | Score | Rank / total |
| --- | --- | --- |
| MBPP | 80.60 | 10 / 28 |
| HumanEval | 73.20 | 26 / 39 |

Agent Level Benchmark

1 evaluation

| Benchmark / mode | Score | Rank / total |
| --- | --- | --- |
| Aider-Polyglot (Standard Mode) | 21.80 | 48 / 59 |
