GPT-4o mini Benchmark Details

GPT-4o mini's best relative placements are currently on MBPP (rank 4 of 28, score 87.20), MATH (rank 19 of 42, score 70.20), and GSM8K (rank 12 of 26, score 91.30).

Benchmark Results

General Knowledge

4 evaluations

Benchmark / mode    Score    Rank/total
MMLU                -        46 / 65
MMLU Pro            61.70    104 / 126
GPQA Diamond        41.10    164 / 179
GPQA                40.20    11 / 14

Math and Reasoning

2 evaluations

Benchmark / mode    Score    Rank/total
GSM8K               91.30    12 / 26
MATH                70.20    19 / 42

Coding and Software Engineering

2 evaluations

Benchmark / mode    Score    Rank/total
HumanEval           87.20    19 / 39
MBPP                87.20    4 / 28

Common Sense

1 evaluation

Benchmark / mode    Score    Rank/total
SimpleQA            9.50     41 / 45

Commonsense Reasoning

1 evaluation

Benchmark / mode                Score    Rank/total
Simple Bench (Standard Mode)    10.70    63 / 63

Agent Level Benchmark

1 evaluation

Benchmark / mode                  Score    Rank/total
Aider-Polyglot (Standard Mode)    3.60     59 / 59

Claw-style Agent Evaluation

1 evaluation

Benchmark / mode                         Score    Rank/total
Pinch Bench (Thinking Enabled, Tools)    -        27 / 37
