Phi-4-mini-instruct (3.8B) Benchmark Details

Phi-4-mini-instruct (3.8B) currently places best, relative to the number of models evaluated, on GSM8K (score 88.60, rank 14 of 26), HumanEval (score 74.40, rank 24 of 39), and MATH (score 64, rank 27 of 42).
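
The three benchmarks named in the summary are the ones where the model's rank is best relative to the size of the field. The page does not state how it picks them, so the sketch below is only one assumed reading: it divides rank by total for each entry in the tables on this page and lists the three lowest ratios. The names `placements`, `relative_position`, and `best_placements` are illustrative and not part of any site API.

```python
# Rank and field size for each benchmark, copied from the tables on this page.
placements = {
    "MMLU": (61, 66),
    "MMLU Pro": (118, 132),
    "GPQA Diamond": (175, 187),
    "GSM8K": (14, 26),
    "MATH-500": (44, 44),
    "MATH": (27, 42),
    "AIME 2024": (60, 62),
    "HumanEval": (24, 39),
    "MBPP": (20, 28),
}


def relative_position(rank, total):
    """Return rank / total; a lower value means a better placement in the field."""
    return rank / total


def best_placements(data, n=3):
    """Return the n benchmark names with the lowest rank / total (assumed criterion)."""
    return sorted(data, key=lambda name: relative_position(*data[name]))[:n]


if __name__ == "__main__":
    for name in best_placements(placements):
        rank, total = placements[name]
        print(f"{name}: {rank} / {total} = {relative_position(rank, total):.2f}")
```

Under this reading, a raw score alone does not decide the order: MATH-500 has a higher score than MATH (71.80 against 64) but sits last in its field (44 / 44), so it does not appear among the leading results.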

Benchmark Results

General Knowledge

3 evaluations

Benchmark / mode    Score         Rank / total
MMLU                67.30         61 / 66
MMLU Pro            52.80         118 / 132
GPQA Diamond        (not listed)  175 / 187

Math and Reasoning

4 evaluations

Benchmark / mode    Score         Rank / total
GSM8K               88.60         14 / 26
MATH-500            71.80         44 / 44
MATH                64            27 / 42
AIME 2024           (not listed)  60 / 62

Coding and Software Engineering

2 evaluations

Benchmark / mode    Score    Rank / total
HumanEval           74.40    24 / 39
MBPP                65.30    20 / 28
