Phi-4-instruct (reasoning-trained) Benchmark Details

Phi-4-instruct (reasoning-trained) currently has results on three benchmarks: AIME 2024 (score 50, rank 46 of 62), MATH-500 (score 90.40, rank 36 of 44), and GPQA Diamond (score 49, rank 156 of 179).

Benchmark Results

Thinking

General Knowledge

1 evaluation

Benchmark / mode    Score    Rank / total
GPQA Diamond        49       156 / 179

Math and Reasoning

2 evaluations

Benchmark / mode    Score    Rank / total
MATH-500            90.40    36 / 44
AIME 2024           50       46 / 62
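Because the three leaderboards rank different numbers of models (44, 62 and 179), raw ranks are hard to compare directly. One rough approach is to divide each rank by the number of ranked entries. The Python sketch below does this for the three results on this page. It assumes rank 1 is the best position. The `Result` class and its `relative_rank` method are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One benchmark entry (hypothetical structure for illustration)."""
    benchmark: str
    category: str
    score: float
    rank: int
    total: int

    def relative_rank(self) -> float:
        """Rank divided by total entries; lower is better, assuming rank 1 is best."""
        return self.rank / self.total


# Values copied from the tables on this page.
RESULTS = [
    Result("GPQA Diamond", "General Knowledge", 49.0, 156, 179),
    Result("MATH-500", "Math and Reasoning", 90.40, 36, 44),
    Result("AIME 2024", "Math and Reasoning", 50.0, 46, 62),
]

if __name__ == "__main__":
    # Print entries from best to worst relative placement.
    for r in sorted(RESULTS, key=Result.relative_rank):
        print(f"{r.benchmark:<14} score {r.score:>6.2f}  "
              f"rank {r.rank}/{r.total}  relative {r.relative_rank():.1%}")
```

This measure ignores how close neighbouring scores are and how many models share a leaderboard, so it is only a coarse way to compare placements across benchmarks.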