GPT-5 vs GPT-4.1

Across 8 shared benchmarks, GPT-5 leads overall: GPT-5 wins 8, GPT-4.1 wins 0, with 0 ties and an average score difference of +27.55.

GPT-5: OpenAI · 2025-08-07 · Foundation model

GPT-4.1: OpenAI · 2025-04-14 · Chat model

GPT-5: 8 wins (100%) · GPT-4.1: 0 wins (0%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 8 shared benchmarks.

Math and Reasoning: GPT-5 4/4

Benchmark	GPT-5	GPT-4.1	Diff
AIME2025	99.60 · 9 / 106	36.70 · 97 / 106	+62.90
Simple Bench	56.70 · 8 / 27	27 · 23 / 27	+29.70
FrontierMath	24.80 · 15 / 60	5.50 · 37 / 60	+19.30
FrontierMath - Tier 4	12.50 · 29 / 80 · Thinking High (No Tools)	0 · 72 / 80 · Normal (No Tools)	+12.50

General Knowledge: GPT-5 2/2

Benchmark	GPT-5	GPT-4.1	Diff
HLE	35.20 · 60 / 157	3.70 · 156 / 157	+31.50
GPQA Diamond	87.30 · 37 / 178	66.30 · 126 / 178	+21

Agent Level Benchmark: GPT-5 1/1

Benchmark	GPT-5	GPT-4.1	Diff
τ²-Bench	80 · 15 / 40	54.70 · 31 / 40	+25.30

Coding and Software Engineer: GPT-5 1/1

Benchmark	GPT-5	GPT-4.1	Diff
SWE-bench Verified	72.80 · 46 / 108	54.60 · 84 / 108	+18.20

GPT-5 leads in: Math and Reasoning (4/4), General Knowledge (2/2), Agent Level Benchmark (1/1), Coding and Software Engineer (1/1)

On average across the 8 shared benchmarks, GPT-5 scores 27.55 higher.

Largest single-benchmark gap: AIME2025 — GPT-5 99.60 vs GPT-4.1 36.70 (+62.90).
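The summary figures can be reproduced from the per-benchmark scores. The sketch below assumes the average is a plain unweighted mean of the eight score differences, which the page does not state explicitly but which matches the reported +27.55. The names `scores` and `summarize` are illustrative and not part of the page's own tooling.

```python
# Scores copied from the benchmark tables: name -> (GPT-5, GPT-4.1).
scores = {
    "AIME2025": (99.60, 36.70),
    "Simple Bench": (56.70, 27.00),
    "FrontierMath": (24.80, 5.50),
    "FrontierMath - Tier 4": (12.50, 0.00),
    "HLE": (35.20, 3.70),
    "GPQA Diamond": (87.30, 66.30),
    "τ²-Bench": (80.00, 54.70),
    "SWE-bench Verified": (72.80, 54.60),
}


def summarize(scores):
    """Hypothetical helper: win counts, mean difference, and largest gap."""
    # Round each difference to two decimals to avoid floating-point noise.
    diffs = {name: round(a - b, 2) for name, (a, b) in scores.items()}
    return {
        "wins_gpt5": sum(d > 0 for d in diffs.values()),
        "wins_gpt41": sum(d < 0 for d in diffs.values()),
        "ties": sum(d == 0 for d in diffs.values()),
        # Assumption: unweighted mean over all shared benchmarks.
        "mean_diff": round(sum(diffs.values()) / len(diffs), 2),
        "largest_gap": max(diffs, key=lambda name: abs(diffs[name])),
        "diffs": diffs,
    }


summary = summarize(scores)
print(summary["wins_gpt5"], summary["wins_gpt41"], summary["ties"])
print(summary["mean_diff"], summary["largest_gap"])
```

Because the mean is unweighted, the four Math and Reasoning benchmarks account for half of the average. A per-category mean would weight the capability groups equally and give a different figure.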

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.