GPT-5 vs GPT-4.1

Across 8 shared benchmarks, GPT-5 leads overall: GPT-5 wins 8, GPT-4.1 wins 0, with 0 ties and an average score difference of +27.55.

GPT-5

OpenAI · 2025-08-07 · Foundation model

GPT-4.1

OpenAI · 2025-04-14 · Chat model

GPT-5: 8 wins (100%) · GPT-4.1: 0 wins (0%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 8 shared benchmarks.

Math and Reasoning

GPT-5 4/4
Benchmark | GPT-5 | GPT-4.1 | Diff
AIME2025 | 99.60 (9 / 106) | 36.70 (97 / 106) | +62.90
Simple Bench | 56.70 (8 / 27) | 27 (23 / 27) | +29.70
FrontierMath | 24.80 (15 / 60) | 5.50 (37 / 60) | +19.30
FrontierMath - Tier 4 | 12.50 (29 / 80) [Thinking High (No Tools)] | 0 (72 / 80) [Normal (No Tools)] | +12.50

General Knowledge

GPT-5 2/2
Benchmark | GPT-5 | GPT-4.1 | Diff
HLE | 35.20 (60 / 157) | 3.70 (156 / 157) | +31.50
GPQA Diamond | 87.30 (37 / 178) | 66.30 (126 / 178) | +21

Agent Level Benchmark

GPT-5 1/1
Benchmark | GPT-5 | GPT-4.1 | Diff
τ²-Bench | 80 (15 / 40) | 54.70 (31 / 40) | +25.30

Coding and Software Engineer

GPT-5 1/1
Benchmark | GPT-5 | GPT-4.1 | Diff
SWE-bench Verified | 72.80 (46 / 108) | 54.60 (84 / 108) | +18.20
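
The ordering used in the tables above (grouped by capability, largest gap first within each group) can be reproduced from the scores themselves. The following is a minimal sketch, not the page's actual generator: the names `ROWS` and `group_by_capability` are hypothetical and chosen only for illustration, and the scores are copied by hand from the tables.

```python
from collections import defaultdict

# Hypothetical record layout: (capability, benchmark, GPT-5 score, GPT-4.1 score).
# Values are copied from the benchmark tables on this page.
ROWS = [
    ("Math and Reasoning", "AIME2025", 99.60, 36.70),
    ("Math and Reasoning", "Simple Bench", 56.70, 27.00),
    ("Math and Reasoning", "FrontierMath", 24.80, 5.50),
    ("Math and Reasoning", "FrontierMath - Tier 4", 12.50, 0.00),
    ("General Knowledge", "HLE", 35.20, 3.70),
    ("General Knowledge", "GPQA Diamond", 87.30, 66.30),
    ("Agent Level Benchmark", "τ²-Bench", 80.00, 54.70),
    ("Coding and Software Engineer", "SWE-bench Verified", 72.80, 54.60),
]


def group_by_capability(rows):
    """Group rows by capability and sort each group by descending score gap."""
    groups = defaultdict(list)
    for capability, name, gpt5, gpt41 in rows:
        # Diff is GPT-5 minus GPT-4.1, rounded to the two decimals shown on the page.
        groups[capability].append((name, round(gpt5 - gpt41, 2)))
    for entries in groups.values():
        entries.sort(key=lambda entry: entry[1], reverse=True)
    return dict(groups)


if __name__ == "__main__":
    for capability, entries in group_by_capability(ROWS).items():
        print(capability)
        for name, diff in entries:
            print(f"  {name}: {diff:+.2f}")
```

Sorting happens inside each group only, so a benchmark with a large gap in one capability never moves into another group.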

Specs

Field | GPT-5 | GPT-4.1
Publisher | OpenAI | OpenAI
Release date | 2025-08-07 | 2025-04-14
Model type | Foundation model | Chat model
Architecture | Dense | Dense
Parameters | Not available | Not available
Context length | 400K | 1024K
Max output | 128K | 32K

Summary

  • GPT-5 leads in: Math and Reasoning (4/4), General Knowledge (2/2), Agent Level Benchmark (1/1), Coding and Software Engineer (1/1)

On average across the 8 shared benchmarks, GPT-5 scores 27.55 points higher than GPT-4.1.

Largest single-benchmark gap: AIME2025 — GPT-5 99.60 vs GPT-4.1 36.70 (+62.90).
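
The summary figures (win count, average difference, largest gap) follow directly from the per-benchmark scores. The sketch below shows one way to recompute them as a sanity check. It assumes an unweighted mean of GPT-5 minus GPT-4.1 across the 8 benchmarks, and the names `SCORES` and `summarize` are hypothetical, not part of any published tooling.

```python
# Hypothetical mapping: benchmark -> (GPT-5 score, GPT-4.1 score),
# copied from the benchmark tables on this page.
SCORES = {
    "AIME2025": (99.60, 36.70),
    "Simple Bench": (56.70, 27.00),
    "FrontierMath": (24.80, 5.50),
    "FrontierMath - Tier 4": (12.50, 0.00),
    "HLE": (35.20, 3.70),
    "GPQA Diamond": (87.30, 66.30),
    "τ²-Bench": (80.00, 54.70),
    "SWE-bench Verified": (72.80, 54.60),
}


def summarize(scores):
    """Return wins, losses, ties, mean difference, and the largest-gap benchmark."""
    diffs = {name: gpt5 - gpt41 for name, (gpt5, gpt41) in scores.items()}
    largest = max(diffs, key=lambda name: abs(diffs[name]))
    return {
        "gpt5_wins": sum(d > 0 for d in diffs.values()),
        "gpt41_wins": sum(d < 0 for d in diffs.values()),
        "ties": sum(d == 0 for d in diffs.values()),
        # Assumption: simple unweighted mean over all shared benchmarks.
        "mean_diff": sum(diffs.values()) / len(diffs),
        "largest_gap": (largest, diffs[largest]),
    }


if __name__ == "__main__":
    summary = summarize(SCORES)
    print(f"GPT-5 wins: {summary['gpt5_wins']}, GPT-4.1 wins: {summary['gpt41_wins']}, ties: {summary['ties']}")
    print(f"Mean difference: {summary['mean_diff']:+.2f}")
    name, gap = summary["largest_gap"]
    print(f"Largest gap: {name} ({gap:+.2f})")
```

With the scores listed on this page, the recomputed values agree with the summary: 8 wins for GPT-5, none for GPT-4.1, no ties, a mean difference of +27.55, and AIME2025 as the largest gap at +62.90.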

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.