GPT-5.1 vs GPT-5

Across 13 shared benchmarks, GPT-5.1 leads overall: GPT-5.1 wins 7, GPT-5 wins 5, with 1 tie and an average score difference of +1.12 (GPT-5.1 minus GPT-5).

GPT-5.1: OpenAI · 2025-11-12 · Reasoning model

GPT-5: OpenAI · 2025-08-07 · Foundation model

GPT-5.1: 7 wins (54%) · Ties: 1 · GPT-5: 5 wins (38%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 13 shared benchmarks.

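General Knowledge: GPT-5.1 leads 3/4

Benchmark	GPT-5.1 score (rank)	GPT-5 score (rank)	Diff
HLE	26.50 (rank 97 / 172)	35.20 (rank 73 / 172)	-8.70
ARC-AGI-2	17.60 (rank 36 / 62)	9.90 (rank 40 / 62)	+7.70
ARC-AGI	72.80 (rank 28 / 68)	65.70 (rank 33 / 68)	+7.10
GPQA Diamond	88.10 (rank 31 / 187)	87.30 (rank 40 / 187)	+0.80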

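Math and Reasoning: even (1 win each, 1 tie)

Benchmark	GPT-5.1 score (rank)	GPT-5 score (rank)	Diff
AIME 2025	94.00 (rank 28 / 107)	99.60 (rank 9 / 107)	-5.60
FrontierMath	26.70 (rank 13 / 60), Thinking High (With Tools)	24.80 (rank 15 / 60)	+1.90
FrontierMath - Tier 4	12.50 (rank 29 / 80), Thinking High (With Tools)	12.50 (rank 29 / 80), Thinking High (No Tools)	0.00 (tie)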

Coding and Software Engineer: GPT-5.1 leads 2/2

Benchmark	GPT-5.1 score (rank)	GPT-5 score (rank)	Diff
SWE-Bench Pro - Public	50.80 (rank 40 / 54), Thinking High (No Tools)	36.30 (rank 52 / 54)	+14.50
SWE-bench Verified	76.30 (rank 34 / 112)	72.80 (rank 50 / 112)	+3.50

Agent Level Benchmark: GPT-5 leads 1/1

Benchmark	GPT-5.1 score (rank)	GPT-5 score (rank)	Diff
τ²-Bench - Telecom	95.60 (rank 14 / 35), Thinking High (With Tools)	95.80 (rank 13 / 35)	-0.20

AI Agent - Information Search: GPT-5 leads 1/1

Benchmark	GPT-5.1 score (rank)	GPT-5 score (rank)	Diff
BrowseComp	50.80 (rank 43 / 53), Thinking High (No Tools)	54.90 (rank 39 / 53)	-4.10

Commonsense Reasoning: GPT-5 leads 1/1

Benchmark	GPT-5.1 score (rank)	GPT-5 score (rank)	Diff
Simple Bench	53.20 (rank 23 / 63), Thinking High (No Tools)	56.70 (rank 20 / 63), Thinking High (No Tools)	-3.50

Multimodal Understanding: GPT-5.1 leads 1/1

Benchmark	GPT-5.1 score (rank)	GPT-5 score (rank)	Diff
MMMU	85.40 (rank 2 / 29)	84.20 (rank 6 / 29)	+1.20

Prices use DataLearner records when available; missing fields are not inferred.

GPT-5.1 leads in: General Knowledge (3/4), Coding and Software Engineer (2/2), Multimodal Understanding (1/1)
GPT-5 leads in: Agent Level Benchmark (1/1), AI Agent - Information Search (1/1), Commonsense Reasoning (1/1)
Tied in: Math and Reasoning
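
The per-category leaders listed above can be reproduced from the score tables with a short tally. The following is a minimal sketch, not the page's actual generator. It assumes that a category is led by whichever model wins more of its benchmarks, and that equal win counts mean a tie. The assignment of benchmarks to categories is inferred from the order of the tables on this page, and the names `ROWS` and `category_leaders` are hypothetical.

```python
from collections import defaultdict

# (category, benchmark, GPT-5.1 score, GPT-5 score), transcribed from this page.
# The category of each benchmark is an assumption based on table order.
ROWS = [
    ("General Knowledge", "HLE", 26.50, 35.20),
    ("General Knowledge", "ARC-AGI-2", 17.60, 9.90),
    ("General Knowledge", "ARC-AGI", 72.80, 65.70),
    ("General Knowledge", "GPQA Diamond", 88.10, 87.30),
    ("Math and Reasoning", "AIME 2025", 94.00, 99.60),
    ("Math and Reasoning", "FrontierMath", 26.70, 24.80),
    ("Math and Reasoning", "FrontierMath - Tier 4", 12.50, 12.50),
    ("Coding and Software Engineer", "SWE-Bench Pro - Public", 50.80, 36.30),
    ("Coding and Software Engineer", "SWE-bench Verified", 76.30, 72.80),
    ("Agent Level Benchmark", "τ²-Bench - Telecom", 95.60, 95.80),
    ("AI Agent - Information Search", "BrowseComp", 50.80, 54.90),
    ("Commonsense Reasoning", "Simple Bench", 53.20, 56.70),
    ("Multimodal Understanding", "MMMU", 85.40, 84.20),
]


def category_leaders(rows):
    """Return {category: (leader, wins_of_leader, benchmarks_in_category)}."""
    # Per category: [GPT-5.1 wins, GPT-5 wins, total benchmarks]
    tally = defaultdict(lambda: [0, 0, 0])
    for category, _benchmark, gpt51, gpt5 in rows:
        counts = tally[category]
        counts[0] += gpt51 > gpt5
        counts[1] += gpt5 > gpt51
        counts[2] += 1

    leaders = {}
    for category, (wins51, wins5, total) in tally.items():
        if wins51 > wins5:
            leaders[category] = ("GPT-5.1", wins51, total)
        elif wins5 > wins51:
            leaders[category] = ("GPT-5", wins5, total)
        else:
            leaders[category] = ("Tied", wins51, total)
    return leaders


leaders = category_leaders(ROWS)
for category, (leader, wins, total) in leaders.items():
    print(f"{category}: {leader} ({wins}/{total})")
```

Under this rule, Math and Reasoning comes out tied because each model wins one benchmark and the third is a draw.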

On average across the 13 shared benchmarks, GPT-5.1 scores 1.12 higher.

Largest single-benchmark gap: SWE-Bench Pro - Public — GPT-5.1 50.80 vs GPT-5 36.30 (+14.50).
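
The headline figures (win counts, average difference, largest gap) can be checked against the tables with a few lines of arithmetic. This is a sketch under one assumption: the page does not say how the average is computed, so the code uses an unweighted mean of the per-benchmark differences (GPT-5.1 minus GPT-5). The names `SCORES` and `summarize` are hypothetical.

```python
# (benchmark, GPT-5.1 score, GPT-5 score), transcribed from the tables on this page.
SCORES = [
    ("HLE", 26.50, 35.20),
    ("ARC-AGI-2", 17.60, 9.90),
    ("ARC-AGI", 72.80, 65.70),
    ("GPQA Diamond", 88.10, 87.30),
    ("AIME 2025", 94.00, 99.60),
    ("FrontierMath", 26.70, 24.80),
    ("FrontierMath - Tier 4", 12.50, 12.50),
    ("SWE-Bench Pro - Public", 50.80, 36.30),
    ("SWE-bench Verified", 76.30, 72.80),
    ("τ²-Bench - Telecom", 95.60, 95.80),
    ("BrowseComp", 50.80, 54.90),
    ("Simple Bench", 53.20, 56.70),
    ("MMMU", 85.40, 84.20),
]


def summarize(scores):
    """Tally wins and ties, the mean difference, and the largest absolute gap."""
    # Round each difference to 2 decimals to drop floating-point noise.
    diffs = {name: round(gpt51 - gpt5, 2) for name, gpt51, gpt5 in scores}
    largest = max(diffs, key=lambda name: abs(diffs[name]))
    return {
        "gpt_5_1_wins": sum(d > 0 for d in diffs.values()),
        "gpt_5_wins": sum(d < 0 for d in diffs.values()),
        "ties": sum(d == 0 for d in diffs.values()),
        # Assumption: unweighted mean over all shared benchmarks.
        "mean_diff": round(sum(diffs.values()) / len(diffs), 2),
        "largest_gap": (largest, diffs[largest]),
    }


summary = summarize(SCORES)
print(summary)
```

With the scores above, the unweighted mean matches the +1.12 reported on the page, which suggests (but does not confirm) that the page uses the same calculation.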

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.