GPT-5.2 vs GPT-5.1

Across 16 shared benchmarks, GPT-5.2 leads overall: GPT-5.2 wins 15, GPT-5.1 wins 1, with 0 ties and an average score difference of +9.54.

GPT-5.2: OpenAI · 2025-12-11 · Chat model

GPT-5.1: OpenAI · 2025-11-12 · Reasoning model

GPT-5.2: 15 wins (94%) · GPT-5.1: 1 win (6%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 16 shared benchmarks.

General Knowledge: GPT-5.2 leads 5/5

Benchmark	GPT-5.2	GPT-5.1	Diff
ARC-AGI-2	54.20 · rank 23/62 · Deep Thinking (No Tools, Parallel)	17.60 · rank 36/62	+36.60
HLE	45.50 · rank 41/172 · Deep Thinking (With Tools + Internet)	26.50 · rank 97/172	+19
ARC-AGI	90.50 · rank 17/68 · Deep Thinking (No Tools, Parallel)	72.80 · rank 28/68	+17.70
LiveBench	48.91 · rank 94/115 · Normal (No Tools)	42.65 · rank 106/115 · Normal (No Tools)	+6.26
GPQA Diamond	93.20 · rank 9/187 · Deep Thinking (No Tools, Parallel)	88.10 · rank 31/187	+5.10

Coding and Software Engineering: GPT-5.2 leads 3/3

Benchmark	GPT-5.2	GPT-5.1	Diff
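IC SWE-Lancer(Diamond)	74.60 · rank 2/8 · Thinking Extra High (With Tools)	69.70 · rank 3/8 · Thinking High (No Tools)	+4.90
SWE-Bench Pro - Public	55.60 · rank 25/54 · Thinking Extra High (With Tools)	50.80 · rank 40/54 · Thinking High (No Tools)	+4.80
SWE-bench Verified	80 · rank 17/112 · Thinking Extra High (With Tools)	76.30 · rank 34/112	+3.70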

Math and Reasoning: GPT-5.2 leads 3/3

Benchmark	GPT-5.2	GPT-5.1	Diff
FrontierMath	40.30 · rank 8/60 · Thinking Extra High (With Tools)	26.70 · rank 13/60 · Thinking High (With Tools)	+13.60
FrontierMath - Tier 4	18.80 · rank 16/80 · Thinking High (No Tools)	12.50 · rank 29/80 · Thinking High (With Tools)	+6.30
AIME2025	100 · rank 1/107 · Thinking Extra High (No Tools)	94 · rank 28/107	+6

Agent Level Benchmark: GPT-5.2 leads 1/1

Benchmark	GPT-5.2	GPT-5.1	Diff
τ²-Bench - Telecom	98.70 · rank 4/35 · Thinking Extra High (With Tools)	95.60 · rank 14/35 · Thinking High (With Tools)	+3.10

AI Agent - Information Search: GPT-5.2 leads 1/1

Benchmark	GPT-5.2	GPT-5.1	Diff
BrowseComp	65.80 · rank 31/53 · Deep Thinking (With Tools + Internet)	50.80 · rank 43/53 · Thinking High (No Tools)	+15

AI Agent - Tool Usage: GPT-5.2 leads 1/1

Benchmark	GPT-5.2	GPT-5.1	Diff
MCP-Atlas	67.60 · rank 18/27 · Thinking Extra High (With Tools)	50.10 · rank 25/27 · Thinking High (With Tools)	+17.50

Commonsense Reasoning: GPT-5.1 leads 1/1

Benchmark	GPT-5.2	GPT-5.1	Diff
Simple Bench	45.80 · rank 33/63 · Thinking High (No Tools)	53.20 · rank 23/63 · Thinking High (No Tools)	-7.40

Multimodal Understanding: GPT-5.2 leads 1/1

Benchmark	GPT-5.2	GPT-5.1	Diff
MMMU	85.90 · rank 1/29 · Thinking Extra High (No Tools)	85.40 · rank 2/29	+0.50

Prices use DataLearner records when available; missing fields are not inferred.

GPT-5.2 leads in: General Knowledge (5/5), Coding and Software Engineering (3/3), Math and Reasoning (3/3), Agent Level Benchmark (1/1), AI Agent - Information Search (1/1), AI Agent - Tool Usage (1/1), Multimodal Understanding (1/1)
GPT-5.1 leads in: Commonsense Reasoning (1/1)

On average across the 16 shared benchmarks, GPT-5.2 scores 9.54 points higher.

Largest single-benchmark gap: ARC-AGI-2, where GPT-5.2 scores 54.20 vs 17.60 for GPT-5.1 (+36.60).
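The summary figures (win counts, average difference, largest gap) can be re-derived from the per-benchmark scores. The following is a minimal sketch, assuming the scores are copied by hand from the tables on this page; the names `SCORES` and `summarize` are illustrative and are not part of whatever pipeline generates the page.

```python
from statistics import mean

# (benchmark, GPT-5.2 score, GPT-5.1 score), transcribed from the tables on this page.
SCORES = [
    ("ARC-AGI-2", 54.20, 17.60),
    ("HLE", 45.50, 26.50),
    ("ARC-AGI", 90.50, 72.80),
    ("LiveBench", 48.91, 42.65),
    ("GPQA Diamond", 93.20, 88.10),
    ("IC SWE-Lancer(Diamond)", 74.60, 69.70),
    ("SWE-Bench Pro - Public", 55.60, 50.80),
    ("SWE-bench Verified", 80.00, 76.30),
    ("FrontierMath", 40.30, 26.70),
    ("FrontierMath - Tier 4", 18.80, 12.50),
    ("AIME2025", 100.00, 94.00),
    ("τ²-Bench - Telecom", 98.70, 95.60),
    ("BrowseComp", 65.80, 50.80),
    ("MCP-Atlas", 67.60, 50.10),
    ("Simple Bench", 45.80, 53.20),
    ("MMMU", 85.90, 85.40),
]


def summarize(rows):
    """Return win counts, mean difference, and the largest absolute gap."""
    # Difference is GPT-5.2 minus GPT-5.1, rounded to the page's two decimals.
    diffs = [(name, round(a - b, 2)) for name, a, b in rows]
    wins_a = sum(1 for _, d in diffs if d > 0)
    wins_b = sum(1 for _, d in diffs if d < 0)
    ties = sum(1 for _, d in diffs if d == 0)
    return {
        "wins_gpt_5_2": wins_a,
        "wins_gpt_5_1": wins_b,
        "ties": ties,
        "win_pct_gpt_5_2": round(100 * wins_a / len(rows)),
        "win_pct_gpt_5_1": round(100 * wins_b / len(rows)),
        "avg_diff": round(mean([d for _, d in diffs]), 2),
        "largest_gap": max(diffs, key=lambda item: abs(item[1])),
    }


summary = summarize(SCORES)
print(summary)
```

Note that the average is an unweighted mean over benchmarks whose scales and test modes differ, so it is a rough indicator rather than a calibrated measure.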

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.