GPT-5.2 Benchmark Details

GPT-5.2's strongest benchmark results are AIME2025 (score 100, rank 1 / 106), MMMU (score 85.90, rank 1 / 28), and GPQA Diamond (score 93.20, rank 8 / 179). This page also compares it with 2 competitor models and 2 predecessor or same-series models, including performance and pricing views when available. 2 source links are attached for reference.

Benchmark Results

General Knowledge

19 evaluations
Benchmark | Mode | Score | Rank/total
GPQA Diamond | Extra-High | 92.40 | 11 / 179
GPQA Diamond | Deep Thinking Mode | 93.20 | 8 / 179
-- | -- | 55.70 | 41 / 65
ARC-AGI | Medium | 72.70 | 26 / 65
-- | -- | 78.70 | 22 / 65
ARC-AGI | Extra-High | 86.20 | 18 / 65
ARC-AGI | Deep Thinking Mode | 90.50 | 15 / 65
MMLU | Extra-High | 89.60 | 11 / 65
LiveBench | Standard Mode | 48.91 | 94 / 115
-- | -- | 65.33 | 53 / 115
LiveBench | Medium | 71.84 | 31 / 115
-- | -- | 74.84 | 19 / 115
-- | -- | 9.70 | 38 / 59
ARC-AGI-2 | Medium | 26.70 | 31 / 59
-- | -- | 43.30 | 24 / 59
ARC-AGI-2 | Extra-High | 52.90 | 22 / 59
ARC-AGI-2 | Deep Thinking Mode | 54.20 | 20 / 59
HLE | Extra-High | 34.50 | 66 / 159
HLE | Extra-High, Tools, Internet | 45.50 | 33 / 159

Coding and Software Engineering

3 evaluations
Benchmark | Mode | Score | Rank/total
SWE-bench Verified | Extra-High, Tools | 80 | 16 / 108
IC SWE-Lancer(Diamond) | Extra-High, Tools | 74.60 | 2 / 8
SWE-Bench Pro - Public | Extra-High, Tools | 55.60 | 18 / 44

Math and Reasoning

7 evaluations
Benchmark | Mode | Score | Rank/total
AIME2025 | Extra-High | 100 | 1 / 106
FrontierMath | Extra-High, Tools | 40.30 | 8 / 60
-- | -- | 6.30 | 35 / 80
-- | -- | 16.70 | 20 / 80
-- | -- | 18.80 | 16 / 80
-- | -- | 18.80 | 16 / 80
FrontierMath - Tier 4 | Extra-High, Tools | 14.60 | 23 / 80

Multimodal Understanding

2 evaluations
Benchmark | Mode | Score | Rank/total
MMMU | Extra-High | 85.90 | 1 / 28
MMMU | Extra-High, Tools | 80.40 | 12 / 28

Commonsense Reasoning

1 evaluation
Benchmark | Mode | Score | Rank/total
-- | -- | 45.80 | 33 / 63

Agent Level Benchmark

2 evaluations
Benchmark | Mode | Score | Rank/total
τ²-Bench - Telecom | Extra-High, Tools | 98.70 | 4 / 35
τ²-Bench | Extra-High, Tools | 82 | 12 / 40

AI Agent - Information Search

2 evaluations
Benchmark | Mode | Score | Rank/total
BrowseComp | Extra-High, Tools, Internet | 65.80 | 24 / 45
BrowseComp | Extra-High, Tools | 65.80 | 24 / 45

Productivity Knowledge

2 evaluations
Benchmark | Mode | Score | Rank/total
GDPval-AA | High, Tools | 70.90 | 9 / 21
GDPval-AA | Extra-High, Tools | 61 | 10 / 21

AI Agent - Tool Usage

1 evaluation
Benchmark | Mode | Score | Rank/total
MCP-Atlas | Extra-High, Tools | 67.60 | 14 / 23
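
The Rank/total values in the tables above (for example 8 / 179) are easier to compare across benchmarks with different field sizes when expressed as a share of the field. The following is a minimal sketch of that conversion; the helper names `parse_rank` and `top_percent` are hypothetical and not part of the source page, and the rounding to one decimal place is an arbitrary choice.

```python
from typing import Tuple


def parse_rank(cell: str) -> Tuple[int, int]:
    """Split a 'rank / total' cell such as '8 / 179' into two integers."""
    rank_text, total_text = cell.split("/")
    rank, total = int(rank_text), int(total_text)
    if not 1 <= rank <= total:
        raise ValueError(f"rank must be between 1 and total: {cell!r}")
    return rank, total


def top_percent(cell: str) -> float:
    """Return the rank as a percentage of the field (lower is better)."""
    rank, total = parse_rank(cell)
    return round(100.0 * rank / total, 1)


if __name__ == "__main__":
    # Values copied from the result tables on this page.
    for name, cell in [("GPQA Diamond", "8 / 179"), ("AIME2025", "1 / 106")]:
        print(name, top_percent(cell))
```

Under this reading, rank 8 of 179 on GPQA Diamond places the model within roughly the top 4.5% of listed entries.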

Competitor Comparison

Benchmark scores for GPT-5.2 compared against top models in its class

The chart shows each model’s highest score per benchmark within the current filter. Out-of-100 benchmarks use raw heights; out-of-range benchmarks are scaled within that benchmark while labels keep the original scores.

12 benchmarks with comparable scores. Each model shows its best score; mode label is displayed below.
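
The selection and scaling rules described above can be sketched in a few lines. This is an illustration under stated assumptions, not the page's actual charting code: the function names are hypothetical, and the page does not say how out-of-range benchmarks are scaled, so the sketch assumes scaling relative to the highest score shown for that benchmark.

```python
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, str, float]  # (benchmark, mode, score)


def best_scores(rows: Iterable[Row]) -> Dict[str, Tuple[float, str]]:
    """Keep the highest score per benchmark, together with its mode label."""
    best: Dict[str, Tuple[float, str]] = {}
    for benchmark, mode, score in rows:
        if benchmark not in best or score > best[benchmark][0]:
            best[benchmark] = (score, mode)
    return best


def bar_height(score: float, shown_scores: List[float], out_of_100: bool = True) -> float:
    """Return a bar height on a 0-100 axis.

    Out-of-100 benchmarks use the raw score. For other benchmarks this
    sketch ASSUMES scaling against the largest score shown for that
    benchmark; the source does not specify the exact rule.
    """
    if out_of_100:
        return score
    top = max(shown_scores)
    return 100.0 * score / top if top else 0.0


if __name__ == "__main__":
    # ARC-AGI-2 rows for GPT-5.2, copied from the result tables on this page.
    rows = [
        ("ARC-AGI-2", "Medium", 26.70),
        ("ARC-AGI-2", "Extra-High", 52.90),
        ("ARC-AGI-2", "Deep Thinking Mode", 54.20),
    ]
    print(best_scores(rows))
```

The label attached to each bar would still carry the original score, as the description states; only the bar height is rescaled.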

Benchmark | Category | GPT-5.2 (Current) | Gemini 3.0 Pro (Preview 11-2025) | Opus 4.5
ARC-AGI | Comprehensive Evaluation | 90.50 (Deep Thinking Mode) | 87.50 (Thinking Enabled) | --
ARC-AGI-2 | Comprehensive Evaluation | 54.20 (Deep Thinking Mode) | 45.10 (Thinking Enabled) | --
GPQA Diamond | Comprehensive Evaluation | 93.20 (Deep Thinking Mode) | 93.80 (Thinking Enabled) | --
HLE | Comprehensive Evaluation | 45.50 (Deep Thinking Mode, Tools) | 45.80 (Thinking Level · High, Tools) | 43.20 (Extended Thinking, Tools)
LiveBench | Comprehensive Evaluation | 48.91 (Standard Mode) | 73.39 (Thinking Level · High) | 75.96 (64K)
SWE-bench Verified | Coding and Software Engineering | 80.00 (Thinking Level · Extra High, Tools) | 76.20 (Thinking Enabled) | 80.90 (Extended Thinking, Tools)
FrontierMath | Mathematical Reasoning | 40.30 (Thinking Level · Extra High, Tools) | 38.00 (Thinking Enabled) | --
-- | -- | 18.80 (Thinking Level · Extra High) | 18.80 (Thinking Enabled) | 4.20 (Standard Mode)
τ²-Bench | Agent Capability Evaluation | 82.00 (Thinking Level · Extra High, Tools) | 85.40 (Thinking Enabled, Tools) | 81.99 (Extended Thinking, Tools)
τ²-Bench - Telecom | Agent Capability Evaluation | 98.70 (Thinking Level · Extra High, Tools) | 98.00 (Thinking Level · High, Tools) | 90.70 (Extended Thinking, Tools)
BrowseComp | AI Agent - Information Gathering | 65.80 (Thinking Level · Extra High, Tools) | 59.20 (Thinking Level · High, Tools) | --
GDPval-AA | Productivity Knowledge | 70.90 (Thinking Level · High, Tools) | 35.00 (Thinking Level · High) | --

Standard API Pricing: GPT-5.2 vs. Peer Models

Shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.

Source: DataLearnerAI. Standard text prices shown here use the default supplier. · USD / 1M tokens

Model | Supplier | Standard input | Standard output | Base price applies to
GPT-5.2 | Facebook AI Research Lab | $1.75 / 1M tokens | $14 / 1M tokens | --
Opus 4.5 | Facebook AI Research Lab | $5 / 1M tokens | $25 / 1M tokens | --
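
Because prices are quoted per 1M tokens, the cost of a single workload follows from simple arithmetic on the input and output token counts. The sketch below uses the standard rates listed above; the function name `request_cost` and the example token counts are hypothetical, and the calculation ignores any extended-context tier, caching, or batch discounts, none of which are described on this page.

```python
from typing import Dict, Tuple

# (input USD per 1M tokens, output USD per 1M tokens), from the table above.
PRICES: Dict[str, Tuple[float, float]] = {
    "GPT-5.2": (1.75, 14.00),
    "Opus 4.5": (5.00, 25.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a workload at the standard base rates."""
    input_rate, output_rate = PRICES[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 200k input tokens and 50k output tokens.
    for model in PRICES:
        print(f"{model}: ${request_cost(model, 200_000, 50_000):.2f}")
```

For that hypothetical workload the listed rates work out to $1.05 for GPT-5.2 and $2.25 for Opus 4.5.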

Version History

How each version of the GPT-5.2 series stacks up on benchmark tests

12 benchmarks with comparable scores. Each model shows its best score; mode label is displayed below.

Benchmark | Category | GPT-5.2 (Current) | GPT-5.1 | GPT-5
ARC-AGI | Comprehensive Evaluation | 90.50 (Deep Thinking Mode) | 72.80 (Thinking Level · High) | 65.70 (Thinking Level · High)
ARC-AGI-2 | Comprehensive Evaluation | 54.20 (Deep Thinking Mode) | 17.60 (Thinking Level · High) | 9.90 (Thinking Level · High)
GPQA Diamond | Comprehensive Evaluation | 93.20 (Deep Thinking Mode) | 88.10 (Thinking Enabled) | 87.30 (Thinking Enabled, Tools)
HLE | Comprehensive Evaluation | 45.50 (Deep Thinking Mode, Tools) | 42.70 (Thinking Level · High, Tools) | 35.20 (Thinking Enabled, Tools)
LiveBench | Comprehensive Evaluation | 48.91 (Standard Mode) | 69.17 (Thinking Level · Medium) | --
SWE-Bench Pro - Public | Coding and Software Engineering | 55.60 (Thinking Level · Extra High, Tools) | -- | 36.30 (Thinking Level · High)
SWE-bench Verified | Coding and Software Engineering | 80.00 (Thinking Level · Extra High, Tools) | 76.30 (Thinking Level · High) | 72.80 (Thinking Level · High)
FrontierMath | Mathematical Reasoning | 40.30 (Thinking Level · Extra High, Tools) | 26.70 (Thinking Level · High, Tools) | 26.30 (Thinking Level · High, Tools)
-- | -- | 18.80 (Thinking Level · Extra High) | 12.50 (Thinking Level · High, Tools) | 12.50 (Thinking Level · High)
MMMU | Multimodal Understanding | 80.40 (Thinking Level · Extra High, Tools) | 85.40 (Thinking Level · High) | 84.20 (Thinking Level · High)
τ²-Bench | Agent Capability Evaluation | 82.00 (Thinking Level · Extra High, Tools) | -- | 80.00 (Thinking Enabled, Tools)
τ²-Bench - Telecom | Agent Capability Evaluation | 98.70 (Thinking Level · Extra High, Tools) | 95.60 (Thinking Level · High, Tools) | 96.70 (Thinking Level · High, Tools)
1 additional benchmark remains in the chart above.
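
Generation-over-generation change can be read off the version table by subtracting consecutive scores. The sketch below shows one way to do that while skipping versions with no reported score ("--"); the function name `version_gains` is hypothetical. Note that the modes often differ between versions (for example Deep Thinking Mode versus Thinking Level · High), so these differences are not like-for-like comparisons.

```python
from typing import List, Optional, Tuple

Series = List[Tuple[str, Optional[float]]]  # (version, score or None), oldest first


def version_gains(series: Series) -> List[Tuple[str, str, float]]:
    """Return (from_version, to_version, score_change) for consecutive known scores."""
    known = [(version, score) for version, score in series if score is not None]
    return [
        (prev_version, next_version, round(next_score - prev_score, 2))
        for (prev_version, prev_score), (next_version, next_score) in zip(known, known[1:])
    ]


if __name__ == "__main__":
    # ARC-AGI-2 best scores per version, copied from the table above.
    arc_agi_2 = [("GPT-5", 9.90), ("GPT-5.1", 17.60), ("GPT-5.2", 54.20)]
    print(version_gains(arc_agi_2))

    # τ²-Bench has no GPT-5.1 score, so GPT-5 is compared with GPT-5.2 directly.
    tau2 = [("GPT-5", 80.00), ("GPT-5.1", None), ("GPT-5.2", 82.00)]
    print(version_gains(tau2))
```

On ARC-AGI-2 this gives a gain of 7.7 points from GPT-5 to GPT-5.1 and 36.6 points from GPT-5.1 to GPT-5.2.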

Single-Benchmark Version Trend

Chart: ARC-AGI (Comprehensive Evaluation) scores across versions. Series: Normal, Normal + Tools, Thinking, Thinking + Tools, Deep, Deep + Tools.

The x-axis shows model and release date; the y-axis shows score. Solid lines connect the same mode across versions, while dotted guides align modes within the same generation.

Standard API Pricing Across the GPT-5.2 Series

Model | Supplier | Standard input | Standard output | Base price applies to
GPT-5.2 | Facebook AI Research Lab | $1.75 / 1M tokens | $14 / 1M tokens | --

Sources