GPT-5.2 Benchmark Details

GPT-5.2's top benchmark rankings are on AIME2025 (score 100, rank 1 of 106), MMMU (score 85.90, rank 1 of 28), and GPQA Diamond (score 93.20, rank 8 of 179). This page also compares it with 2 competitor models and 2 earlier models from the same series, with performance and, where available, pricing views. 2 source links are attached for reference.

Benchmark Results

General Knowledge

19 evaluations

Benchmark	Mode	Score	Rank / total
GPQA Diamond	Extra-High	92.40	11 / 179
GPQA Diamond	Deep Thinking Mode	93.20	8 / 179
ARC-AGI	Low	55.70	41 / 65
ARC-AGI	Medium	72.70	26 / 65
ARC-AGI	High	78.70	22 / 65
ARC-AGI	Extra-High	86.20	18 / 65
ARC-AGI	Deep Thinking Mode	90.50	15 / 65
--	Extra-High	89.60	11 / 65
LiveBench	Standard Mode	48.91	94 / 115
LiveBench	Low	65.33	53 / 115
LiveBench	Medium	71.84	31 / 115
LiveBench	High	74.84	19 / 115
ARC-AGI-2	Low	9.70	38 / 59
ARC-AGI-2	Medium	26.70	31 / 59
ARC-AGI-2	High	43.30	24 / 59
ARC-AGI-2	Extra-High	52.90	22 / 59
ARC-AGI-2	Deep Thinking Mode	54.20	20 / 59
HLE	Extra-High	34.50	66 / 159
HLE	Extra-High + Tools + Internet	45.50	33 / 159

Coding and Software Engineering

3 evaluations

Benchmark	Mode	Score	Rank / total
SWE-bench Verified	Extra-High + Tools	80	16 / 108
IC SWE-Lancer (Diamond)	Extra-High + Tools	74.60	2 / 8
SWE-Bench Pro - Public	Extra-High + Tools	55.60	18 / 44

Math and Reasoning

7 evaluations

Benchmark	Mode	Score	Rank / total
AIME2025	Extra-High	100	1 / 106
FrontierMath	Extra-High + Tools	40.30	8 / 60
FrontierMath - Tier 4	Low	6.30	35 / 80
FrontierMath - Tier 4	Medium	16.70	20 / 80
FrontierMath - Tier 4	High	18.80	16 / 80
FrontierMath - Tier 4	Extra-High	18.80	16 / 80
FrontierMath - Tier 4	Extra-High + Tools	14.60	23 / 80

Multimodal Understanding

2 evaluations

Benchmark	Mode	Score	Rank / total
MMMU	Extra-High	85.90	1 / 28
MMMU	Extra-High + Tools	80.40	12 / 28

Commonsense Reasoning

1 evaluation

Benchmark	Mode	Score	Rank / total
--	High	45.80	33 / 63

Agent Capability Evaluation

2 evaluations

Benchmark	Mode	Score	Rank / total
τ²-Bench - Telecom	Extra-High + Tools	98.70	4 / 35
τ²-Bench	Extra-High + Tools	82	12 / 40

AI Agent - Information Search

2 evaluations

Benchmark	Mode	Score	Rank / total
BrowseComp	Extra-High + Tools + Internet	65.80	24 / 45
BrowseComp	Extra-High + Tools	65.80	24 / 45

Productivity Knowledge

2 evaluations

Benchmark	Mode	Score	Rank / total
GDPval-AA	High + Tools	70.90	9 / 21
GDPval-AA	Extra-High + Tools	61	10 / 21

AI Agent - Tool Usage

1 evaluation

Benchmark	Mode	Score	Rank / total
--	Extra-High + Tools	67.60	14 / 23

Compare with other models

Competitor Comparison

Benchmark scores for GPT-5.2 compared against top models in its class

Models compared: GPT-5.2, Gemini 3.0 Pro (Preview 11-2025), and Opus 4.5.

For each benchmark, each model's highest score is shown together with the mode that produced it. In the chart, benchmarks scored out of 100 are drawn at their raw values; benchmarks on other scales are rescaled within that benchmark, while the labels keep the original scores.

12 benchmarks have comparable scores across these models.

Benchmark	Category	GPT-5.2 (current)	Gemini 3.0 Pro (Preview 11-2025)	Opus 4.5
ARC-AGI	General Knowledge	90.50 (Deep Thinking Mode)	87.50 (Thinking Enabled)	--
ARC-AGI-2	General Knowledge	54.20 (Deep Thinking Mode)	45.10 (Thinking Enabled)	--
GPQA Diamond	General Knowledge	93.20 (Deep Thinking Mode)	93.80 (Thinking Enabled)	--
HLE	General Knowledge	45.50 (Deep Thinking Mode | Tools)	45.80 (Thinking Level: High | Tools)	43.20 (Extended Thinking | Tools)
LiveBench	General Knowledge	48.91 (Standard Mode)	73.39 (Thinking Level: High)	75.96 (64K)
SWE-bench Verified	Coding and Software Engineering	80.00 (Thinking Level: Extra High | Tools)	76.20 (Thinking Enabled)	80.90 (Extended Thinking | Tools)
FrontierMath	Math and Reasoning	40.30 (Thinking Level: Extra High | Tools)	38.00 (Thinking Enabled)	--
FrontierMath - Tier 4	Math and Reasoning	18.80 (Thinking Level: Extra High)	18.80 (Thinking Enabled)	4.20 (Standard Mode)
τ²-Bench	Agent Capability Evaluation	82.00 (Thinking Level: Extra High | Tools)	85.40 (Thinking Enabled | Tools)	81.99 (Extended Thinking | Tools)
τ²-Bench - Telecom	Agent Capability Evaluation	98.70 (Thinking Level: Extra High | Tools)	98.00 (Thinking Level: High | Tools)	90.70 (Extended Thinking | Tools)
BrowseComp	AI Agent - Information Search	65.80 (Thinking Level: Extra High | Tools)	59.20 (Thinking Level: High | Tools)	--
GDPval-AA	Productivity Knowledge	70.90 (Thinking Level: High | Tools)	35.00 (Thinking Level: High)	--
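The selection rule described above (each model's best score per benchmark, reported with the mode that produced it, plus the chart's rescaling of benchmarks not scored out of 100) can be sketched in Python. This is a minimal illustration, not DataLearnerAI's implementation: the `Result` record, the `best_scores` and `bar_height` helpers, the tie-breaking, and the choice to rescale against the benchmark's highest score are all assumptions. The sample rows are GPT-5.2's two results ranked out of 179, which the intro identifies as GPQA Diamond.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One benchmark result row (hypothetical record type)."""
    model: str
    benchmark: str
    mode: str
    score: float


def best_scores(results: list[Result]) -> dict[tuple[str, str], Result]:
    """Keep each model's highest-scoring row per benchmark (first row wins ties)."""
    best: dict[tuple[str, str], Result] = {}
    for r in results:
        key = (r.model, r.benchmark)
        if key not in best or r.score > best[key].score:
            best[key] = r
    return best


def bar_height(score: float, benchmark_scores: list[float], full_scale: float = 100.0) -> float:
    """Assumed chart scaling: benchmarks scored 0-100 keep raw heights; any other
    benchmark is rescaled so that its highest score reaches full_scale."""
    if all(0.0 <= s <= full_scale for s in benchmark_scores):
        return score
    return score / max(benchmark_scores) * full_scale


# GPT-5.2's two GPQA Diamond rows from the General Knowledge table.
rows = [
    Result("GPT-5.2", "GPQA Diamond", "Extra-High", 92.40),
    Result("GPT-5.2", "GPQA Diamond", "Deep Thinking Mode", 93.20),
]
top = best_scores(rows)[("GPT-5.2", "GPQA Diamond")]
print(top.score, top.mode)  # prints: 93.2 Deep Thinking Mode
```

Keeping the whole `Result` rather than just the number is what lets the table print the mode label next to each best score.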

Standard API Pricing: GPT-5.2 vs. Peer Models

Shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.

Source: DataLearnerAI. Prices are standard text rates from the default supplier, in USD per 1M tokens.

Model	Supplier	Standard input	Standard output	Base price applies to
GPT-5.2	OpenAI	$1.75 / 1M tokens	$14 / 1M tokens	--
Opus 4.5	Anthropic	$5 / 1M tokens	$25 / 1M tokens	--
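Because prices are quoted per 1M tokens, the cost of a request is each token count divided by 1,000,000, multiplied by the matching rate. The sketch below applies the base rates from the table; it is a simplification that ignores prompt caching, batch discounts, and any extended-context tiers, and the `request_cost` helper is hypothetical. `Decimal` is used so that currency amounts do not pick up floating-point rounding noise.

```python
from decimal import Decimal

# Standard text rates from the pricing table, in USD per 1M tokens: (input, output).
PRICES = {
    "GPT-5.2": (Decimal("1.75"), Decimal("14")),
    "Opus 4.5": (Decimal("5"), Decimal("25")),
}

TOKENS_PER_UNIT = Decimal(1_000_000)


def request_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Cost in USD at the base rate (no caching, batch, or long-context pricing)."""
    input_price, output_price = PRICES[model]
    return (input_price * input_tokens + output_price * output_tokens) / TOKENS_PER_UNIT
```

For example, a request with 100,000 input tokens and 10,000 output tokens costs 0.175 + 0.14 = $0.315 on GPT-5.2 and 0.50 + 0.25 = $0.75 on Opus 4.5 at these rates.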

Version History

How each version of the GPT-5.2 series stacks up on benchmark tests

Models compared: GPT-5.2, GPT-5.1, and GPT-5.

Scores are selected and scaled the same way as in the competitor comparison: each model's best score per benchmark is shown with its mode label. 12 benchmarks have comparable scores across the three versions.

Benchmark	Category	GPT-5.2 (current)	GPT-5.1	GPT-5
ARC-AGI	General Knowledge	90.50 (Deep Thinking Mode)	72.80 (Thinking Level: High)	65.70 (Thinking Level: High)
ARC-AGI-2	General Knowledge	54.20 (Deep Thinking Mode)	17.60 (Thinking Level: High)	9.90 (Thinking Level: High)
GPQA Diamond	General Knowledge	93.20 (Deep Thinking Mode)	88.10 (Thinking Enabled)	87.30 (Thinking Enabled | Tools)
HLE	General Knowledge	45.50 (Deep Thinking Mode | Tools)	42.70 (Thinking Level: High | Tools)	35.20 (Thinking Enabled | Tools)
LiveBench	General Knowledge	48.91 (Standard Mode)	69.17 (Thinking Level: Medium)	--
SWE-Bench Pro - Public	Coding and Software Engineering	55.60 (Thinking Level: Extra High | Tools)	--	36.30 (Thinking Level: High)
SWE-bench Verified	Coding and Software Engineering	80.00 (Thinking Level: Extra High | Tools)	76.30 (Thinking Level: High)	72.80 (Thinking Level: High)
FrontierMath	Math and Reasoning	40.30 (Thinking Level: Extra High | Tools)	26.70 (Thinking Level: High | Tools)	26.30 (Thinking Level: High | Tools)
FrontierMath - Tier 4	Math and Reasoning	18.80 (Thinking Level: Extra High)	12.50 (Thinking Level: High | Tools)	12.50 (Thinking Level: High)
MMMU	Multimodal Understanding	80.40 (Thinking Level: Extra High | Tools)	85.40 (Thinking Level: High)	84.20 (Thinking Level: High)
τ²-Bench	Agent Capability Evaluation	82.00 (Thinking Level: Extra High | Tools)	--	80.00 (Thinking Enabled | Tools)
τ²-Bench - Telecom	Agent Capability Evaluation	98.70 (Thinking Level: Extra High | Tools)	95.60 (Thinking Level: High | Tools)	96.70 (Thinking Level: High | Tools)


Single-Benchmark Version Trend

[Chart: ARC-AGI (General Knowledge) scores across GPT-5.2 series versions. X-axis: model and release date; Y-axis: score. Modes: Normal, Normal + Tools, Thinking, Thinking + Tools, Deep, Deep + Tools. Solid lines connect the same mode across versions; dotted guides align modes within the same generation.]
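The trend chart's line rule (one solid line per mode, connecting that mode's scores across versions, and dotted guides tying together points from the same generation) amounts to grouping the same points two ways. A minimal sketch, assuming a hypothetical `(version, mode, score)` record listed oldest version first and treating each version as its own generation. The ARC-AGI scores come from the version comparison table; mapping the "Thinking Level High" label to the chart's "Thinking" mode and "Deep Thinking Mode" to "Deep" is an assumption.

```python
from collections import defaultdict

# (version, mode, score), oldest version first. ARC-AGI scores from the
# version table; the label-to-chart-mode mapping is assumed.
POINTS = [
    ("GPT-5", "Thinking", 65.70),
    ("GPT-5.1", "Thinking", 72.80),
    ("GPT-5.2", "Deep", 90.50),
]


def group_points(points):
    """Return (solid_lines, dotted_guides).

    solid_lines: mode -> [(version, score), ...], one line per mode across versions.
    dotted_guides: version -> [(mode, score), ...], points in the same generation.
    """
    solid_lines = defaultdict(list)
    dotted_guides = defaultdict(list)
    for version, mode, score in points:
        solid_lines[mode].append((version, score))
        dotted_guides[version].append((mode, score))
    return dict(solid_lines), dict(dotted_guides)


lines, guides = group_points(POINTS)
```

Because the input is already ordered by version, each mode's list comes out in release order and can be drawn directly as a polyline.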

Standard API Pricing Across the GPT-5.2 Series

Model	Supplier	Standard input	Standard output	Base price applies to
GPT-5.2	OpenAI	$1.75 / 1M tokens	$14 / 1M tokens	--

Sources

arcprize.org
openai.com