Claude Sonnet 4.6 vs GPT-5.2

Across 8 shared benchmarks, GPT-5.2 leads overall. Claude Sonnet 4.6 wins 3, GPT-5.2 wins 5, and there are no ties. The average score difference (Claude Sonnet 4.6 minus GPT-5.2) is -1.55 points.

Claude Sonnet 4.6

Anthropic · 2026-02-17 · Chat model

GPT-5.2

OpenAI · 2025-12-11 · Chat model

Claude Sonnet 4.6: 3 wins (37.5%) · GPT-5.2: 5 wins (62.5%)

Benchmark scores

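Benchmarks are grouped by capability and sorted by the largest gap within each group (8 shared benchmarks). Diff is Claude Sonnet 4.6's score minus GPT-5.2's score, so positive values favor Claude Sonnet 4.6.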

General Knowledge

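Claude Sonnet 4.6 leads (2/3)

| Benchmark | Claude Sonnet 4.6 | GPT-5.2 | Diff |
|---|---|---|---|
| ARC-AGI-2 | 58.30 · rank 18/59 | 54.20 · rank 20/59 · Deep Thinking (No Tools, Parallel) | +4.10 |
| HLE | 49.00 · rank 25/157 | 45.50 · rank 32/157 · Deep Thinking (With Tools + Internet) | +3.50 |
| GPQA Diamond | 89.90 · rank 21/178 | 93.20 · rank 8/178 · Deep Thinking (No Tools, Parallel) | -3.30 |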

Agent Level Benchmark

GPT-5.2 leads (1/1)

| Benchmark | Claude Sonnet 4.6 | GPT-5.2 | Diff |
|---|---|---|---|
| τ²-Bench - Telecom | 97.90 · rank 9/35 | 98.70 · rank 4/35 · Extra-High Thinking (With Tools) | -0.80 |

AI Agent - Information Search

Claude Sonnet 4.6 leads (1/1)

| Benchmark | Claude Sonnet 4.6 | GPT-5.2 | Diff |
|---|---|---|---|
| BrowseComp | 74.70 · rank 20/45 | 65.80 · rank 24/45 · Deep Thinking (With Tools + Internet) | +8.90 |

Coding and Software Engineering

GPT-5.2 leads (1/1)

| Benchmark | Claude Sonnet 4.6 | GPT-5.2 | Diff |
|---|---|---|---|
| SWE-bench Verified | 79.60 · rank 17/108 | 80.00 · rank 16/108 · Extra-High Thinking (With Tools) | -0.40 |

Math and Reasoning

GPT-5.2 leads (1/1)

| Benchmark | Claude Sonnet 4.6 | GPT-5.2 | Diff |
|---|---|---|---|
| FrontierMath - Tier 4 | 8.30 · rank 34/80 · Thinking (No Tools, 16K Budget) | 18.80 · rank 16/80 · Thinking High (No Tools) | -10.50 |

Productivity Knowledge

GPT-5.2 leads (1/1)

| Benchmark | Claude Sonnet 4.6 | GPT-5.2 | Diff |
|---|---|---|---|
| GDPval-AA | 57.00 · rank 11/21 | 70.90 · rank 9/21 · Thinking High (With Tools) | -13.90 |

Specs

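| Field | Claude Sonnet 4.6 | GPT-5.2 |
|---|---|---|
| Publisher | Anthropic | OpenAI |
| Release date | 2026-02-17 | 2025-12-11 |
| Model type | Chat model | Chat model |
| Architecture | Dense | Dense |
| Parameters | Not available | Not available |
| Context length | 1M tokens | 400K tokens |
| Max output | 8K tokens | Not available |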

API pricing

Prices use DataLearner records when available; missing fields are not inferred.

| Item | Claude Sonnet 4.6 | GPT-5.2 |
|---|---|---|
| Text input | $3 / 1M tokens | $1.75 / 1M tokens |
| Text output | $15 / 1M tokens | $14 / 1M tokens |
| Cache read | $0.3 / 1M tokens | $0.175 / 1M tokens |
| Cache write | $3.75 / 1M tokens | $1.75 / 1M tokens |
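Per-token rates are easier to compare once turned into a per-request cost. The following is a minimal Python sketch under stated assumptions: each token category is billed once at the flat rate listed above, `input_tokens` excludes tokens billed as cache reads or writes, and any long-context, batch or other pricing tiers the vendors may apply are ignored. `PRICES_PER_MTOK` and `request_cost` are hypothetical names, not a vendor API.

```python
# USD per 1M tokens, copied from the pricing table above.
PRICES_PER_MTOK = {
    "Claude Sonnet 4.6": {"input": 3.00, "output": 15.00, "cache_read": 0.30, "cache_write": 3.75},
    "GPT-5.2": {"input": 1.75, "output": 14.00, "cache_read": 0.175, "cache_write": 1.75},
}


def request_cost(model: str, input_tokens: int, output_tokens: int,
                 cache_read_tokens: int = 0, cache_write_tokens: int = 0) -> float:
    """Estimate the USD cost of one request at flat per-token rates.

    Assumption: input_tokens excludes cached tokens, and no long-context,
    batch or other pricing tier applies.
    """
    p = PRICES_PER_MTOK[model]
    # tokens * (USD per 1M tokens) gives the cost in millionths of a dollar
    micro_dollars = (input_tokens * p["input"]
                     + output_tokens * p["output"]
                     + cache_read_tokens * p["cache_read"]
                     + cache_write_tokens * p["cache_write"])
    return micro_dollars / 1_000_000


# Example: 100K input tokens and 10K output tokens, no caching.
for model in PRICES_PER_MTOK:
    print(model, round(request_cost(model, 100_000, 10_000), 4))
```

For that workload the estimate is about $0.45 for Claude Sonnet 4.6 and $0.315 for GPT-5.2; output-heavy requests narrow the relative gap because the two output rates ($15 vs $14) are much closer than the input rates.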

Summary

  • Claude Sonnet 4.6 leads in: General Knowledge (2/3), AI Agent - Information Search (1/1)
  • GPT-5.2 leads in: Agent Level Benchmark (1/1), Coding and Software Engineering (1/1), Math and Reasoning (1/1), Productivity Knowledge (1/1)

On average across the 8 shared benchmarks, GPT-5.2 scores 1.55 points higher.

Largest single-benchmark gap: GDPval-AA — Claude Sonnet 4.6 57 vs GPT-5.2 70.90 (-13.90).
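The summary figures can be recomputed from the eight score pairs in the tables. This is a minimal Python sketch, assuming Diff means Claude Sonnet 4.6 minus GPT-5.2 and that a tie requires identical scores; `Result`, `summarize` and `category_leaders` are illustrative names, not a DataLearner API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    category: str
    benchmark: str
    claude: float  # Claude Sonnet 4.6 score
    gpt: float     # GPT-5.2 score

    @property
    def diff(self) -> float:
        # Assumed sign convention: positive favors Claude Sonnet 4.6.
        return self.claude - self.gpt


# Score pairs copied from the benchmark tables on this page.
RESULTS = [
    Result("General Knowledge", "ARC-AGI-2", 58.30, 54.20),
    Result("General Knowledge", "HLE", 49.00, 45.50),
    Result("General Knowledge", "GPQA Diamond", 89.90, 93.20),
    Result("Agent Level Benchmark", "τ²-Bench - Telecom", 97.90, 98.70),
    Result("AI Agent - Information Search", "BrowseComp", 74.70, 65.80),
    Result("Coding and Software Engineering", "SWE-bench Verified", 79.60, 80.00),
    Result("Math and Reasoning", "FrontierMath - Tier 4", 8.30, 18.80),
    Result("Productivity Knowledge", "GDPval-AA", 57.00, 70.90),
]


def summarize(results):
    """Return (Claude wins, GPT-5.2 wins, ties, mean diff, largest-gap benchmark)."""
    claude_wins = sum(r.diff > 0 for r in results)
    gpt_wins = sum(r.diff < 0 for r in results)
    ties = len(results) - claude_wins - gpt_wins  # exact score equality only
    mean_diff = sum(r.diff for r in results) / len(results)
    largest = max(results, key=lambda r: abs(r.diff))
    return claude_wins, gpt_wins, ties, round(mean_diff, 2), largest.benchmark


def category_leaders(results):
    """Map each category to (leader, leader's wins, benchmarks in category)."""
    tally = defaultdict(lambda: [0, 0, 0])  # [Claude wins, GPT-5.2 wins, total]
    for r in results:
        counts = tally[r.category]
        counts[0] += r.diff > 0
        counts[1] += r.diff < 0
        counts[2] += 1
    leaders = {}
    for category, (c, g, total) in tally.items():
        if c > g:
            leaders[category] = ("Claude Sonnet 4.6", c, total)
        elif g > c:
            leaders[category] = ("GPT-5.2", g, total)
        else:
            leaders[category] = ("tie", c, total)
    return leaders


print(summarize(RESULTS))  # (3, 5, 0, -1.55, 'GDPval-AA')
print(category_leaders(RESULTS)["General Knowledge"])  # ('Claude Sonnet 4.6', 2, 3)
```

The diffs sum to -12.40, and dividing by 8 gives the -1.55 average; the largest absolute gap is GDPval-AA (13.90), ahead of FrontierMath - Tier 4 (10.50).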

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.