GPT-5.1 vs Gemini 2.5-Pro

Across 14 shared benchmarks, GPT-5.1 leads overall: GPT-5.1 wins 13, Gemini 2.5-Pro wins 1, with 0 ties and an average score difference of +16.02.

GPT-5.1
OpenAI · 2025-11-12 · Reasoning model

Gemini 2.5-Pro
Google DeepMind · 2025-06-05 · Reasoning model

GPT-5.1: 13 wins (93%) · Gemini 2.5-Pro: 1 win (7%)

Benchmark scores

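Grouped by capability and sorted by largest absolute gap within each group. 14 shared benchmarks. Each score cell lists the score, the "N / M" figure shown on the page, and the evaluation mode where one is recorded.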

General Knowledge

GPT-5.1 4/4
| Benchmark | GPT-5.1 | Gemini 2.5-Pro | Diff |
|---|---|---|---|
| ARC-AGI | 72.80 · 25 / 65 · high | 37 · 47 / 65 · thinking | +35.80 |
| HLE | 42.70 · 38 / 149 · Thinking High (With Tools + Internet) | 21.60 · 89 / 149 · thinking | +21.10 |
| ARC-AGI-2 | 17.60 · 32 / 58 · high | 4.90 · 43 / 58 · thinking | +12.70 |
| GPQA Diamond | 88.10 · 25 / 175 · Thinking High (No Tools) | 86.40 · 38 / 175 · thinking | +1.70 |

Math and Reasoning

GPT-5.1 3/4
| Benchmark | GPT-5.1 | Gemini 2.5-Pro | Diff |
|---|---|---|---|
| FrontierMath | 26.70 · 13 / 60 · Thinking High (With Tools) | 11 · 23 / 60 | +15.70 |
| FrontierMath - Tier 4 | 12.50 · 29 / 80 · Thinking High (With Tools) | 2.10 · 56 / 80 · Normal (No Tools) | +10.40 |
| Simple Bench | 53.20 · 10 / 27 · high | 62.40 · 2 / 27 · thinking | -9.20 |
| AIME2025 | 94 · 28 / 106 · Thinking High (No Tools) | 88 · 43 / 106 · thinking | +6 |

Agent Level Benchmark

GPT-5.1 2/2
| Benchmark | GPT-5.1 | Gemini 2.5-Pro | Diff |
|---|---|---|---|
| τ²-Bench - Telecom | 95.60 · 14 / 35 · Thinking High (With Tools) | 54 · 32 / 35 · thinking + tool use | +41.60 |
| Terminal Bench Hard | 43 · 2 / 13 · Thinking High (With Tools) | 25 · 12 / 13 · thinking + tool use | +18 |

AI Agent - Information Search

GPT-5.1 1/1
| Benchmark | GPT-5.1 | Gemini 2.5-Pro | Diff |
|---|---|---|---|
| BrowseComp | 50.80 · 34 / 43 · Thinking High (No Tools) | 7.80 · 42 / 43 · thinking + tool use | +43 |

AI Agent - Tool Usage

GPT-5.1 1/1
| Benchmark | GPT-5.1 | Gemini 2.5-Pro | Diff |
|---|---|---|---|
| Terminal Bench 2.0 | 47.60 · 34 / 43 · Thinking High (With Tools) | 32.60 · 43 / 43 · thinking + tool use | +15 |

Coding and Software Engineer

GPT-5.1 1/1
| Benchmark | GPT-5.1 | Gemini 2.5-Pro | Diff |
|---|---|---|---|
| SWE-bench Verified | 76.30 · 25 / 103 · high | 67.20 · 63 / 103 · thinking | +9.10 |

Multimodal Understanding

GPT-5.1 1/1
| Benchmark | GPT-5.1 | Gemini 2.5-Pro | Diff |
|---|---|---|---|
| MMMU | 85.40 · 2 / 28 · Thinking High (No Tools) | 82 · 9 / 28 · thinking | +3.40 |

Specs

| Field | GPT-5.1 | Gemini 2.5-Pro |
|---|---|---|
| Publisher | OpenAI | Google DeepMind |
| Release date | 2025-11-12 | 2025-06-05 |
| Model type | Reasoning model | Reasoning model |
| Architecture | Dense | Dense |
| Parameters | 0.0 | 0.0 |
| Context length | 400K | 1000K |
| Max output | 131072 | 65536 |

API pricing

Prices use DataLearner records when available; missing fields are not inferred.

| Item | GPT-5.1 | Gemini 2.5-Pro |
|---|---|---|
| Text input | 1.25 USD / 1M tokens | 1.25 USD / 1M tokens |
| Text output | 10 USD / 1M tokens | 10 USD / 1M tokens |
| Cache read | 0.125 USD / 1M tokens | 0.125 USD / 1M tokens |
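
As a rough illustration of how the per-million-token prices above translate into a request cost, here is a minimal Python sketch. The function name `estimate_cost_usd` and its parameters are hypothetical, and the sketch assumes that cached tokens are a subset of the input tokens and are billed at the cache-read rate instead of the input rate; the page does not state billing rules, so check each provider's documentation before relying on this.

```python
# Prices from the API pricing table, in USD per 1M tokens.
# They are listed as identical for GPT-5.1 and Gemini 2.5-Pro.
PRICE_INPUT = 1.25
PRICE_OUTPUT = 10.0
PRICE_CACHE_READ = 0.125


def estimate_cost_usd(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the cost of one request in USD.

    Assumption (not from the page): cached_tokens is a subset of
    input_tokens and is billed at the cache-read rate.
    """
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    uncached = input_tokens - cached_tokens
    total = (
        uncached * PRICE_INPUT
        + cached_tokens * PRICE_CACHE_READ
        + output_tokens * PRICE_OUTPUT
    )
    return total / 1_000_000


if __name__ == "__main__":
    # 200K input tokens (50K of them cached) and 10K output tokens.
    print(estimate_cost_usd(200_000, 10_000, cached_tokens=50_000))
```

Because the listed prices match, any cost difference between the two models under this sketch would come only from how many tokens each one consumes and produces for a given task.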

Summary

  • GPT-5.1 leads in: General Knowledge (4/4), Math and Reasoning (3/4), Agent Level Benchmark (2/2), AI Agent - Information Search (1/1), AI Agent - Tool Usage (1/1), Coding and Software Engineer (1/1), Multimodal Understanding (1/1)

On average across the 14 shared benchmarks, GPT-5.1 scores 16.02 higher.

Largest single-benchmark gap: BrowseComp — GPT-5.1 50.80 vs Gemini 2.5-Pro 7.80 (+43).
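
The headline figures can be rechecked from the per-benchmark scores. The Python sketch below is one way to do it, with the scores transcribed from the tables on this page. It assumes the average is a simple unweighted mean of the per-benchmark differences; the page does not describe its method, but this assumption reproduces the reported numbers. The function name `summarize` is hypothetical.

```python
# benchmark -> (GPT-5.1 score, Gemini 2.5-Pro score), transcribed from this page
SCORES = {
    "ARC-AGI": (72.80, 37.0),
    "HLE": (42.70, 21.60),
    "ARC-AGI-2": (17.60, 4.90),
    "GPQA Diamond": (88.10, 86.40),
    "FrontierMath": (26.70, 11.0),
    "FrontierMath - Tier 4": (12.50, 2.10),
    "Simple Bench": (53.20, 62.40),
    "AIME2025": (94.0, 88.0),
    "tau2-Bench - Telecom": (95.60, 54.0),
    "Terminal Bench Hard": (43.0, 25.0),
    "BrowseComp": (50.80, 7.80),
    "Terminal Bench 2.0": (47.60, 32.60),
    "SWE-bench Verified": (76.30, 67.20),
    "MMMU": (85.40, 82.0),
}


def summarize(scores):
    """Return win counts, mean difference, and the largest-gap benchmark."""
    # Round each difference to 2 decimals to avoid floating-point noise.
    diffs = {name: round(a - b, 2) for name, (a, b) in scores.items()}
    return {
        "gpt_wins": sum(d > 0 for d in diffs.values()),
        "gemini_wins": sum(d < 0 for d in diffs.values()),
        "ties": sum(d == 0 for d in diffs.values()),
        # Assumption: unweighted mean over all shared benchmarks.
        "avg_diff": round(sum(diffs.values()) / len(diffs), 2),
        "largest_gap": max(diffs, key=lambda name: abs(diffs[name])),
    }


if __name__ == "__main__":
    print(summarize(SCORES))
```

Note that an unweighted mean mixes benchmarks with different scales and evaluation modes (for example, tool use on one side but not the other), so the average is best read as a rough indicator.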

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.
