GPT-4o(2024-11-20) Benchmark Details
The highlighted benchmark results for GPT-4o(2024-11-20) are HumanEval (score 90.20, ranked 7 of 39), SimpleQA (score 38.80, ranked 19 of 45), and MMLU Pro (score 77.90, ranked 62 of 116). This page also compares it with 3 competitor models and 2 predecessor or same-series models, including performance and pricing views when available. 1 source link is attached for reference.
Benchmark Results
Competitor Comparison
Benchmark scores for GPT-4o(2024-11-20) compared against top models in its class
Benchmark Score Comparison
6 benchmarks with comparable scores
| Benchmark | Category | GPT-4o(2024-11-20) (this model) | Claude3-Opus | Gemini 2.0 Pro Experimental | DeepSeek-V3 |
|---|---|---|---|---|---|
| MMLU | Comprehensive evaluation | 85.70 (normal) | 86.80 (normal) | 86.50 (normal) | 88.50 (normal) |
| MMLU Pro | Comprehensive evaluation | 77.90 (normal) | 68.45 (normal) | 79.10 (normal) | 75.90 (normal) |
| HumanEval | Coding and software engineering | 90.20 (normal) | 84.90 (normal) | -- | 89.00 (normal) |
| FrontierMath | Mathematical reasoning | 0.30 (normal) | -- | -- | 1.70 (normal) |
| MATH | Mathematical reasoning | 68.50 (normal) | 60.10 (normal) | 91.80 (normal) | 87.80 (normal) |
| SimpleQA | General knowledge Q&A | 38.80 (normal) | -- | 44.30 (normal) | 24.90 (normal) |
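As a rough illustration of how to read the comparison table above, the sketch below picks the highest reported score on each benchmark. The scores are copied from the table; the names `MODELS`, `SCORES`, and `leader` are hypothetical helpers introduced only for this example, and "--" cells are represented as `None` and skipped.

```python
from typing import Optional

# Column order of the comparison table above.
MODELS = [
    "GPT-4o(2024-11-20)",
    "Claude3-Opus",
    "Gemini 2.0 Pro Experimental",
    "DeepSeek-V3",
]

# Scores copied from the table; None stands for a "--" (not reported) cell.
SCORES: dict[str, list[Optional[float]]] = {
    "MMLU": [85.70, 86.80, 86.50, 88.50],
    "MMLU Pro": [77.90, 68.45, 79.10, 75.90],
    "HumanEval": [90.20, 84.90, None, 89.00],
    "FrontierMath": [0.30, None, None, 1.70],
    "MATH": [68.50, 60.10, 91.80, 87.80],
    "SimpleQA": [38.80, None, 44.30, 24.90],
}


def leader(row: list[Optional[float]]) -> str:
    """Return the model with the highest reported score in one table row."""
    reported = [(score, model) for score, model in zip(row, MODELS) if score is not None]
    best_score, best_model = max(reported)
    return best_model


leaders = {benchmark: leader(row) for benchmark, row in SCORES.items()}

for benchmark, model in leaders.items():
    print(f"{benchmark}: {model}")
```

Missing cells are excluded rather than treated as zero, so a model is never penalized for a benchmark it was not evaluated on.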
Standard API Pricing: GPT-4o(2024-11-20) vs. Peer Models
Shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.
Source: DataLearnerAI. Standard text prices shown here use the default supplier.
Version History
How each version of the GPT-4o(2024-11-20) series stacks up on benchmark tests
Benchmark Score Comparison
7 benchmarks with comparable scores
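| Benchmark | Category | GPT-4o(2024-11-20) (this model) | GPT-4o | GPT-4 |
|---|---|---|---|---|
| MMLU | Comprehensive evaluation | 85.70 (normal) | 88.70 (normal) | 86.40 (standard mode, no tools) |
| MMLU Pro | Comprehensive evaluation | 77.90 (normal) | 77.90 (normal) | -- |
| HumanEval | Coding and software engineering | 90.20 (normal) | 90.00 (normal) | 67.00 (standard mode, no tools) |
| SWE-bench Verified | Coding and software engineering | 31.00 (standard mode, no tools) | 31.00 (normal) | -- |
| FrontierMath | Mathematical reasoning | 0.30 (normal) | 0.30 (normal) | -- |
| MATH | Mathematical reasoning | 68.50 (normal) | 75.90 (normal) | -- |
| SimpleQA | General knowledge Q&A | 38.80 (normal) | 38.20 (normal) | -- |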
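To make the version-to-version differences in the table above easier to see, the following sketch computes the score change from GPT-4o to GPT-4o(2024-11-20) on each benchmark. The values are copied from the table; `PAIRS` and `deltas` are hypothetical names used only here, and the sketch ignores any differences in evaluation mode between the two columns.

```python
# (GPT-4o(2024-11-20) score, GPT-4o score), copied from the table above.
PAIRS: dict[str, tuple[float, float]] = {
    "MMLU": (85.70, 88.70),
    "MMLU Pro": (77.90, 77.90),
    "HumanEval": (90.20, 90.00),
    "SWE-bench Verified": (31.00, 31.00),
    "FrontierMath": (0.30, 0.30),
    "MATH": (68.50, 75.90),
    "SimpleQA": (38.80, 38.20),
}

# Positive values mean the 2024-11-20 version scores higher than GPT-4o.
deltas = {
    benchmark: round(newer - older, 2)
    for benchmark, (newer, older) in PAIRS.items()
}

for benchmark, delta in deltas.items():
    print(f"{benchmark}: {delta:+.2f}")
```

Rounding to two decimals matches the precision of the table and avoids floating-point noise in the printed differences.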
Standard API Pricing Across the GPT-4o(2024-11-20) Series
Shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.
Source: DataLearnerAI. Standard text prices shown here use the default supplier.
These models use different currencies or billing units, so the page falls back to raw price values instead of a shared bar chart.
| Model | Supplier | Standard input | Standard output | Base price applies to |
|---|---|---|---|---|
| GPT-4o | -- | 2.5 USD per 1M tokens | 10 USD per 1M tokens | -- |
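As a worked example of the per-token pricing listed above, this minimal sketch estimates the cost of a single request at the GPT-4o standard rates (2.5 USD per 1M input tokens, 10 USD per 1M output tokens). The function name `estimate_cost_usd` and the token counts are hypothetical, and the sketch assumes no extended-context or discounted pricing applies.

```python
# Standard GPT-4o rates from the pricing table above, in USD per 1M tokens.
INPUT_USD_PER_MILLION = 2.5
OUTPUT_USD_PER_MILLION = 10.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost in USD at the standard text rates."""
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    return input_cost + output_cost


# Hypothetical workload: 200,000 input tokens and 50,000 output tokens.
example_cost = estimate_cost_usd(200_000, 50_000)
print(f"Estimated cost: {example_cost:.2f} USD")
```

Because output tokens are priced at four times the input rate, the 50,000 output tokens in this example cost as much as the 200,000 input tokens.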
Series Overview
See how each version of the GPT-4o(2024-11-20) series performs across major benchmarks.
| Benchmark | GPT-4 (3/14/2023) | GPT-4o (5/13/2024) | GPT-4o(2024-11-20) (11/20/2024) |
|---|---|---|---|