Benchmark Results
Programming and software engineering: 2 evaluations
Competitor Comparison
Benchmark scores for GPT-4o(2024-11-20) compared against top models in its class
6 benchmarks with comparable scores. Each model shows its best score; mode label is displayed below.
| Benchmark | Category | GPT-4o(2024-11-20) (current) | Claude3-Opus | Gemini 2.0 Pro Experimental | DeepSeek-V3 |
|---|---|---|---|---|---|
| MMLU | Comprehensive evaluation | 85.70 | 86.80 | 86.50 | 88.50 |
| MMLU Pro | Comprehensive evaluation | 77.90 | 68.45 | 79.10 | 75.90 |
| HumanEval | Programming and software engineering | 90.20 | 84.90 | -- | 89.00 |
| FrontierMath | Mathematical reasoning | 0.30 | -- | -- | 1.70 |
| MATH | Mathematical reasoning | 68.50 | 60.10 | 91.80 | 87.80 |
| SimpleQA | General knowledge QA | 38.80 | -- | 44.30 | 24.90 |

Mode: every score in this table is labeled Standard Mode. "--" marks a cell with no score listed.
Standard API Pricing: GPT-4o(2024-11-20) vs. Peer Models
Shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.
Source: DataLearnerAI. Standard text prices shown here use the default supplier.
Version History
How each version of the GPT-4o(2024-11-20) series stacks up on benchmark tests
7 benchmarks with comparable scores. Each model shows its best score; mode label is displayed below.
| Benchmark | Category | GPT-4o(2024-11-20) (current) | GPT-4o | GPT-4 |
|---|---|---|---|---|
| MMLU | Comprehensive evaluation | 85.70 | 88.70 | 86.40 |
| MMLU Pro | Comprehensive evaluation | 77.90 | 77.90 | -- |
| HumanEval | Programming and software engineering | 90.20 | 90.00 | 67.00 |
| SWE-bench Verified | Programming and software engineering | 31.00 | 31.00 | -- |
| FrontierMath | Mathematical reasoning | 0.30 | 0.30 | -- |
| MATH | Mathematical reasoning | 68.50 | 75.90 | -- |
| SimpleQA | General knowledge QA | 38.80 | 38.20 | -- |

Mode: every score in this table is labeled Standard Mode. "--" marks a cell with no score listed.
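To make the version-to-version differences easier to read, the short sketch below computes the score change from GPT-4o to GPT-4o(2024-11-20) for each benchmark. The scores are copied from the table above; the names `VERSION_SCORES` and `score_deltas` are illustrative and not part of the source page.

```python
# Scores copied from the version-history table above (Standard Mode).
# Each entry is (GPT-4o(2024-11-20), GPT-4o).
VERSION_SCORES = {
    "MMLU": (85.70, 88.70),
    "MMLU Pro": (77.90, 77.90),
    "HumanEval": (90.20, 90.00),
    "SWE-bench Verified": (31.00, 31.00),
    "FrontierMath": (0.30, 0.30),
    "MATH": (68.50, 75.90),
    "SimpleQA": (38.80, 38.20),
}


def score_deltas(table):
    """Return the newer score minus the older score, rounded to 2 decimals."""
    return {name: round(new - old, 2) for name, (new, old) in table.items()}


deltas = score_deltas(VERSION_SCORES)
for name, delta in deltas.items():
    # A negative value means the 2024-11-20 version scored lower.
    print(f"{name}: {delta:+.2f}")
```

A negative delta means the 2024-11-20 version scored lower than GPT-4o on that benchmark. The GPT-4 column is left out because most of its cells have no score.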
Single-Benchmark Version Trend
Viewing: MMLU · Comprehensive evaluation
Standard API Pricing Across the GPT-4o(2024-11-20) Series
Shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.
Source: DataLearnerAI. Standard text prices shown here use the default supplier.
These models use different currencies or billing units, so the page falls back to raw price values instead of a shared bar chart.
| Model | Supplier | Standard input | Standard output | Base price applies to |
|---|---|---|---|---|
| GPT-4o | -- | 2.5 USD per 1M tokens | 10 USD per 1M tokens | -- |
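As a rough illustration of how the per-million-token rates translate into a per-request cost, the sketch below applies the GPT-4o prices listed above. The function name `estimate_cost_usd` and the token counts are hypothetical. The calculation assumes flat standard pricing with no extended-context tier or other discounts.

```python
# Standard GPT-4o prices from the table above, in USD per 1 million tokens.
INPUT_USD_PER_MILLION = 2.5
OUTPUT_USD_PER_MILLION = 10.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request at the flat standard rates."""
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    return input_cost + output_cost


# Hypothetical request: 12,000 input tokens and 800 output tokens.
cost = estimate_cost_usd(12_000, 800)
print(f"{cost:.4f} USD")  # prints "0.0380 USD"
```

Output tokens cost four times as much as input tokens at these rates, so long completions dominate the bill even when prompts are much larger.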