GPT-4's benchmark results are currently led by MMLU (rank 30 of 63, score 86.40), HumanEval (rank 27 of 38, score 67), and DROP (rank 7 of 7, score 80.90). This page also compares it with 1 competitor model and 2 predecessor or same-series models, including performance and pricing views where available. 1 source link is attached for reference.
Side-by-side benchmark comparison of GPT-4 against leading peer models
Top 3 benchmarks with comparable scores
Shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.
Source: DataLearnerAI. Standard text prices shown here use the default supplier.
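Since input and output tokens are billed at separate rates, the per-request cost follows directly from the two prices shown in the chart. A minimal sketch, using hypothetical per-million-token rates (the real standard-text rates come from the supplier's pricing page):

```python
# Hypothetical per-1M-token rates for illustration only;
# actual rates are those shown in the pricing chart above.
INPUT_PRICE_PER_M = 30.00   # USD per 1M input tokens (assumed)
OUTPUT_PRICE_PER_M = 60.00  # USD per 1M output tokens (assumed)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Standard-text cost: input and output tokens billed at separate rates."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# e.g. a request with 10k input tokens and 2k output tokens
print(f"${request_cost(10_000, 2_000):.2f}")  # → $0.42
```

If extended-context pricing applies above a threshold, the same formula would be evaluated with the higher rates for the tokens beyond that threshold.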
Track the evolution of the GPT-4 series across generations
Shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.
Source: DataLearnerAI. Standard text prices shown here use the default supplier.