Grok 4.1 Benchmark Details
Grok 4.1 currently shows benchmark results led by SWE-bench Verified (70 / 94, score 54.60). This page also tracks comparisons against 3 predecessor or same-series models. 1 source link is attached for reference.
Benchmark Results
Version History
How each version of the Grok 4.1 series stacks up on benchmark tests
Benchmark Score Comparison
1 benchmark with comparable scores
| Benchmark | Grok 4.1 (this model) | GPT-4o (2024-11-20) | GPT-4o |
|---|---|---|---|
| SWE-bench Verified (programming and software engineering) | 54.60, normal mode (no tools) | 31.00, normal mode (no tools) | 31.00, normal mode |
Standard API Pricing Across the Grok 4.1 Series
Shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.
Source: DataLearnerAI. Standard text prices shown here use the default supplier.
These models use different currencies or billing units, so the page falls back to raw price values instead of a shared bar chart.
| Model | Supplier | Standard input | Standard output | Base price applies to |
|---|---|---|---|---|
| GPT-4o | — | 2.5 USD / 1M tokens | 10 USD / 1M tokens | — |
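As a rough illustration of how per-million-token rates like those in the table translate into the cost of a single request, here is a minimal sketch. The function name `request_cost_usd` and the example token counts are hypothetical, and the sketch assumes flat standard pricing with no extended-context threshold, caching discount, or supplier-specific surcharge.

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate the cost of one request from per-1M-token prices (USD)."""
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


# GPT-4o standard rates from the table: 2.5 USD input, 10 USD output per 1M tokens.
# The token counts below are made up for illustration.
cost = request_cost_usd(12_000, 800, 2.5, 10.0)
print(f"{cost:.4f} USD")  # 0.0380 USD
```

Because output tokens are priced at four times the input rate here, output-heavy requests cost noticeably more than the total token count alone would suggest.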
Series Overview
See how each version of the Grok 4.1 series performs across major benchmarks. Click any row to break down scores by reasoning mode.
| Benchmark | GPT-4o, 5/13/2024 | GPT-4o (2024-11-20), 11/20/2024 | Grok 4.1, 11/17/2025 |
|---|---|---|---|
Single-Benchmark Mode Relation
Viewing: SWE-bench Verified · programming and software engineering