See key specs and per-benchmark scores for each model/mode. This page compares the evaluation data and core parameters of 2 models.

Claude Mythos Preview
Anthropic

Best overall: Claude Mythos Preview · 81.40
Best single: Claude Mythos Preview · GPQA Diamond 94.60
Modality coverage: Claude Mythos Preview · 2 modalities
Head to head: 3 benchmarks · 2 wins · 1 loss · +0.60 average diff
Compare benchmark results across thinking modes and tool usage.
Data is sourced primarily from official releases (GitHub, Hugging Face, papers), then benchmark leaderboards, then third-party evaluators.
Complete scores for each model/mode across selected benchmarks.
3 benchmarks with comparable scores. Each model shows its best score; the mode label follows the score.
| Benchmark | Claude Mythos Preview | GPT-5.4 Pro |
|---|---|---|
| GPQA Diamond (general evaluation) | 94.60 (Extended Thinking) | 94.40 (Thinking Level · High) |
| HLE (general evaluation) | 64.70 (Extended Thinking, Tools) | 58.70 (Thinking Level · High, Tools) |
| BrowseComp (AI Agent, information gathering) | 84.90 (Extended Thinking, Tools) | 89.30 (Thinking Level · High, Tools) |
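
The summary figures (81.40 overall, 2 wins, 1 loss, +0.60 average diff) can be checked against the three rows above. The sketch below assumes that "Best overall" is the unweighted mean of the three comparable scores and that wins, losses, and the average diff are counted from Claude Mythos Preview's side; the page does not state either rule, and the `summarize` helper is hypothetical.

```python
# Scores copied from the comparison table: (Claude Mythos Preview, GPT-5.4 Pro).
scores = {
    "GPQA Diamond": (94.60, 94.40),
    "HLE": (64.70, 58.70),
    "BrowseComp": (84.90, 89.30),
}


def summarize(scores):
    """Mean score of the first model, plus head-to-head counts and mean diff.

    Assumption: unweighted averages, diffs taken as first model minus second.
    """
    diffs = [first - second for first, second in scores.values()]
    n = len(scores)
    return {
        "benchmarks": n,
        "mean_first": round(sum(first for first, _ in scores.values()) / n, 2),
        "wins": sum(d > 0 for d in diffs),
        "losses": sum(d < 0 for d in diffs),
        "avg_diff": round(sum(diffs) / n, 2),
    }


summary = summarize(scores)
print(summary)
```

Under those assumptions the result matches the summary cards: a mean of 81.40 for Claude Mythos Preview, 2 wins and 1 loss, and an average diff of +0.60 (per-benchmark diffs of +0.20, +6.00, and -4.40).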
Side-by-side input/output token pricing
Licensing, MoE architecture, and multi-modality support.
| Group | Features & specs | Claude Mythos Preview (Anthropic) | GPT-5.4 Pro (OpenAI) |
|---|---|---|---|
| Core specs | Release | 2026-04-07 | 2026-03-05 |
| | Context length | — | 1M |
| | Max output | 8192 | 128000 |
| | MoE | No | No |
| License | Code Open Source | Not provided | Not provided |
| | Weights Open Source | Not provided | Not provided |
| | Commercial use | Not open source | Not open source |
| Modality support | Text Input/Output | / | / |
| | Image Input/Output | / | / |
| Resources | Paper / report | Introducing Claude Mythos Preview and Project Glasswing | Introducing GPT‑5.4 |
| | DataLearner blog | What is Claude Mythos? Anthropic's strongest model: evaluation, safety capabilities, and Project Glasswing explained | Not provided |
GPT-5.4 Pro
OpenAI