Muse Spark vs GPT-5.4

Across 7 shared benchmarks, GPT-5.4 leads overall: Muse Spark wins 2, GPT-5.4 wins 5, with 0 ties and an average score difference (Muse Spark minus GPT-5.4) of -5.93 points.

Facebook AI Research Lab
Muse Spark

Facebook AI Research Lab · 2026-04-08 · Reasoning model

OpenAI
GPT-5.4

OpenAI · 2026-03-05 · Multimodal model

Muse Spark: 2 wins (29%) · GPT-5.4: 5 wins (71%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 7 shared benchmarks.
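The grouping and ordering described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scores are copied from this page, "gap" is taken to mean the absolute value of the Muse Spark minus GPT-5.4 difference, and the names `ROWS` and `group_and_sort` are hypothetical, not part of any DataLearner tooling.

```python
from collections import defaultdict

# (capability, benchmark, Muse Spark score, GPT-5.4 score), copied from this page.
# The rows are deliberately listed out of order so the sort has something to do.
ROWS = [
    ("General Knowledge", "GPQA Diamond", 89.50, 92.80),
    ("General Knowledge", "ARC-AGI-2", 42.50, 77.10),
    ("General Knowledge", "HLE", 58.00, 52.10),
    ("Math and Reasoning", "FrontierMath", 39.00, 47.60),
    ("Math and Reasoning", "FrontierMath - Tier 4", 14.60, 27.10),
    ("Agent Level Benchmark", "τ²-Bench - Telecom", 92.00, 64.30),
    ("AI Agent - Tool Usage", "Terminal Bench 2.0", 59.00, 75.10),
]


def group_and_sort(rows):
    """Group rows by capability, then order each group by absolute gap, largest first."""
    groups = defaultdict(list)
    for capability, name, muse, gpt in rows:
        # Diff follows the page's convention: Muse Spark minus GPT-5.4.
        groups[capability].append((name, round(muse - gpt, 2)))
    for entries in groups.values():
        entries.sort(key=lambda entry: abs(entry[1]), reverse=True)
    return dict(groups)


grouped = group_and_sort(ROWS)
for capability, entries in grouped.items():
    print(capability)
    for name, diff in entries:
        print(f"  {name}: {diff:+.2f}")
```

Sorting on the absolute difference rather than the signed one is what puts HLE (+5.90) ahead of GPQA Diamond (-3.30) within General Knowledge.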

General Knowledge

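GPT-5.4 leads 2/3

| Benchmark | Muse Spark (score, rank, mode) | GPT-5.4 (score, rank, mode) | Diff (Muse Spark - GPT-5.4) |
| --- | --- | --- | --- |
| ARC-AGI-2 | 42.50, rank 25 / 59, Thinking (No Tools) | 77.10, rank 7 / 59, Normal (No Tools) | -34.60 |
| HLE | 58, rank 4 / 157, Deep thinking (No Tools, Parallel) | 52.10, rank 14 / 157, Extra-high-intensity thinking (With Tools) | +5.90 |
| GPQA Diamond | 89.50, rank 22 / 178, Thinking (No Tools) | 92.80, rank 10 / 178, Extra-high-intensity thinking (No Tools) | -3.30 |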

Math and Reasoning

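GPT-5.4 leads 2/2

| Benchmark | Muse Spark (score, rank, mode) | GPT-5.4 (score, rank, mode) | Diff (Muse Spark - GPT-5.4) |
| --- | --- | --- | --- |
| FrontierMath - Tier 4 | 14.60, rank 23 / 80, Normal (No Tools) | 27.10, rank 11 / 80, Extra-high-intensity thinking (No Tools) | -12.50 |
| FrontierMath | 39, rank 9 / 60, Thinking (No Tools) | 47.60, rank 5 / 60, Extra-high-intensity thinking (No Tools) | -8.60 |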

Agent Level Benchmark

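Muse Spark leads 1/1

| Benchmark | Muse Spark (score, rank, mode) | GPT-5.4 (score, rank, mode) | Diff (Muse Spark - GPT-5.4) |
| --- | --- | --- | --- |
| τ²-Bench - Telecom | 92, rank 20 / 35, Thinking (With Tools) | 64.30, rank 30 / 35, Normal (With Tools) | +27.70 |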

AI Agent - Tool Usage

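GPT-5.4 leads 1/1

| Benchmark | Muse Spark (score, rank, mode) | GPT-5.4 (score, rank, mode) | Diff (Muse Spark - GPT-5.4) |
| --- | --- | --- | --- |
| Terminal Bench 2.0 | 59, rank 24 / 46, Thinking (With Tools) | 75.10, rank 4 / 46, Extra-high-intensity thinking (With Tools) | -16.10 |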

Specs

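| Field | Muse Spark | GPT-5.4 |
| --- | --- | --- |
| Publisher | Facebook AI Research Lab | OpenAI |
| Release date | 2026-04-08 | 2026-03-05 |
| Model type | Reasoning model | Multimodal model |
| Architecture | Dense | Dense |
| Parameters | Not available | Not available |
| Context length | 262K tokens | 1M tokens |
| Max output | Not available | 125K tokens |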

API pricing

Prices use DataLearner records when available; missing fields are not inferred.

| Item | Muse Spark | GPT-5.4 |
| --- | --- | --- |
| Text input | Not public | $2.5 / 1M tokens |
| Text output | Not public | $15 / 1M tokens |
| Cache write | Not public | $0.25 / 1M tokens |

One or both models have incomplete public pricing.
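To make the per-million-token prices concrete, here is a minimal sketch of a cost estimate for a single request. It assumes simple linear billing on the listed text input and output prices and ignores the cache line entirely. The function name `estimate_cost` and the example token counts are hypothetical. Following the page's rule that missing fields are not inferred, a model without public pricing yields `None` rather than a guess.

```python
from typing import Optional

# USD per 1M tokens, copied from the pricing records on this page.
# None marks a price that is not public.
PRICES_USD_PER_1M = {
    "GPT-5.4": {"text_input": 2.50, "text_output": 15.00},
    "Muse Spark": {"text_input": None, "text_output": None},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Optional[float]:
    """Return the estimated USD cost of one request, or None if pricing is not public."""
    prices = PRICES_USD_PER_1M[model]
    if prices["text_input"] is None or prices["text_output"] is None:
        return None  # do not infer missing prices
    total = input_tokens * prices["text_input"] + output_tokens * prices["text_output"]
    return round(total / 1_000_000, 4)


# Hypothetical request: 200,000 input tokens and 20,000 output tokens.
print(estimate_cost("GPT-5.4", 200_000, 20_000))     # 0.8
print(estimate_cost("Muse Spark", 200_000, 20_000))  # None
```

At these prices an output token costs six times as much as an input token, so output-heavy workloads dominate the bill even when the prompt is long.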

Summary

  • Muse Spark leads in: Agent Level Benchmark (1/1)
  • GPT-5.4 leads in: General Knowledge (2/3), Math and Reasoning (2/2), AI Agent - Tool Usage (1/1)

On average across the 7 shared benchmarks, GPT-5.4 scores 5.93 higher.

Largest single-benchmark gap: ARC-AGI-2 — Muse Spark 42.50 vs GPT-5.4 77.10 (-34.60).
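The summary figures can be reproduced from the seven per-benchmark differences. The sketch below assumes the average is a plain unweighted mean of the Muse Spark minus GPT-5.4 differences and that a win is any nonzero difference in a model's favor. The names `DIFFS` and `summarize` are illustrative only.

```python
# Muse Spark minus GPT-5.4, copied from the Diff column of each benchmark table.
DIFFS = {
    "ARC-AGI-2": -34.60,
    "HLE": 5.90,
    "GPQA Diamond": -3.30,
    "FrontierMath - Tier 4": -12.50,
    "FrontierMath": -8.60,
    "τ²-Bench - Telecom": 27.70,
    "Terminal Bench 2.0": -16.10,
}


def summarize(diffs):
    """Count wins and ties, average the differences, and find the largest absolute gap."""
    total = len(diffs)
    muse_wins = sum(1 for d in diffs.values() if d > 0)
    gpt_wins = sum(1 for d in diffs.values() if d < 0)
    return {
        "muse_spark_wins": muse_wins,
        "gpt_5_4_wins": gpt_wins,
        "ties": total - muse_wins - gpt_wins,
        "muse_spark_win_pct": round(100 * muse_wins / total),
        "gpt_5_4_win_pct": round(100 * gpt_wins / total),
        "average_diff": round(sum(diffs.values()) / total, 2),
        "largest_gap": max(diffs, key=lambda name: abs(diffs[name])),
    }


summary = summarize(DIFFS)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Under these assumptions the result matches the page: 2 wins to 5, no ties, a 29% to 71% split, an average difference of -5.93, and ARC-AGI-2 as the largest gap.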

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.