Muse Spark vs Claude Opus 4.6

Across 10 shared benchmarks, Claude Opus 4.6 leads overall: Muse Spark wins 2, Claude Opus 4.6 wins 8, with 0 ties and an average score difference of -20.43.

Muse Spark

Facebook AI Research Lab · 2026-04-08 · Reasoning model

Claude Opus 4.6

Anthropic · 2026-02-05 · Reasoning model

Muse Spark: 2 wins (20%) · Claude Opus 4.6: 8 wins (80%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 10 shared benchmarks.
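The grouping and ordering described above can be sketched as follows. This is a minimal illustration, not the page's actual generator: the names `ROWS` and `group_and_sort` are hypothetical, and it assumes "largest gap" means the largest absolute value in the Diff column, which matches the row order in the tables. Only the first three capability groups are included to keep the sketch short.

```python
from collections import defaultdict

# (capability, benchmark, diff), where diff = Muse Spark - Claude Opus 4.6,
# copied from the Diff column of the tables on this page (deliberately unsorted).
ROWS = [
    ("General Knowledge", "GPQA Diamond", -1.81),
    ("General Knowledge", "ARC-AGI-2", -23.80),
    ("General Knowledge", "HLE", 5.0),
    ("AI Agent - Tool Usage", "MCP-Atlas", 5.40),
    ("AI Agent - Tool Usage", "Terminal Bench 2.0", -6.40),
    ("Math and Reasoning", "FrontierMath", -1.70),
    ("Math and Reasoning", "FrontierMath - Tier 4", -8.30),
]


def group_and_sort(rows):
    """Group rows by capability, then order each group by absolute gap."""
    groups = defaultdict(list)
    for capability, benchmark, diff in rows:
        groups[capability].append((benchmark, diff))
    # Assumption: "largest gap" is judged by absolute value, sign ignored.
    return {
        capability: sorted(items, key=lambda item: abs(item[1]), reverse=True)
        for capability, items in groups.items()
    }


grouped = group_and_sort(ROWS)
for capability, items in grouped.items():
    print(capability)
    for benchmark, diff in items:
        print(f"  {benchmark}: {diff:+.2f}")
```

Sorting on the absolute value keeps a large Muse Spark lead and a large Claude Opus 4.6 lead equally prominent within a group.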

General Knowledge

Claude Opus 4.6 2/3

Benchmark	Muse Spark	Claude Opus 4.6	Diff
ARC-AGI-2	42.50 · rank 28 / 62 · Thinking (No Tools)	66.30 · rank 17 / 62 · Extended (no tools)	-23.80
HLE	58 · rank 6 / 172 · Deep Thinking (no tools, parallel)	53 · rank 18 / 172 · Extended (with tools, internet)	+5
GPQA Diamond	89.50 · rank 25 / 187 · Thinking (No Tools)	91.31 · rank 15 / 187 · Extended (no tools)	-1.81

AI Agent - Tool Usage

Even (1 win each of 2)

Benchmark	Muse Spark	Claude Opus 4.6	Diff
Terminal Bench 2.0	59 · rank 24 / 47 · Thinking (With Tools)	65.40 · rank 11 / 47 · Extended (with tools)	-6.40
MCP-Atlas	82.20 · rank 5 / 27 · Normal (With Tools)	76.80 · rank 10 / 27 · Deep Thinking (With Tools)	+5.40

Math and Reasoning

Claude Opus 4.6 2/2

Benchmark	Muse Spark	Claude Opus 4.6	Diff
FrontierMath - Tier 4	14.60 · rank 23 / 80 · Normal (No Tools)	22.90 · rank 12 / 80 · Max (no tools)	-8.30
FrontierMath	39 · rank 9 / 60 · Thinking (No Tools)	40.70 · rank 7 / 60 · Max (no tools)	-1.70

Agent Level Benchmark

Claude Opus 4.6 1/1

Benchmark	Muse Spark	Claude Opus 4.6	Diff
τ²-Bench - Telecom	92 · rank 20 / 35 · Thinking (With Tools)	99.25 · rank 2 / 35 · Extended (with tools)	-7.25

Coding and Software Engineer

Claude Opus 4.6 1/1

Benchmark	Muse Spark	Claude Opus 4.6	Diff
SWE-bench Verified	77.40 · rank 27 / 112 · Thinking (With Tools)	80.84 · rank 10 / 112 · Extended (with tools)	-3.44

Productivity Knowledge

Claude Opus 4.6 1/1

Benchmark	Muse Spark	Claude Opus 4.6	Diff
GDPval-AA	1,444 · rank 5 / 21 · Thinking (With Tools)	1,606 · rank 3 / 21 · Extended (with tools, internet)	-162

Specs

Field	Muse Spark	Claude Opus 4.6
Publisher	Facebook AI Research Lab	Anthropic
Release date	2026-04-08	2026-02-05
Model type	Reasoning model	Reasoning model
Architecture	Dense	Dense
Parameters	Not available	Not available
Context length	262K	1000K
Max output	Not available	64K

API pricing

Prices use DataLearner records when available; missing fields are not inferred.

Item	Muse Spark	Claude Opus 4.6
Text input	Not public	$5 / 1M tokens
Text output	Not public	$25 / 1M tokens
Cache read	Not public	$0.5 / 1M tokens
Cache write	Not public	$10 / 1M tokens

One or both models have incomplete public pricing.
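As a rough illustration of how a per-1M-token price translates into a request cost, and of leaving missing prices uninferred, consider the sketch below. The helper `cost_usd` and the 200,000-token workload are hypothetical; only the $25 / 1M text-output price comes from the table above.

```python
from typing import Optional


def cost_usd(tokens: int, price_per_million: Optional[float]) -> Optional[float]:
    """Return the cost in USD, or None when the price is not public."""
    if price_per_million is None:
        return None  # a missing price stays unknown; it is not inferred
    return tokens / 1_000_000 * price_per_million


# Listed text-output price for Claude Opus 4.6: $25 / 1M tokens.
# Muse Spark's price is "Not public", modeled here as None.
OUTPUT_PRICE = {"Claude Opus 4.6": 25.0, "Muse Spark": None}

for model, price in OUTPUT_PRICE.items():
    cost = cost_usd(200_000, price)  # hypothetical workload: 200K output tokens
    print(model, "unknown" if cost is None else f"${cost:.2f}")
```

Returning `None` rather than 0 keeps an unknown price from looking like a free one when costs are compared or summed.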

Summary

Claude Opus 4.6 leads in: General Knowledge (2/3), Math and Reasoning (2/2), Agent Level Benchmark (1/1), Coding and Software Engineer (1/1), Productivity Knowledge (1/1)
Tied in: AI Agent - Tool Usage

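On average across the 10 shared benchmarks, Claude Opus 4.6 scores 20.43 higher. This is an unweighted mean of the Diff column and is dominated by GDPval-AA (-162), whose scores (1,444 vs 1,606) are on a much larger scale than the other benchmarks; across the remaining nine, the mean gap is 4.70 in Claude Opus 4.6's favor.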

Largest single-benchmark gap: GDPval-AA — Muse Spark 1,444 vs Claude Opus 4.6 1,606 (-162).
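The summary figures can be reproduced from the table scores with a short script. This is a sketch under stated assumptions: `SCORES` and `summarize` are illustrative names, the average is taken as a plain unweighted mean of the per-benchmark differences, and the scores are transcribed from the tables on this page.

```python
# (benchmark, Muse Spark score, Claude Opus 4.6 score), from the tables above.
SCORES = [
    ("ARC-AGI-2", 42.50, 66.30),
    ("HLE", 58.0, 53.0),
    ("GPQA Diamond", 89.50, 91.31),
    ("Terminal Bench 2.0", 59.0, 65.40),
    ("MCP-Atlas", 82.20, 76.80),
    ("FrontierMath - Tier 4", 14.60, 22.90),
    ("FrontierMath", 39.0, 40.70),
    ("τ²-Bench - Telecom", 92.0, 99.25),
    ("SWE-bench Verified", 77.40, 80.84),
    ("GDPval-AA", 1444.0, 1606.0),
]


def summarize(scores):
    """Compute win counts, mean difference, and the largest single gap."""
    # diff = Muse Spark - Claude Opus 4.6, rounded to match the Diff column.
    diffs = [(name, round(a - b, 2)) for name, a, b in scores]
    return {
        "muse_wins": sum(d > 0 for _, d in diffs),
        "claude_wins": sum(d < 0 for _, d in diffs),
        "ties": sum(d == 0 for _, d in diffs),
        "mean_diff": round(sum(d for _, d in diffs) / len(diffs), 2),
        "largest_gap": max(diffs, key=lambda item: abs(item[1])),
    }


summary = summarize(SCORES)
print(summary)

# Same statistics with the large-scale GDPval-AA scores left out.
without_gdpval = summarize([row for row in SCORES if row[0] != "GDPval-AA"])
print(without_gdpval["mean_diff"])
```

Run on the table data, this gives 2 wins for Muse Spark, 8 for Claude Opus 4.6, no ties, and a mean difference of -20.43, with GDPval-AA as the largest gap. Because GDPval-AA is scored in the thousands, it contributes most of that mean; dropping it shows how much smaller the gap is on the other nine benchmarks.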

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.
