Muse Spark vs Claude Opus 4.6

Across 9 shared benchmarks, Claude Opus 4.6 leads overall: it wins 8, Muse Spark wins 1, and there are no ties. The average score difference (Muse Spark minus Claude Opus 4.6) is -23.30.

Muse Spark

Facebook AI Research Lab · 2026-04-08 · Reasoning model

Claude Opus 4.6

Anthropic · 2026-02-05 · Reasoning model

Muse Spark: 1 win (11%) · Claude Opus 4.6: 8 wins (89%)

Benchmark scores

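Benchmarks are grouped by capability and sorted by largest gap within each group (9 shared benchmarks). Diff is Muse Spark's score minus Claude Opus 4.6's, so a negative value means Claude Opus 4.6 scored higher.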

General Knowledge

Claude Opus 4.6 wins 2 of 3
Benchmark | Muse Spark score (rank; mode) | Claude Opus 4.6 score (rank; mode) | Diff
ARC-AGI-2 | 42.50 (25/59; Thinking, no tools) | 66.30 (15/59; Extended, no tools) | -23.80
HLE | 58 (4/157; Deep Thinking, no tools, parallel) | 53 (11/157; Extended, with tools and internet) | +5
GPQA Diamond | 89.50 (22/178; Thinking, no tools) | 91.31 (14/178; Extended, no tools) | -1.81

Math and Reasoning

Claude Opus 4.6 wins 2 of 2
Benchmark | Muse Spark score (rank; mode) | Claude Opus 4.6 score (rank; mode) | Diff
FrontierMath - Tier 4 | 14.60 (23/80; Normal, no tools) | 22.90 (12/80; Max, no tools) | -8.30
FrontierMath | 39 (9/60; Thinking, no tools) | 40.70 (7/60; Max, no tools) | -1.70

Agent Level Benchmark

Claude Opus 4.6 wins 1 of 1
Benchmark | Muse Spark score (rank; mode) | Claude Opus 4.6 score (rank; mode) | Diff
τ²-Bench - Telecom | 92 (20/35; Thinking, with tools) | 99.25 (2/35; Extended, with tools) | -7.25

AI Agent - Tool Usage

Claude Opus 4.6 wins 1 of 1
Benchmark | Muse Spark score (rank; mode) | Claude Opus 4.6 score (rank; mode) | Diff
Terminal Bench 2.0 | 59 (24/46; Thinking, with tools) | 65.40 (11/46; Extended, with tools) | -6.40

Coding and Software Engineer

Claude Opus 4.6 wins 1 of 1
Benchmark | Muse Spark score (rank; mode) | Claude Opus 4.6 score (rank; mode) | Diff
SWE-bench Verified | 77.40 (24/108; Thinking, with tools) | 80.84 (9/108; Extended, with tools) | -3.44

Productivity Knowledge

Claude Opus 4.6 wins 1 of 1
Benchmark | Muse Spark score (rank; mode) | Claude Opus 4.6 score (rank; mode) | Diff
GDPval-AA | 1,444 (5/21; Thinking, with tools) | 1,606 (3/21; Extended, with tools and internet) | -162

Specs

Field | Muse Spark | Claude Opus 4.6
Publisher | Facebook AI Research Lab | Anthropic
Release date | 2026-04-08 | 2026-02-05
Model type | Reasoning model | Reasoning model
Architecture | Dense | Dense
Parameters | Not available | Not available
Context length (tokens) | 262K | 1000K
Max output (tokens) | Not available | 64K

API pricing

Prices use DataLearner records when available; missing fields are not inferred.

Item (USD per 1M tokens) | Muse Spark | Claude Opus 4.6
Text input | Not public | $0.5
Text output | Not public | $25
Cache read | Not public | $0.5
Cache write | Not public | $10

Muse Spark's API pricing is not public, so the two models' costs cannot be compared directly.
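A per-request cost estimate follows from these prices: multiply each token count by its per-1M-token price, divide by 1,000,000, and add the categories together. The following Python sketch assumes the Claude Opus 4.6 prices listed in the table are current and that billing is linear in tokens, with no volume discounts or other surcharges modeled. The `estimate_cost` helper and the token counts in the example are hypothetical.

```python
# Per-1M-token prices for Claude Opus 4.6, copied from the API pricing table (USD).
# Muse Spark is omitted because its pricing is not public.
CLAUDE_OPUS_46_PRICES = {
    "input": 0.5,
    "output": 25.0,
    "cache_read": 0.5,
    "cache_write": 10.0,
}


def estimate_cost(prices, input_tokens=0, output_tokens=0,
                  cache_read_tokens=0, cache_write_tokens=0):
    """Hypothetical helper: sum of tokens / 1,000,000 * price per 1M tokens per category.

    Assumes purely linear pricing; real invoices may differ.
    """
    usage = {
        "input": input_tokens,
        "output": output_tokens,
        "cache_read": cache_read_tokens,
        "cache_write": cache_write_tokens,
    }
    return sum(tokens / 1_000_000 * prices[kind] for kind, tokens in usage.items())


if __name__ == "__main__":
    # Hypothetical request: 200,000 input tokens and 4,000 output tokens.
    cost = estimate_cost(CLAUDE_OPUS_46_PRICES, input_tokens=200_000, output_tokens=4_000)
    print(f"${cost:.2f}")  # $0.20
```

In this example, input and output each contribute $0.10. Output tokens are priced 50 times higher than input in this table, so long generations can dominate the bill even when prompts are large.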

Summary

  • Claude Opus 4.6 leads in: General Knowledge (2/3), Math and Reasoning (2/2), Agent Level Benchmark (1/1), AI Agent - Tool Usage (1/1), Coding and Software Engineer (1/1), Productivity Knowledge (1/1)

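On average across the 9 shared benchmarks, Claude Opus 4.6 scores 23.30 points higher. This mean mixes scales: GDPval-AA scores are in the thousands rather than percentages, and its -162 gap dominates the average. Across the other 8 benchmarks, Claude Opus 4.6 leads by about 5.96 points on average.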

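Largest single-benchmark gap in raw points: GDPval-AA, with Muse Spark at 1,444 vs Claude Opus 4.6 at 1,606 (-162). Excluding GDPval-AA, the largest gap is on ARC-AGI-2, with 42.50 vs 66.30 (-23.80).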
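The summary figures (win counts, mean difference, largest gap) can be recomputed from the benchmark tables. The Python sketch below copies the scores from those tables and assumes a higher score is better on every benchmark. The names `SCORES`, `diffs`, `summarize`, and `largest_gap` are hypothetical and not part of any DataLearner tool. The optional `exclude` argument shows how much GDPval-AA, which uses a different scale, moves the average.

```python
# Hypothetical helpers; scores copied from the benchmark tables on this page.
# Each entry is (Muse Spark, Claude Opus 4.6). Assumes higher is better everywhere.
SCORES = {
    "ARC-AGI-2": (42.50, 66.30),
    "HLE": (58.0, 53.0),
    "GPQA Diamond": (89.50, 91.31),
    "FrontierMath - Tier 4": (14.60, 22.90),
    "FrontierMath": (39.0, 40.70),
    "τ²-Bench - Telecom": (92.0, 99.25),
    "Terminal Bench 2.0": (59.0, 65.40),
    "SWE-bench Verified": (77.40, 80.84),
    "GDPval-AA": (1444.0, 1606.0),
}


def diffs(scores, exclude=()):
    """Map each kept benchmark to Muse Spark's score minus Claude Opus 4.6's."""
    return {name: m - c for name, (m, c) in scores.items() if name not in exclude}


def summarize(scores, exclude=()):
    """Return (muse_wins, claude_wins, ties, mean_diff) over the kept benchmarks."""
    d = list(diffs(scores, exclude).values())
    muse_wins = sum(x > 0 for x in d)
    claude_wins = sum(x < 0 for x in d)
    ties = sum(x == 0 for x in d)
    return muse_wins, claude_wins, ties, sum(d) / len(d)


def largest_gap(scores, exclude=()):
    """Return (benchmark, diff) with the largest absolute raw difference."""
    return max(diffs(scores, exclude).items(), key=lambda item: abs(item[1]))


if __name__ == "__main__":
    muse, claude, ties, mean = summarize(SCORES)
    print(muse, claude, ties, round(mean, 2))  # 1 8 0 -23.3
    muse, claude, ties, mean = summarize(SCORES, exclude={"GDPval-AA"})
    print(muse, claude, ties, round(mean, 2))  # 1 7 0 -5.96
    name, gap = largest_gap(SCORES, exclude={"GDPval-AA"})
    print(name, round(gap, 2))  # ARC-AGI-2 -23.8
```

The raw differences are averaged without normalization because that reproduces the page's -23.30 figure. Normalizing each benchmark to a common scale would give a fairer cross-benchmark average, but that is a separate choice this page does not make.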

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.