Muse Spark vs Gemini 3.1 Pro Preview

Across 8 shared benchmarks, Gemini 3.1 Pro Preview leads overall: Muse Spark wins 2 and Gemini 3.1 Pro Preview wins 6, with 0 ties. The average score difference (Muse Spark minus Gemini 3.1 Pro Preview) is -6.60.

Facebook AI Research Lab
Muse Spark

Facebook AI Research Lab · 2026-04-08 · Reasoning model

Google DeepMind
Gemini 3.1 Pro Preview

Google DeepMind · 2026-02-20 · Multimodal model

Muse Spark: 2 wins (25%) · Gemini 3.1 Pro Preview: 6 wins (75%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 8 shared benchmarks.
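The grouping and ordering described above can be sketched in a few lines of Python. This is only an illustration: the scores are copied from the tables on this page, the function name `group_and_sort` is hypothetical, and sorting by the absolute size of the gap is an assumption inferred from the row order shown here, not a documented rule of the site.

```python
from collections import defaultdict

# (capability, benchmark, Muse Spark score, Gemini 3.1 Pro Preview score),
# copied from the tables on this page.
ROWS = [
    ("General Knowledge", "ARC-AGI-2", 42.50, 77.10),
    ("General Knowledge", "HLE", 58.00, 51.40),
    ("General Knowledge", "GPQA Diamond", 89.50, 94.30),
    ("Math and Reasoning", "FrontierMath - Tier 4", 14.60, 16.70),
    ("Math and Reasoning", "FrontierMath", 39.00, 36.90),
    ("Agent Level Benchmark", "τ²-Bench - Telecom", 92.00, 99.30),
    ("AI Agent - Tool Usage", "Terminal Bench 2.0", 59.00, 68.50),
    ("Coding and Software Engineer", "SWE-bench Verified", 77.40, 80.60),
]


def group_and_sort(rows):
    """Group rows by capability, then sort each group by absolute gap.

    Assumption: "largest gap" means the largest absolute difference,
    regardless of which model is ahead.
    """
    groups = defaultdict(list)
    for capability, benchmark, muse, gemini in rows:
        # Diff is Muse Spark minus Gemini, matching the Diff column.
        groups[capability].append((benchmark, round(muse - gemini, 2)))
    for items in groups.values():
        items.sort(key=lambda item: abs(item[1]), reverse=True)
    return dict(groups)


if __name__ == "__main__":
    for capability, items in group_and_sort(ROWS).items():
        print(capability)
        for benchmark, diff in items:
            print(f"  {benchmark}: {diff:+.2f}")
```

Because Python's sort is stable, benchmarks with equal absolute gaps keep their input order.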

General Knowledge

Gemini 3.1 Pro Preview 2/3
Benchmark | Muse Spark | Gemini 3.1 Pro Preview | Diff
ARC-AGI-2 | 42.50 · rank 25 / 59 · Thinking (No Tools) | 77.10 · rank 7 / 59 · Thinking High (No Tools) | -34.60
HLE | 58 · rank 4 / 157 · Deep Thinking (No Tools, Parallel) | 51.40 · rank 15 / 157 · Thinking High (With Tools) | +6.60
GPQA Diamond | 89.50 · rank 22 / 178 · Thinking (No Tools) | 94.30 · rank 3 / 178 · Thinking High (No Tools) | -4.80

Math and Reasoning

Even: 1 win each across 2 benchmarks
Benchmark | Muse Spark | Gemini 3.1 Pro Preview | Diff
FrontierMath - Tier 4 | 14.60 · rank 23 / 80 · Normal (No Tools) | 16.70 · rank 20 / 80 · Normal (No Tools) | -2.10
FrontierMath | 39 · rank 9 / 60 · Thinking (No Tools) | 36.90 · rank 11 / 60 · Thinking High (No Tools) | +2.10

Agent Level Benchmark

Gemini 3.1 Pro Preview 1/1
Benchmark | Muse Spark | Gemini 3.1 Pro Preview | Diff
τ²-Bench - Telecom | 92 · rank 20 / 35 · Thinking (With Tools) | 99.30 · rank 1 / 35 · Thinking High (With Tools) | -7.30

AI Agent - Tool Usage

Gemini 3.1 Pro Preview 1/1
Benchmark | Muse Spark | Gemini 3.1 Pro Preview | Diff
Terminal Bench 2.0 | 59 · rank 24 / 46 · Thinking (With Tools) | 68.50 · rank 8 / 46 · Thinking High (With Tools) | -9.50

Coding and Software Engineer

Gemini 3.1 Pro Preview 1/1
Benchmark | Muse Spark | Gemini 3.1 Pro Preview | Diff
SWE-bench Verified | 77.40 · rank 24 / 108 · Thinking (With Tools) | 80.60 · rank 10 / 108 · Thinking High (With Tools) | -3.20

Specs

Field | Muse Spark | Gemini 3.1 Pro Preview
Publisher | Facebook AI Research Lab | Google DeepMind
Release date | 2026-04-08 | 2026-02-20
Model type | Reasoning model | Multimodal model
Architecture | Dense | Dense
Parameters | Not available | Not available
Context length | 262K | 1M
Max output | Not available | 32K

API pricing

Prices use DataLearner records when available; missing fields are not inferred.

Item | Muse Spark | Gemini 3.1 Pro Preview
Text input | Not public | $2 / 1M tokens
Text output | Not public | $12 / 1M tokens

One or both models have incomplete public pricing.
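As a rough illustration of how the listed per-million-token prices translate into a request cost, the sketch below uses the Gemini 3.1 Pro Preview figures from the table ($2 input, $12 output per 1M tokens). The function name `estimate_cost` and the token counts in the example are hypothetical. Muse Spark is left out because its prices are not public.

```python
# Prices in USD per 1M tokens, taken from the pricing table on this page.
GEMINI_INPUT_PRICE = 2.0
GEMINI_OUTPUT_PRICE = 12.0


def estimate_cost(input_tokens, output_tokens,
                  input_price=GEMINI_INPUT_PRICE,
                  output_price=GEMINI_OUTPUT_PRICE):
    """Return the estimated text cost in USD for one request.

    Assumption: cost is linear in token count, with no tiering,
    caching discounts, or minimum charges.
    """
    return (input_tokens / 1_000_000) * input_price \
        + (output_tokens / 1_000_000) * output_price


if __name__ == "__main__":
    # Hypothetical request: 100,000 input tokens and 10,000 output tokens.
    print(f"${estimate_cost(100_000, 10_000):.2f}")
```

Actual billing may differ; this only applies the two listed rates.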

Summary

  • Gemini 3.1 Pro Preview leads in: General Knowledge (2/3), Agent Level Benchmark (1/1), AI Agent - Tool Usage (1/1), Coding and Software Engineer (1/1)
  • Tied in: Math and Reasoning

On average across the 8 shared benchmarks, Gemini 3.1 Pro Preview scores 6.60 higher.

Largest single-benchmark gap: ARC-AGI-2 — Muse Spark 42.50 vs Gemini 3.1 Pro Preview 77.10 (-34.60).
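The headline numbers (win counts, average difference, largest gap) can be reproduced from the benchmark scores with a short script. This is a minimal sketch, not the site's actual pipeline: the scores are copied from the tables on this page, and the function name `summarize` is hypothetical.

```python
# benchmark -> (Muse Spark score, Gemini 3.1 Pro Preview score),
# copied from the tables on this page.
SCORES = {
    "ARC-AGI-2": (42.50, 77.10),
    "HLE": (58.00, 51.40),
    "GPQA Diamond": (89.50, 94.30),
    "FrontierMath - Tier 4": (14.60, 16.70),
    "FrontierMath": (39.00, 36.90),
    "τ²-Bench - Telecom": (92.00, 99.30),
    "Terminal Bench 2.0": (59.00, 68.50),
    "SWE-bench Verified": (77.40, 80.60),
}


def summarize(scores):
    """Compute win counts, average difference, and the largest gap.

    Differences are Muse Spark minus Gemini 3.1 Pro Preview, so a
    negative value means Gemini 3.1 Pro Preview scored higher.
    """
    diffs = {name: round(muse - gemini, 2)
             for name, (muse, gemini) in scores.items()}
    largest = max(diffs, key=lambda name: abs(diffs[name]))
    return {
        "muse_wins": sum(d > 0 for d in diffs.values()),
        "gemini_wins": sum(d < 0 for d in diffs.values()),
        "ties": sum(d == 0 for d in diffs.values()),
        "average_diff": round(sum(diffs.values()) / len(diffs), 2),
        "largest_gap": (largest, diffs[largest]),
    }


if __name__ == "__main__":
    for key, value in summarize(SCORES).items():
        print(f"{key}: {value}")
```

Differences are rounded to two decimals before averaging to avoid floating-point noise in the reported figures.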

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.