Composer 2 Benchmark Details

Composer 2's benchmark results are led by Terminal Bench 2.0 (score 61.70, rank 15 / 46) and SWE-bench Multilingual (score 73.70, rank 9 / 20). This page also compares it with 3 competitor models and 2 earlier models in the same series, with performance and pricing views where available. 1 source link is attached for reference.

Benchmark Results

AI Agent - Tool Usage

1 evaluation

Benchmark	Mode	Score	Rank/total
Terminal Bench 2.0	Thinking Mode	61.70	15 / 46

Coding and Software Engineering

1 evaluation

Benchmark	Mode	Score	Rank/total
SWE-bench Multilingual	Thinking Mode	73.70	9 / 20

Compare with other models

Competitor Comparison

Benchmark scores for Composer 2 compared against top models in its class

The chart shows each model’s highest score per benchmark within the current filter. Out-of-100 benchmarks use raw heights; out-of-range benchmarks are scaled within that benchmark while labels keep the original scores.
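The scaling rule above can be sketched as follows. This is a minimal illustration, not the site's actual charting code: the function name `bar_heights`, the test for "out of 100" (every score between 0 and 100), and scaling against the highest score in the benchmark are all assumptions, since the page does not say how out-of-range benchmarks are normalized.

```python
def bar_heights(scores):
    """Map one benchmark's scores to chart bar heights on a 0-100 axis.

    Assumptions (not stated on the page): scores are non-negative, a
    benchmark counts as "out of 100" when every score is within [0, 100],
    and any other benchmark is scaled so its highest score has height 100.
    Missing scores (None) get no bar.
    """
    present = {model: s for model, s in scores.items() if s is not None}
    if not present:
        return {}
    if all(0 <= s <= 100 for s in present.values()):
        return dict(present)  # raw heights
    top = max(present.values())
    return {model: 100 * s / top for model, s in present.items()}


# Scores from the comparison table: already out of 100, so heights are raw.
terminal_bench = {"Composer 2": 61.70, "GPT-5.4": 75.10,
                  "Claude Opus 4.6": 65.40, "Kimi K2.5": 50.80}
print(bar_heights(terminal_bench))

# Hypothetical rating-style benchmark (made-up values) that needs scaling.
print(bar_heights({"model_a": 1500, "model_b": 750}))
```

Because labels keep the original scores, a caller would draw bars from the returned heights and take label text from the untouched input.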

2 benchmarks with comparable scores. Each model shows its best score; mode label is displayed below.

Benchmark	Category	Composer 2 (current)	GPT-5.4	Claude Opus 4.6	Kimi K2.5
Terminal Bench 2.0	AI Agent - Tool Usage	61.70 (Thinking Enabled)	75.10 (Thinking Level: Extra High | Tools)	65.40 (Extended Thinking | Tools)	50.80 (Thinking Enabled | Tools)
SWE-bench Multilingual	Coding and Software Engineering	73.70 (Thinking Enabled)	--	72.00 (Extended Thinking | Tools)	--

Standard API Pricing: Composer 2 vs. Peer Models

Shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.

Source: DataLearnerAI. Standard text prices shown here use the default supplier. · USD / 1M tokens

When a context threshold exists, the charted base price only applies within these limits:

GPT-5.4: Base price applies to <= 272K

Claude Opus 4.6: Base price applies to <= 200K

Model	Supplier	Standard input	Standard output	Base price applies to
Composer 2	Cursor	$0.5 / 1M tokens	$2.5 / 1M tokens	—
GPT-5.4	OpenAI	$2.5 / 1M tokens	$15 / 1M tokens	<= 272K
Claude Opus 4.6	Anthropic	$5 / 1M tokens	$25 / 1M tokens	<= 200K
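To make the per-token prices concrete, the sketch below estimates the cost of a single request at the standard base rate. The prices and thresholds come from the table above; the function name `request_cost`, the example token counts, and the reading of the threshold as a limit on input tokens with "K" meaning 1,000 are assumptions for illustration. Extended-context rates are not listed on this page, so the sketch raises an error instead of guessing them.

```python
# Standard prices from the table above, in USD per 1M tokens.
PRICES = {
    "Composer 2": {"input": 0.5, "output": 2.5, "base_limit": None},
    "GPT-5.4": {"input": 2.5, "output": 15.0, "base_limit": 272_000},
    "Claude Opus 4.6": {"input": 5.0, "output": 25.0, "base_limit": 200_000},
}


def request_cost(model, input_tokens, output_tokens):
    """Estimate the USD cost of one request at the standard base rate."""
    price = PRICES[model]
    limit = price["base_limit"]
    # Assumption: the threshold applies to input tokens and "K" means 1,000.
    if limit is not None and input_tokens > limit:
        raise ValueError(f"{model}: base price only applies to <= {limit} tokens")
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical request: 100,000 input tokens and 10,000 output tokens.
for name in PRICES:
    print(f"{name}: ${request_cost(name, 100_000, 10_000):.3f}")
```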

Version History

How each version of the Composer 2 series stacks up on benchmark tests

2 benchmarks with comparable scores. Each model shows its best score; mode label is displayed below.

Benchmark	Category	Composer 2 (current)	Composer 1.5	Composer 1
Terminal Bench 2.0	AI Agent - Tool Usage	61.70 (Thinking Enabled)	47.90 (Thinking Enabled)	40.00 (Thinking Enabled)
SWE-bench Multilingual	Coding and Software Engineering	73.70 (Thinking Enabled)	65.90 (Thinking Enabled)	56.90 (Thinking Enabled)
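The version-over-version change in the table above is simple subtraction, sketched here for convenience. The scores are copied from the table; the names `VERSIONS`, `SCORES`, and `version_gains` are illustrative and not part of any published tool.

```python
# Scores from the version-history table, ordered oldest to newest.
VERSIONS = ["Composer 1", "Composer 1.5", "Composer 2"]
SCORES = {
    "Terminal Bench 2.0": [40.00, 47.90, 61.70],
    "SWE-bench Multilingual": [56.90, 65.90, 73.70],
}


def version_gains(scores):
    """Return the point gain between each pair of consecutive versions."""
    return [round(new - old, 2) for old, new in zip(scores, scores[1:])]


for benchmark, scores in SCORES.items():
    steps = zip(VERSIONS, VERSIONS[1:])
    for (older, newer), gain in zip(steps, version_gains(scores)):
        print(f"{benchmark}: {older} -> {newer}: {gain:+.2f} points")
```

On these figures, Terminal Bench 2.0 gained 7.90 and then 13.80 points across the two releases, while SWE-bench Multilingual gained 9.00 and then 7.80 points.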

Single-Benchmark Version Trend

[Chart: Terminal Bench 2.0 (AI Agent - Tool Usage) score by version. X-axis: model and release date; Y-axis: score. Modes: Normal, Normal + Tools, Thinking, Thinking + Tools, Deep, Deep + Tools. Solid lines connect the same mode across versions; dotted guides align modes within the same generation.]

Standard API Pricing Across the Composer 2 Series

Shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.

Source: DataLearnerAI. Standard text prices shown here use the default supplier. · USD / 1M tokens

Model	Supplier	Standard input	Standard output	Base price applies to
Composer 2	Cursor	$0.5 / 1M tokens	$2.5 / 1M tokens	—
Composer 1.5	Cursor	$3.5 / 1M tokens	$17.5 / 1M tokens	—
Composer 1	Cursor	$1.25 / 1M tokens	$10 / 1M tokens	—

Sources

cursor.com