Qwen3.6-27B Benchmark Details

Benchmark results for Qwen3.6-27B are currently led by MMLU Pro (score 86.20, rank 17 / 132), LiveCodeBench (score 83.90, rank 19 / 123), and GPQA Diamond (score 87.80, rank 36 / 187). This page also compares it with 3 competitor models and 3 predecessor or same-series models, with performance and pricing views where available. 1 source link is attached for reference.
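To read the rank / total figures above as a relative position, one option is to convert each pair into a "top N%" value. The following is a minimal sketch; the helper name `top_percent` and the one-decimal rounding are illustrative assumptions, not something the page itself computes.

```python
def top_percent(rank: int, total: int) -> float:
    """Return rank / total as a percentage, rounded to one decimal place."""
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return round(100 * rank / total, 1)


# Rank / total pairs quoted in the summary above.
placements = {
    "MMLU Pro": (17, 132),
    "LiveCodeBench": (19, 123),
    "GPQA Diamond": (36, 187),
}

for name, (rank, total) in placements.items():
    print(f"{name}: top {top_percent(rank, total)}%")
```

Note that totals differ widely between benchmarks, so a percentage on a small leaderboard carries less information than one on a large leaderboard.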

Benchmark Results

General Knowledge

5 evaluations

Benchmark	Mode	Score	Rank / total
C-Eval	Thinking Mode	91.40	5 / 10
GPQA Diamond	Thinking Mode	87.80	36 / 187
MMLU Pro	Thinking Mode	86.20	17 / 132
LiveBench	Standard Mode	65.56	52 / 115
HLE	Thinking Mode	24.00	107 / 172

Coding and Software Engineering

4 evaluations

Benchmark	Mode	Score	Rank / total
LiveCodeBench	Thinking Mode	83.90	19 / 123
SWE-bench Verified	Thinking Mode + Tools	77.20	28 / 112
SWE-bench Multilingual	Thinking Mode + Tools	71.30	16 / 23
SWE-Bench Pro - Public	Thinking Mode + Tools	53.50	34 / 54

AI Agent - Tool Usage

1 evaluation

Benchmark	Mode	Score	Rank / total
Terminal Bench 2.0	Thinking Mode + Tools	59.30	20 / 47

Math and Reasoning

2 evaluations

Benchmark	Mode	Score	Rank / total
AIME 2026	Thinking Mode	94.10	6 / 18
IMO-AnswerBench	Thinking Mode	80.80	18 / 21

Claw-style Agent Evaluation

1 evaluation

Benchmark	Mode	Score	Rank / total
Claw Bench	Thinking Mode + Tools	72.40	27 / 29

Compare with other models

Competitor Comparison

Benchmark scores for Qwen3.6-27B compared against top models in its class

Models compared: Qwen3.6-27B (current), Gemini 3.0 Flash, Haiku 4.5, GPT-5.4 mini.

[Chart: each model's highest score per benchmark within the current filter. Benchmarks scored out of 100 are plotted at raw height; benchmarks on other scales are rescaled within that benchmark, and labels keep the original scores.]
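The scaling rule in the chart note can be sketched as follows. The page does not say how out-of-range benchmarks are rescaled, so this sketch assumes scaling relative to the highest score within that benchmark; the function name `bar_heights` and the rating-style example values are hypothetical.

```python
def bar_heights(scores: dict[str, float], out_of_100: bool) -> dict[str, float]:
    """Map each model's score to a bar height on a 0-100 axis.

    Out-of-100 benchmarks keep their raw scores. Other benchmarks are
    rescaled so the best score in that benchmark reaches 100
    (an assumption; the page does not state the exact rule).
    """
    if out_of_100:
        return dict(scores)
    top = max(scores.values())
    if top <= 0:
        raise ValueError("cannot rescale when the best score is not positive")
    return {model: 100 * score / top for model, score in scores.items()}


# GPQA Diamond is scored out of 100, so heights equal the raw scores.
gpqa = {"Qwen3.6-27B": 87.80, "Gemini 3.0 Flash": 90.40}
print(bar_heights(gpqa, out_of_100=True))

# Hypothetical rating-style benchmark on a different scale.
rating = {"model_a": 1200.0, "model_b": 900.0}
print(bar_heights(rating, out_of_100=False))
```

Under this assumption only the bar heights change; the labels would still show 1200 and 900.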

9 benchmarks have comparable scores. Each model's best score is shown together with its mode label.

Benchmark	Category	Qwen3.6-27B (current)	Gemini 3.0 Flash	Haiku 4.5	GPT-5.4 mini
GPQA Diamond	General Knowledge	87.80 (Thinking Enabled)	90.40 (Thinking Enabled)	73.30 (Extended Thinking)	88.00 (Thinking Level · Extra High)
HLE	General Knowledge	24.00 (Thinking Enabled)	43.50 (Thinking Enabled + Tools)	9.70 (Extended Thinking)	41.50 (Thinking Level · Extra High + Tools)
LiveBench	General Knowledge	65.56 (Standard Mode)	72.40 (Thinking Level · High)	61.32 (64K)	67.54 (Deep Thinking Mode)
MMLU Pro	General Knowledge	86.20 (Thinking Enabled)	--	80.00 (Extended Thinking)	--
LiveCodeBench	Coding and Software Engineering	83.90 (Thinking Enabled)	--	62.00 (Extended Thinking)	--
SWE-Bench Pro - Public	Coding and Software Engineering	53.50 (Thinking Enabled + Tools)	49.60 (Thinking Level · High + Tools)	39.45 (Extended Thinking + Tools)	54.40 (Thinking Level · Extra High + Tools)
SWE-bench Verified	Coding and Software Engineering	77.20 (Thinking Enabled + Tools)	68.70 (Thinking Enabled)	73.30 (128K + Tools)	--
Terminal Bench 2.0	AI Agent - Tool Usage	59.30 (Thinking Enabled + Tools)	47.60 (Thinking Enabled + Tools)	--	60.00 (Thinking Level · Extra High + Tools)
Claw Bench	Claw-style Agent Evaluation	72.40 (Thinking Enabled + Tools)	85.70 (Thinking Enabled + Tools)	89.40 (Thinking Enabled + Tools)	75.30 (Thinking Enabled + Tools)

Standard API Pricing: Qwen3.6-27B vs. Peer Models

Standard text input and output prices for each model, in USD per 1M tokens. Where extended-context pricing exists, the base rate is listed and the threshold it applies to is noted.

Source: DataLearnerAI. Standard text prices shown here use the default supplier.

Model	Supplier	Standard input	Standard output	Base price applies to
Gemini 3.0 Flash	Google DeepMind	$0.5 / 1M tokens	$3 / 1M tokens	—
Haiku 4.5	Anthropic	$1 / 1M tokens	$5 / 1M tokens	—
GPT-5.4 mini	OpenAI	$0.75 / 1M tokens	$4.5 / 1M tokens	—
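As a rough illustration of how these per-1M-token prices translate into a bill, the sketch below estimates the cost of a single workload. The prices are copied from the table above; the function name `workload_cost` and the workload size (2M input tokens, 0.5M output tokens) are hypothetical, and real invoices may differ if extended-context or other pricing tiers apply.

```python
from decimal import Decimal

# USD per 1M tokens as (input, output), copied from the pricing table above.
PRICES = {
    "Gemini 3.0 Flash": (Decimal("0.5"), Decimal("3")),
    "Haiku 4.5": (Decimal("1"), Decimal("5")),
    "GPT-5.4 mini": (Decimal("0.75"), Decimal("4.5")),
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one workload at standard (base) pricing."""
    price_in, price_out = PRICES[model]
    million = Decimal(1_000_000)
    return (price_in * input_tokens + price_out * output_tokens) / million


# Hypothetical workload: 2M input tokens and 0.5M output tokens.
for model in PRICES:
    cost = workload_cost(model, 2_000_000, 500_000)
    print(f"{model}: ${cost:.2f}")
```

Using `Decimal` rather than floats keeps the arithmetic exact for prices quoted to a fixed number of decimal places.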

Version History

How each version of the Qwen3.6-27B series stacks up on benchmark tests

Models compared: Qwen3.6-27B (current), Qwen3.5-27B, Qwen3-32B, Qwen2.5-32B.

9 benchmarks have comparable scores. Each model's best score is shown together with its mode label.

Benchmark	Category	Qwen3.6-27B (current)	Qwen3.5-27B	Qwen3-32B	Qwen2.5-32B
C-Eval	General Knowledge	91.40 (Thinking Enabled)	90.50 (Thinking Enabled)	87.30 (Thinking Enabled)	--
GPQA Diamond	General Knowledge	87.80 (Thinking Enabled)	85.50 (Thinking Enabled)	68.40 (Thinking Enabled)	--
HLE	General Knowledge	24.00 (Thinking Enabled)	48.50 (Thinking Enabled + Tools)	--	--
LiveBench	General Knowledge	65.56 (Standard Mode)	--	43.56 (Thinking Enabled)	--
MMLU Pro	General Knowledge	86.20 (Thinking Enabled)	86.10 (Thinking Enabled)	--	69.23 (Standard Mode)
LiveCodeBench	Coding and Software Engineering	83.90 (Thinking Enabled)	80.70 (Thinking Enabled + Tools)	65.70 (Thinking Enabled)	51.20 (Standard Mode)
SWE-bench Verified	Coding and Software Engineering	77.20 (Thinking Enabled + Tools)	72.40 (Thinking Enabled)	--	--
Terminal Bench 2.0	AI Agent - Tool Usage	59.30 (Thinking Enabled + Tools)	41.60 (Thinking Enabled + Tools)	--	--
Claw Bench	Claw-style Agent Evaluation	72.40 (Thinking Enabled + Tools)	75.20 (Thinking Enabled + Tools)	--	--
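Because the table reports each model's best score under whatever mode produced it, some rows compare a tool-assisted run against a run without tools. One cautious way to read the Qwen3.5-27B to Qwen3.6-27B change is to compute differences only where both mode labels match. The sketch below does that with the values from the table above; the function name `like_for_like_deltas` and the filtering rule are illustrative assumptions, not part of the page.

```python
from decimal import Decimal

TOOLS = "Thinking Enabled + Tools"
THINK = "Thinking Enabled"

# (benchmark, Qwen3.6-27B score, mode, Qwen3.5-27B score, mode) from the table above.
# LiveBench is omitted because Qwen3.5-27B has no score for it.
ROWS = [
    ("C-Eval", "91.40", THINK, "90.50", THINK),
    ("GPQA Diamond", "87.80", THINK, "85.50", THINK),
    ("HLE", "24.00", THINK, "48.50", TOOLS),
    ("MMLU Pro", "86.20", THINK, "86.10", THINK),
    ("LiveCodeBench", "83.90", THINK, "80.70", TOOLS),
    ("SWE-bench Verified", "77.20", TOOLS, "72.40", THINK),
    ("Terminal Bench 2.0", "59.30", TOOLS, "41.60", TOOLS),
    ("Claw Bench", "72.40", TOOLS, "75.20", TOOLS),
]


def like_for_like_deltas(rows):
    """Return score differences only for rows where both modes match."""
    deltas = {}
    for name, new, new_mode, old, old_mode in rows:
        if new_mode == old_mode:
            deltas[name] = Decimal(new) - Decimal(old)
    return deltas


for name, delta in like_for_like_deltas(ROWS).items():
    print(f"{name}: {delta:+}")
```

With this filter, HLE, LiveCodeBench, and SWE-bench Verified are left out, since one of the two scores in each row was obtained with tools and the other without.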

Single-Benchmark Version Trend

[Chart: C-Eval (General Knowledge) scores across versions. X-axis: model and release date. Y-axis: score. Modes: Normal, Normal + Tools, Thinking, Thinking + Tools, Deep, Deep + Tools. Solid lines connect the same mode across versions; dotted guides align modes within the same generation.]

Standard API Pricing Across the Qwen3.6-27B Series

Source: DataLearnerAI. Standard text prices shown here use the default supplier.

These models use different currencies or billing units, so raw price values are listed instead of a shared bar chart.

Model	Supplier	Standard input	Standard output	Base price applies to
Qwen3-32B	Alibaba	¥0.0012 / 1K tokens	¥0.0048 / 1K tokens	—
Qwen2.5-32B	Alibaba	¥0.002 / 1K tokens	¥0.006 / 1K tokens	—
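These prices are quoted per 1K tokens, while the peer-model prices earlier on the page are quoted per 1M tokens. The sketch below normalizes only the billing unit, so ¥0.0012 per 1K tokens becomes ¥1.2 per 1M tokens. It deliberately does not convert currency, since that would require an exchange rate the page does not provide. The function name `per_million` is hypothetical.

```python
from decimal import Decimal


def per_million(price: Decimal, tokens_per_unit: int) -> Decimal:
    """Rescale a price quoted per `tokens_per_unit` tokens to a price per 1M tokens."""
    if tokens_per_unit <= 0:
        raise ValueError("tokens_per_unit must be positive")
    return price * Decimal(1_000_000) / Decimal(tokens_per_unit)


# CNY per 1K tokens as (input, output), copied from the table above.
PRICES_PER_1K = {
    "Qwen3-32B": (Decimal("0.0012"), Decimal("0.0048")),
    "Qwen2.5-32B": (Decimal("0.002"), Decimal("0.006")),
}

for model, (price_in, price_out) in PRICES_PER_1K.items():
    cny_in = per_million(price_in, 1_000)
    cny_out = per_million(price_out, 1_000)
    print(f"{model}: ¥{cny_in:.2f} in, ¥{cny_out:.2f} out per 1M tokens")
```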

Sources

qwen.ai