Qwen3.6-27B Benchmark Details

Qwen3.6-27B's benchmark results are led by MMLU Pro (score 86.20, rank 16 / 126), LiveCodeBench (score 83.90, rank 19 / 120), and GPQA Diamond (score 87.80, rank 33 / 178). This page also compares it with 3 competitor models and 3 predecessor or same-series models, with performance and pricing views where available. 1 source link is attached for reference.

Benchmark Results


General Knowledge

4 evaluations

| Benchmark | Mode | Score | Rank / total |
|---|---|---|---|
| C-Eval | Thinking Mode | 91.40 | 5 / 9 |
| GPQA Diamond | Thinking Mode | 87.80 | 33 / 178 |
| MMLU Pro | Thinking Mode | 86.20 | 16 / 126 |
| HLE | Thinking Mode | 24.00 | 92 / 157 |

Coding and Software Engineering

4 evaluations

| Benchmark | Mode | Score | Rank / total |
|---|---|---|---|
| LiveCodeBench | Thinking Mode | 83.90 | 19 / 120 |
| SWE-bench Verified | Thinking Mode, Tools | 77.20 | 25 / 108 |
| SWE-bench Multilingual | Thinking Mode, Tools | 71.30 | 13 / 20 |
| SWE-Bench Pro - Public | Thinking Mode, Tools | 53.50 | 24 / 43 |

AI Agent - Tool Usage

1 evaluation

| Benchmark | Mode | Score | Rank / total |
|---|---|---|---|
| Terminal Bench 2.0 | Thinking Mode, Tools | 59.30 | 20 / 46 |

Math and Reasoning

2 evaluations

| Benchmark | Mode | Score | Rank / total |
|---|---|---|---|
| AIME 2026 | Thinking Mode | 94.10 | 4 / 14 |
| IMO-AnswerBench | Thinking Mode | 80.80 | 16 / 19 |

Claw-style Agent Evaluation

1 evaluation

| Benchmark | Mode | Score | Rank / total |
|---|---|---|---|
| Claw Bench | Thinking Mode, Tools | 72.40 | 27 / 29 |
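
Because each leaderboard has a different number of entries, raw ranks are hard to compare across benchmarks. The sketch below is one possible way to normalize them: it divides rank by total to get the share of the leaderboard at or above the model's position, then sorts by that share. The names `Result`, `top_fraction`, and `by_relative_rank` are illustrative and not part of the page; the figures are copied from the results above.

```python
from typing import NamedTuple


class Result(NamedTuple):
    benchmark: str
    score: float
    rank: int
    total: int


# Score and rank/total values copied from the benchmark results on this page.
RESULTS = [
    Result("C-Eval", 91.40, 5, 9),
    Result("GPQA Diamond", 87.80, 33, 178),
    Result("MMLU Pro", 86.20, 16, 126),
    Result("HLE", 24.00, 92, 157),
    Result("LiveCodeBench", 83.90, 19, 120),
    Result("SWE-bench Verified", 77.20, 25, 108),
    Result("SWE-bench Multilingual", 71.30, 13, 20),
    Result("SWE-Bench Pro - Public", 53.50, 24, 43),
    Result("Terminal Bench 2.0", 59.30, 20, 46),
    Result("AIME 2026", 94.10, 4, 14),
    Result("IMO-AnswerBench", 80.80, 16, 19),
    Result("Claw Bench", 72.40, 27, 29),
]


def top_fraction(rank: int, total: int) -> float:
    """Share of the leaderboard at or above this rank (lower is better)."""
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return rank / total


def by_relative_rank(results):
    """Sort results from best to worst relative leaderboard position."""
    return sorted(results, key=lambda r: top_fraction(r.rank, r.total))


if __name__ == "__main__":
    for r in by_relative_rank(RESULTS):
        share = top_fraction(r.rank, r.total)
        print(f"{r.benchmark:<24} {r.rank:>3} / {r.total:<3}  top {share:.1%}")
```

Under this ordering the first three entries are MMLU Pro, LiveCodeBench, and GPQA Diamond, which matches the three benchmarks highlighted in the summary at the top of the page. Small leaderboards (for example 5 / 9 on C-Eval) make the share coarse, so it is best read as a rough guide.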

Competitor Comparison

Benchmark scores for Qwen3.6-27B compared against top models in its class

The chart shows each model’s highest score per benchmark within the current filter. Out-of-100 benchmarks use raw heights; out-of-range benchmarks are scaled within that benchmark while labels keep the original scores.
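
The two rules in the note above (best score per model, and per-benchmark scaling for scores that do not fit a 0 to 100 range) could be sketched roughly as follows. This is not the site's actual implementation: the function names are made up, the third record is a hypothetical run added only to exercise the selection, and dividing by the benchmark's maximum is an assumed scaling rule, since the page does not say how out-of-range benchmarks are scaled. Scores are assumed to be non-negative.

```python
# (benchmark, model, mode, score) records. The first two are copied from this
# page; the third is a HYPOTHETICAL lower-scoring run used only for illustration.
RECORDS = [
    ("SWE-Bench Pro - Public", "Qwen3.6-27B", "Thinking Enabled | Tools", 53.50),
    ("SWE-Bench Pro - Public", "GPT-5.4 mini", "Thinking Level · Extra High | Tools", 54.40),
    ("SWE-Bench Pro - Public", "GPT-5.4 mini", "hypothetical lower-effort run", 40.00),
]


def best_per_model(records):
    """Keep the highest (score, mode) for each (benchmark, model) pair."""
    best = {}
    for benchmark, model, mode, score in records:
        key = (benchmark, model)
        if key not in best or score > best[key][0]:
            best[key] = (score, mode)
    return best


def bar_heights(scores, full_scale=100.0):
    """Map one benchmark's raw scores to relative bar heights.

    If every score fits within full_scale, heights are score / full_scale
    ("raw heights"). Otherwise scores are divided by the largest score in
    that benchmark (an assumed rule). Labels would still show raw scores.
    """
    top = max(scores.values())
    if top <= 0:
        raise ValueError("expected at least one positive score")
    divisor = full_scale if top <= full_scale else top
    return {model: score / divisor for model, score in scores.items()}


if __name__ == "__main__":
    print(best_per_model(RECORDS))
    print(bar_heights({"model_a": 1200.0, "model_b": 600.0}))
```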

8 benchmarks with comparable scores. Each model shows its best score; the mode used is given in parentheses after each score.

| Benchmark | Category | Qwen3.6-27B (current) | Gemini 3.0 Flash | Haiku 4.5 | GPT-5.4 mini |
|---|---|---|---|---|---|
| GPQA Diamond | Comprehensive Evaluation | 87.80 (Thinking Enabled) | 90.40 (Thinking Enabled) | 73.30 (Extended Thinking) | 88.00 (Thinking Level · Extra High) |
| HLE | Comprehensive Evaluation | 24.00 (Thinking Enabled) | 43.50 (Thinking Enabled, Tools) | 9.70 (Extended Thinking) | 41.50 (Thinking Level · Extra High, Tools) |
| MMLU Pro | Comprehensive Evaluation | 86.20 (Thinking Enabled) | -- | 80.00 (Extended Thinking) | -- |
| LiveCodeBench | Coding and Software Engineering | 83.90 (Thinking Enabled) | -- | 62.00 (Extended Thinking) | -- |
| SWE-Bench Pro - Public | Coding and Software Engineering | 53.50 (Thinking Enabled, Tools) | 49.60 (Thinking Level · High, Tools) | 39.45 (Extended Thinking, Tools) | 54.40 (Thinking Level · Extra High, Tools) |
| SWE-bench Verified | Coding and Software Engineering | 77.20 (Thinking Enabled, Tools) | 68.70 (Thinking Enabled) | 73.30 (128K, Tools) | -- |
| Terminal Bench 2.0 | AI Agent - Tool Usage | 59.30 (Thinking Enabled, Tools) | 47.60 (Thinking Enabled, Tools) | -- | 60.00 (Thinking Level · Extra High, Tools) |
| Claw Bench | OpenClaw Agent Capability Evaluation | 72.40 (Thinking Enabled, Tools) | 85.70 (Thinking Enabled, Tools) | 89.40 (Thinking Enabled, Tools) | 75.30 (Thinking Enabled, Tools) |

Standard API Pricing: Qwen3.6-27B vs. Peer Models

Shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.

Source: DataLearnerAI. Standard text prices shown here use the default supplier. Prices are in USD per 1M tokens.

| Model | Supplier | Standard input | Standard output | Base price applies to |
|---|---|---|---|---|
| GPT-5.4 mini | OpenAI | $0.75 / 1M tokens | $4.5 / 1M tokens | |
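
As a worked example of how per-million-token rates translate into a request cost, the sketch below applies the GPT-5.4 mini standard rates listed above. The function name and the token counts are hypothetical, and the calculation ignores any extended-context or discounted pricing tiers, which the page does not list.

```python
from decimal import Decimal

# Standard GPT-5.4 mini rates from the pricing row above, in USD per 1M tokens.
INPUT_USD_PER_M = Decimal("0.75")
OUTPUT_USD_PER_M = Decimal("4.5")
TOKENS_PER_M = Decimal(1_000_000)


def request_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimated cost in USD for one request at the standard rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        Decimal(input_tokens) * INPUT_USD_PER_M
        + Decimal(output_tokens) * OUTPUT_USD_PER_M
    )
    return total / TOKENS_PER_M


if __name__ == "__main__":
    # Hypothetical workload: 200,000 input tokens and 50,000 output tokens.
    print(request_cost(200_000, 50_000))
```

For the hypothetical workload in the example, the input side contributes 0.15 USD and the output side 0.225 USD, for 0.375 USD in total. `Decimal` is used so the currency arithmetic stays exact.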

Version History

How each version of the Qwen3.6-27B series stacks up on benchmark tests


8 benchmarks with comparable scores. Each model shows its best score; the mode used is given in parentheses after each score.

| Benchmark | Category | Qwen3.6-27B (current) | Qwen3.5-27B | Qwen3-32B | Qwen2.5-32B |
|---|---|---|---|---|---|
| C-Eval | Comprehensive Evaluation | 91.40 (Thinking Enabled) | 90.50 (Thinking Enabled) | 87.30 (Thinking Enabled) | -- |
| GPQA Diamond | Comprehensive Evaluation | 87.80 (Thinking Enabled) | 85.50 (Thinking Enabled) | 68.40 (Thinking Enabled) | -- |
| HLE | Comprehensive Evaluation | 24.00 (Thinking Enabled) | 48.50 (Thinking Enabled, Tools) | -- | -- |
| MMLU Pro | Comprehensive Evaluation | 86.20 (Thinking Enabled) | 86.10 (Thinking Enabled) | -- | 69.23 (Standard Mode) |
| LiveCodeBench | Coding and Software Engineering | 83.90 (Thinking Enabled) | 80.70 (Thinking Enabled, Tools) | 65.70 (Thinking Enabled) | 51.20 (Standard Mode) |
| SWE-bench Verified | Coding and Software Engineering | 77.20 (Thinking Enabled, Tools) | 72.40 (Thinking Enabled) | -- | -- |
| Terminal Bench 2.0 | AI Agent - Tool Usage | 59.30 (Thinking Enabled, Tools) | 41.60 (Thinking Enabled, Tools) | -- | -- |
| Claw Bench | OpenClaw Agent Capability Evaluation | 72.40 (Thinking Enabled, Tools) | 75.20 (Thinking Enabled, Tools) | -- | -- |
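
Since each cell shows a model's best score in whatever mode produced it, some version-to-version comparisons mix modes (for example, a tools-enabled run against a run without tools). The sketch below is one way to compute the Qwen3.6-27B minus Qwen3.5-27B difference per benchmark while flagging rows where the modes differ. The function name and the shortened mode labels ("Thinking", "Thinking + Tools") are illustrative; the scores are copied from the version comparison above.

```python
# (benchmark, Qwen3.6-27B score, its mode, Qwen3.5-27B score, its mode)
# "Thinking" stands for "Thinking Enabled"; "Thinking + Tools" for
# "Thinking Enabled | Tools" as shown on the page.
ROWS = [
    ("C-Eval", 91.40, "Thinking", 90.50, "Thinking"),
    ("GPQA Diamond", 87.80, "Thinking", 85.50, "Thinking"),
    ("HLE", 24.00, "Thinking", 48.50, "Thinking + Tools"),
    ("MMLU Pro", 86.20, "Thinking", 86.10, "Thinking"),
    ("LiveCodeBench", 83.90, "Thinking", 80.70, "Thinking + Tools"),
    ("SWE-bench Verified", 77.20, "Thinking + Tools", 72.40, "Thinking"),
    ("Terminal Bench 2.0", 59.30, "Thinking + Tools", 41.60, "Thinking + Tools"),
    ("Claw Bench", 72.40, "Thinking + Tools", 75.20, "Thinking + Tools"),
]


def compare_versions(rows):
    """Return per-benchmark score deltas and whether the modes match."""
    report = []
    for name, new_score, new_mode, old_score, old_mode in rows:
        report.append(
            {
                "benchmark": name,
                "delta": round(new_score - old_score, 2),
                "same_mode": new_mode == old_mode,
            }
        )
    return report


if __name__ == "__main__":
    for row in compare_versions(ROWS):
        note = "" if row["same_mode"] else "  (modes differ)"
        print(f"{row['benchmark']:<20} {row['delta']:+.2f}{note}")
```

With these inputs, HLE, LiveCodeBench, and SWE-bench Verified are flagged as mixed-mode rows, so their deltas (including the 24.5-point HLE gap) are not like-for-like comparisons.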

Single-Benchmark Version Trend

[Chart: C-Eval (Comprehensive Evaluation) score by model version. X-axis: model and release date. Y-axis: score. Series by mode: Normal, Normal + Tools, Thinking, Thinking + Tools, Deep, Deep + Tools. Solid lines connect the same mode across versions; dotted guides align modes within the same generation.]

Standard API Pricing Across the Qwen3.6-27B Series


Comparable standard text pricing is not available for these models.

Sources