Qwen3.5-27B Benchmark Details

Qwen3.5-27B's benchmark results are currently led by Pinch Bench (rank 2 / 37, score 90), IF Bench (rank 4 / 30, score 76.50), and MMLU Pro (rank 18 / 126, score 86.10). This page also compares it with 1 competitor model and 2 predecessor or same-series models, including performance and pricing views when available. 1 source link is attached for reference.
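The page does not say how it picks the results that "lead". One reading that fits the listed numbers is relative rank, that is, rank divided by the number of ranked models. The sketch below is a minimal illustration under that assumption; the `relative_rank` helper is hypothetical, and the rank pairs are a subset copied from the results on this page.

```python
# Rank / total pairs copied from the results listed on this page (a subset).
ranks = {
    "MMMU": (8, 28),
    "LiveCodeBench": (27, 120),
    "MMLU Pro": (18, 126),
    "HLE (with tools)": (31, 168),
    "IF Bench": (4, 30),
    "Pinch Bench": (2, 37),
}


def relative_rank(rank: int, total: int) -> float:
    """Position as a fraction of the leaderboard size (lower is better).

    Assumption: this ratio is only one plausible ordering criterion; the page
    does not document the rule it actually uses.
    """
    return rank / total


# Sort benchmarks from best to worst relative position.
ordered = sorted(ranks, key=lambda name: relative_rank(*ranks[name]))

for name in ordered:
    rank, total = ranks[name]
    print(f"{name}: {rank} / {total} = {relative_rank(rank, total):.3f}")
```

Under this assumption the first three entries are Pinch Bench, IF Bench, and MMLU Pro, which matches the order in the summary.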

Benchmark Results

General Knowledge (5 evaluations)

Benchmark	Mode	Score	Rank / total
C-Eval	Thinking Enabled	90.50	6 / 9
MMLU Pro	Thinking Enabled	86.10	18 / 126
GPQA Diamond	Thinking Enabled	85.50	51 / 182
HLE	Thinking Enabled	24.30	101 / 168
HLE	Thinking Enabled | Tools	48.50	31 / 168

Coding and Software Engineering (3 evaluations)

Benchmark	Mode	Score	Rank / total
CodeForces	Thinking Enabled	1899	15 / 16
LiveCodeBench	Thinking Enabled | Tools	80.70	27 / 120
SWE-bench Verified	Thinking Enabled	72.40	51 / 110

Multimodal Understanding (2 evaluations)

Benchmark	Mode	Score	Rank / total
MMMU	Thinking Enabled	82.30	8 / 28
SimpleVQA	Thinking Enabled	not shown	2 / 2

Agent Level Benchmark (1 evaluation)

Benchmark	Mode	Score	Rank / total
τ²-Bench	Thinking Enabled | Tools	79.00	17 / 40

Instruction Following (1 evaluation)

Benchmark	Mode	Score	Rank / total
IF Bench	Thinking Enabled	76.50	4 / 30

AI Agent - Information Search (2 evaluations)

Benchmark	Mode	Score	Rank / total
BrowseComp	Thinking Enabled | Tools | Internet	not shown	32 / 51
BrowseComp	Thinking Enabled | Tools	not shown	32 / 51

AI Agent - Tool Usage (2 evaluations)

Benchmark	Mode	Score	Rank / total
OSWorld-Verified	Thinking Enabled | Tools	56.20	17 / 20
Terminal Bench 2.0	Thinking Enabled | Tools	41.60	42 / 46

Long Context (2 evaluations)

Benchmark	Mode	Score	Rank / total
AA-LCR	Thinking Enabled	66.10	8 / 14
LongBench v2	Standard Mode	60.60	7 / 11

Claw-style Agent Evaluation (2 evaluations)

Benchmark	Mode	Score	Rank / total
Pinch Bench	Thinking Enabled | Tools	90	2 / 37
Claw Bench	Thinking Enabled | Tools	75.20	26 / 29

Compare with other models

Competitor Comparison

Benchmark scores for Qwen3.5-27B compared against top models in its class

Models compared: Qwen3.5-27B, Gemma 4 31B

The chart shows each model’s highest score per benchmark within the current filter. Out-of-100 benchmarks use raw heights; out-of-range benchmarks are scaled within that benchmark while labels keep the original scores.
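The scaling rule is described only in words, so the following is a minimal sketch of one way it could work, not the site's actual implementation. It assumes that scores fitting a 0 to 100 scale are drawn at raw height and that any other benchmark (CodeForces ratings, for example) is scaled against the highest score on that benchmark. The `bar_heights` function and the second CodeForces entry are hypothetical.

```python
def bar_heights(scores: dict[str, float], full_scale: float = 100.0) -> dict[str, float]:
    """Return a bar height between 0 and 100 for each model on one benchmark.

    Assumption: scores already inside [0, full_scale] are used unchanged;
    otherwise each score is scaled relative to the benchmark's top score.
    The labels shown to the reader would still be the original scores.
    """
    top = max(scores.values())
    if min(scores.values()) >= 0 and top <= full_scale:
        return dict(scores)
    return {model: 100.0 * value / top for model, value in scores.items()}


# Out-of-100 benchmark: heights equal the raw scores.
mmlu_pro = bar_heights({"Qwen3.5-27B": 86.10, "Gemma 4 31B": 85.20})

# Out-of-range benchmark: the second model and its rating are made up
# purely to show the scaling.
codeforces = bar_heights({"Qwen3.5-27B": 1899.0, "Hypothetical model": 2100.0})

print(mmlu_pro)
print({model: round(height, 1) for model, height in codeforces.items()})
```

With these inputs the MMLU Pro heights stay at 86.10 and 85.20, while the CodeForces rating of 1899 maps to a height of roughly 90.4 against the hypothetical top score of 2100.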

5 benchmarks have comparable scores. Each model shows its best score, with the evaluation mode in parentheses.

Benchmark	Category	Qwen3.5-27B (current)	Gemma 4 31B
GPQA Diamond	Comprehensive Evaluation	85.50 (Thinking Enabled)	84.30 (Thinking Enabled)
HLE	Comprehensive Evaluation	48.50 (Thinking Enabled | Tools)	26.50 (Thinking Enabled | Tools)
MMLU Pro	Comprehensive Evaluation	86.10 (Thinking Enabled)	85.20 (Thinking Enabled)
LiveCodeBench	Coding and Software Engineering	80.70 (Thinking Enabled | Tools)	80.00 (Thinking Enabled)
τ²-Bench	Agent Capability Evaluation	79.00 (Thinking Enabled | Tools)	76.90 (Thinking Enabled | Tools)
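The "best score" rule explains why HLE appears in the comparison as 48.50 rather than 24.30: when a benchmark was run in several modes, only the highest-scoring run is kept. A minimal sketch of that selection follows, using rows copied from the Qwen3.5-27B results on this page; the `best_per_benchmark` helper is hypothetical.

```python
# (benchmark, mode, score) rows for Qwen3.5-27B, copied from this page.
rows = [
    ("HLE", "Thinking Enabled", 24.30),
    ("HLE", "Thinking Enabled | Tools", 48.50),
    ("MMLU Pro", "Thinking Enabled", 86.10),
    ("GPQA Diamond", "Thinking Enabled", 85.50),
]


def best_per_benchmark(rows):
    """Keep the highest-scoring (mode, score) pair for each benchmark."""
    best = {}
    for benchmark, mode, score in rows:
        if benchmark not in best or score > best[benchmark][1]:
            best[benchmark] = (mode, score)
    return best


best = best_per_benchmark(rows)
for benchmark, (mode, score) in best.items():
    print(f"{benchmark}: {score:.2f} ({mode})")
```

Because the two models' best runs can come from different modes, the mode label next to each score matters when reading the comparison.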

Standard API Pricing: Qwen3.5-27B vs. Peer Models

Shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.

Source: DataLearnerAI. Standard text prices shown here use the default supplier.

Comparable standard text pricing is not available for these models.

Version History

How each version of the Qwen3.5-27B series stacks up on benchmark tests

Models compared: Qwen3.5-27B, Qwen3-32B, Qwen2.5-32B

2 benchmarks have comparable scores. Each model shows its best score, with the evaluation mode in parentheses.

Benchmark	Category	Qwen3.5-27B (current)	Qwen2.5-32B
MMLU Pro	Comprehensive Evaluation	86.10 (Thinking Enabled)	69.23 (Standard Mode)
LiveCodeBench	Coding and Software Engineering	80.70 (Thinking Enabled | Tools)	51.20 (Standard Mode)
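The gap between versions can be worked out directly from the two rows above. The sketch below is only arithmetic on the listed scores; it uses `Decimal` so the differences print exactly. Note that the modes differ (thinking, in one case with tools, versus Standard Mode), so these are not like-for-like comparisons.

```python
from decimal import Decimal

# (Qwen3.5-27B score, Qwen2.5-32B score), copied from the table above.
comparisons = {
    "MMLU Pro": (Decimal("86.10"), Decimal("69.23")),
    "LiveCodeBench": (Decimal("80.70"), Decimal("51.20")),
}

# Point difference between the newer and the older model on each benchmark.
gains = {name: new - old for name, (new, old) in comparisons.items()}

for name, gain in gains.items():
    print(f"{name}: +{gain} points")
```

This gives differences of 16.87 points on MMLU Pro and 29.50 points on LiveCodeBench.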

Single-Benchmark Version Trend

[Chart: MMLU Pro (Comprehensive Evaluation) score by model version. X-axis: model and release date. Y-axis: score. Modes plotted: Normal, Normal + Tools, Thinking, Thinking + Tools, Deep, Deep + Tools. Solid lines connect the same mode across versions; dotted guides align modes within the same generation.]

Standard API Pricing Across the Qwen3.5-27B Series

Source: DataLearnerAI. Standard text prices shown here use the default supplier.

These models use different currencies or billing units, so the page falls back to raw price values instead of a shared bar chart.

Model	Supplier	Standard input	Standard output	Base price applies to
Qwen3-32B	Alibaba	¥0.0012 / 1K tokens	¥0.0048 / 1K tokens	not specified
Qwen2.5-32B	Alibaba	¥0.002 / 1K tokens	¥0.006 / 1K tokens	not specified
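Since prices are quoted per 1K tokens, it can help to convert them to a per-million-token rate or to estimate the cost of a single request. The sketch below does both with the listed CNY prices. The helper names and the token counts in the example are hypothetical, and it assumes input and output tokens are billed independently at the standard rates.

```python
from decimal import Decimal

# Prices in CNY per 1K tokens, copied from the table above.
PRICES_PER_1K = {
    "Qwen3-32B": {"input": Decimal("0.0012"), "output": Decimal("0.0048")},
    "Qwen2.5-32B": {"input": Decimal("0.002"), "output": Decimal("0.006")},
}


def per_million(price_per_1k: Decimal) -> Decimal:
    """Convert a per-1K-token price to a per-1M-token price."""
    return price_per_1k * 1000


def request_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost in CNY of one request at the standard rates."""
    price = PRICES_PER_1K[model]
    return (price["input"] * input_tokens + price["output"] * output_tokens) / 1000


for model, price in PRICES_PER_1K.items():
    print(model, per_million(price["input"]), per_million(price["output"]))

# Hypothetical request: 10,000 input tokens and 2,000 output tokens.
print(request_cost("Qwen3-32B", 10_000, 2_000))
```

At these rates Qwen3-32B works out to ¥1.2 per million input tokens and ¥4.8 per million output tokens, and the hypothetical request costs about ¥0.0216.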

Sources

huggingface.co