Qwen 3.6 Plus Preview vs GLM 5.1

Across 8 shared benchmarks, GLM 5.1 leads overall: Qwen 3.6 Plus Preview wins 2, GLM 5.1 wins 4, and 2 are ties. The average score difference (Qwen 3.6 Plus Preview minus GLM 5.1) is -0.18.

Qwen 3.6 Plus Preview

Alibaba · 2026-03-31 · Chat model

GLM 5.1

Zhipu AI · 2026-03-27 · Reasoning model

Qwen 3.6 Plus Preview: 2 wins (25%) · Ties: 2 · GLM 5.1: 4 wins (50%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 8 shared benchmarks.

General Knowledge

Qwen 3.6 Plus Preview 2/3

Benchmark	Qwen 3.6 Plus Preview	GLM 5.1	Diff
GPQA Diamond	90.40 · rank 19 / 187 · Thinking (No Tools)	86.20 · rank 47 / 187 · Thinking (No Tools)	+4.20
HLE	50.60 · rank 24 / 172 · Thinking (With Tools)	52.30 · rank 19 / 172 · Thinking (With Tools)	-1.70
LiveBench	70.85 · rank 34 / 115 · Normal (No Tools)	70.18 · rank 37 / 115 · Normal (No Tools)	+0.67

AI Agent - Tool Usage

GLM 5.1 2/2

Benchmark	Qwen 3.6 Plus Preview	GLM 5.1	Diff
Terminal Bench 2.0	61.60 · rank 16 / 47 · Thinking (With Tools)	63.50 · rank 13 / 47 · Thinking (With Tools)	-1.90
Tool Decathlon	39.80 · rank 6 / 9 · Thinking (With Tools)	40.70 · rank 5 / 9 · Thinking (With Tools)	-0.90

Math and Reasoning

Even 2/2

Benchmark	Qwen 3.6 Plus Preview	GLM 5.1	Diff
AIME 2026	95.30 · rank 4 / 18 · Thinking (No Tools)	95.30 · rank 4 / 18 · Thinking (No Tools)	0.00
IMO-AnswerBench	83.80 · rank 12 / 21 · Thinking (No Tools)	83.80 · rank 12 / 21 · Thinking (No Tools)	0.00

Coding and Software Engineering

GLM 5.1 1/1

Benchmark	Qwen 3.6 Plus Preview	GLM 5.1	Diff
SWE-Bench Pro - Public	56.60 · rank 20 / 54 · Thinking (With Tools)	58.40 · rank 15 / 54 · Thinking (With Tools)	-1.80

Specs

Field	Qwen 3.6 Plus Preview	GLM 5.1
Publisher	Alibaba	Zhipu AI
Release date	2026-03-31	2026-03-27
Model type	Chat model	Reasoning model
Architecture	Dense	MoE
Parameters	Not available	75.4B
Context length	1M	200K
Max output	64K	125K

API pricing

Prices use DataLearner records when available; missing fields are not inferred.

Item	Qwen 3.6 Plus Preview	GLM 5.1
Text input	$0.5 / 1M tokens	$1.4 / 1M tokens
Text output	$3 / 1M tokens	$4.4 / 1M tokens
Cache read	$0.05 / 1M tokens	$4.4 / 1M tokens
Cache write	$0.625 / 1M tokens	$0.26 / 1M tokens
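
To make the input and output prices above easier to compare, here is a minimal sketch that estimates the cost of a single workload. The function name `estimate_cost` and the example token counts are hypothetical, chosen only for illustration. Cache read and write prices are left out because the page does not say how cached tokens are billed against regular input.

```python
# USD per 1M tokens, copied from the "Text input" and "Text output" rows above.
PRICES = {
    "Qwen 3.6 Plus Preview": {"input": 0.5, "output": 3.0},
    "GLM 5.1": {"input": 1.4, "output": 4.4},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD, ignoring any cache pricing."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 1M input tokens and 1M output tokens.
    for name in PRICES:
        print(f"{name}: ${estimate_cost(name, 1_000_000, 1_000_000):.2f}")
```

For this hypothetical workload the listed prices work out to about $3.50 for Qwen 3.6 Plus Preview and $5.80 for GLM 5.1.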

Summary

Qwen 3.6 Plus Preview leads in: General Knowledge (2/3)
GLM 5.1 leads in: AI Agent - Tool Usage (2/2), Coding and Software Engineering (1/1)
Tied in: Math and Reasoning

On average across the 8 shared benchmarks, GLM 5.1 scores 0.18 higher.

Largest single-benchmark gap: GPQA Diamond — Qwen 3.6 Plus Preview 90.40 vs GLM 5.1 86.20 (+4.20).
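
The summary figures can be reproduced from the per-benchmark scores. The sketch below assumes that each Diff is the Qwen 3.6 Plus Preview score minus the GLM 5.1 score, and that the average is a plain unweighted mean over the 8 benchmarks. The page does not state either rule explicitly, and `summarize` is a hypothetical helper name.

```python
# (benchmark, Qwen 3.6 Plus Preview score, GLM 5.1 score), copied from the tables.
SCORES = [
    ("GPQA Diamond", 90.40, 86.20),
    ("HLE", 50.60, 52.30),
    ("LiveBench", 70.85, 70.18),
    ("Terminal Bench 2.0", 61.60, 63.50),
    ("Tool Decathlon", 39.80, 40.70),
    ("AIME 2026", 95.30, 95.30),
    ("IMO-AnswerBench", 83.80, 83.80),
    ("SWE-Bench Pro - Public", 56.60, 58.40),
]


def summarize(scores):
    """Tally wins and ties and compute the mean and largest score difference."""
    # Round each difference to 2 decimals to avoid floating-point noise.
    diffs = [(name, round(qwen - glm, 2)) for name, qwen, glm in scores]
    return {
        "qwen_wins": sum(1 for _, d in diffs if d > 0),
        "glm_wins": sum(1 for _, d in diffs if d < 0),
        "ties": sum(1 for _, d in diffs if d == 0),
        "mean_diff": sum(d for _, d in diffs) / len(diffs),
        "largest_gap": max(diffs, key=lambda item: abs(item[1])),
    }


if __name__ == "__main__":
    result = summarize(SCORES)
    print(result["qwen_wins"], result["glm_wins"], result["ties"])
    print(round(result["mean_diff"], 2))
    print(result["largest_gap"])
```

Under these assumptions the tallies come out as 2 wins, 4 wins and 2 ties, the mean difference rounds to -0.18, and GPQA Diamond is the largest gap, matching the figures reported on this page.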

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.
