Qwen 3.6 Plus Preview vs GLM 5.1

Across 7 shared benchmarks, GLM 5.1 leads overall: Qwen 3.6 Plus Preview wins 1, GLM 5.1 wins 4, with 2 ties and an average score difference of -0.30.

Alibaba
Qwen 3.6 Plus Preview

Alibaba · 2026-03-31 · Chat model

Zhipu AI
GLM 5.1

Zhipu AI · 2026-03-27 · Reasoning model

Qwen 3.6 Plus Preview: 1 win (14%) · Ties: 2 · GLM 5.1: 4 wins (57%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 7 shared benchmarks.

AI Agent - Tool Usage

GLM 5.1 2/2
Benchmark | Qwen 3.6 Plus Preview | GLM 5.1 | Diff
Terminal Bench 2.0 | 61.60 (16 / 46, Thinking (With Tools)) | 63.50 (13 / 46, Thinking (With Tools)) | -1.90
Tool Decathlon | 39.80 (4 / 7, Thinking (With Tools)) | 40.70 (3 / 7, Thinking (With Tools)) | -0.90

General Knowledge

Even 2/2
Benchmark | Qwen 3.6 Plus Preview | GLM 5.1 | Diff
GPQA Diamond | 90.40 (17 / 178, Thinking (No Tools)) | 86.20 (42 / 178, Thinking (No Tools)) | +4.20
HLE | 50.60 (17 / 157, Thinking (With Tools)) | 52.30 (12 / 157, Thinking (With Tools)) | -1.70

Math and Reasoning

Even 2/2
Benchmark | Qwen 3.6 Plus Preview | GLM 5.1 | Diff
AIME 2026 | 95.30 (2 / 14, Thinking (No Tools)) | 95.30 (2 / 14, Thinking (No Tools)) | 0.00
IMO-AnswerBench | 83.80 (10 / 19, Thinking (No Tools)) | 83.80 (10 / 19, Thinking (No Tools)) | 0.00

Coding and Software Engineer

GLM 5.1 1/1
Benchmark | Qwen 3.6 Plus Preview | GLM 5.1 | Diff
SWE-Bench Pro - Public | 56.60 (13 / 43, Thinking (With Tools)) | 58.40 (9 / 43, Thinking (With Tools)) | -1.80
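The grouping and ordering used in the tables above (by capability, then by largest gap within each group) can be sketched as follows. This is only an illustration of that ordering, not the page's actual generator; the `ROWS` layout and the `group_and_sort` helper are hypothetical names, and the scores are copied from the tables.

```python
from collections import defaultdict

# (capability, benchmark, Qwen 3.6 Plus Preview score, GLM 5.1 score),
# copied from the benchmark tables. The tuple layout is an assumption
# made for this sketch.
ROWS = [
    ("AI Agent - Tool Usage", "Terminal Bench 2.0", 61.60, 63.50),
    ("AI Agent - Tool Usage", "Tool Decathlon", 39.80, 40.70),
    ("General Knowledge", "GPQA Diamond", 90.40, 86.20),
    ("General Knowledge", "HLE", 50.60, 52.30),
    ("Math and Reasoning", "AIME 2026", 95.30, 95.30),
    ("Math and Reasoning", "IMO-AnswerBench", 83.80, 83.80),
    ("Coding and Software Engineer", "SWE-Bench Pro - Public", 56.60, 58.40),
]


def group_and_sort(rows):
    """Group benchmarks by capability and sort each group by absolute gap."""
    groups = defaultdict(list)
    for capability, name, qwen, glm in rows:
        # Diff is Qwen minus GLM, so a negative value means GLM 5.1 is ahead.
        diff = round(qwen - glm, 2)
        groups[capability].append((name, diff))
    for items in groups.values():
        # Largest gap first, regardless of which model leads.
        items.sort(key=lambda item: abs(item[1]), reverse=True)
    return dict(groups)


if __name__ == "__main__":
    for capability, items in group_and_sort(ROWS).items():
        print(capability)
        for name, diff in items:
            print(f"  {name}: {diff:+.2f}")
```

Sorting on the absolute value keeps a large Qwen lead and a large GLM lead on equal footing, which matches how GPQA Diamond (+4.20) is listed ahead of HLE (-1.70).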

Specs

Field | Qwen 3.6 Plus Preview | GLM 5.1
Publisher | Alibaba | Zhipu AI
Release date | 2026-03-31 | 2026-03-27
Model type | Chat model | Reasoning model
Architecture | Dense | MoE
Parameters | Not available | 75.4B
Context length | 1M | 200K
Max output | 64K | 125K

API pricing

Prices use DataLearner records when available; missing fields are not inferred.

Item | Qwen 3.6 Plus Preview | GLM 5.1
Text input | $0.5 / 1M tokens | $1.4 / 1M tokens
Text output | $3 / 1M tokens | $4.4 / 1M tokens
Cache read | $0.05 / 1M tokens | $4.4 / 1M tokens
Cache write | $0.625 / 1M tokens | $0.26 / 1M tokens
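To turn the per-million-token prices into a rough bill, a minimal sketch follows. It uses only the text input and output prices listed above and ignores cache reads and writes; the token counts in the usage example are hypothetical, and `estimate_cost` is an illustrative helper, not part of any provider SDK.

```python
# USD per 1M tokens, copied from the pricing table (text input/output only).
PRICES_PER_MILLION = {
    "Qwen 3.6 Plus Preview": {"input": 0.5, "output": 3.0},
    "GLM 5.1": {"input": 1.4, "output": 4.4},
}


def estimate_cost(model, input_tokens, output_tokens):
    """Return the estimated USD cost for a workload, without caching."""
    price = PRICES_PER_MILLION[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 2M input tokens and 500K output tokens.
    for model in PRICES_PER_MILLION:
        cost = estimate_cost(model, 2_000_000, 500_000)
        print(f"{model}: ${cost:.2f}")
```

Under that assumed workload the estimate comes to about $2.50 for Qwen 3.6 Plus Preview and about $5.00 for GLM 5.1. Actual bills will differ once caching and any provider-side tiering are taken into account.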

Summary

  • GLM 5.1 leads in: AI Agent - Tool Usage (2/2), Coding and Software Engineer (1/1)
  • Tied in: General Knowledge, Math and Reasoning

On average across the 7 shared benchmarks, GLM 5.1 scores 0.30 higher.

Largest single-benchmark gap: GPQA Diamond — Qwen 3.6 Plus Preview 90.40 vs GLM 5.1 86.20 (+4.20).
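The summary figures can be re-derived from the seven per-benchmark scores. The sketch below assumes the diff is computed as Qwen 3.6 Plus Preview minus GLM 5.1, which matches the signs shown on this page; `SCORES` and `summarize` are hypothetical names used only for illustration.

```python
# benchmark -> (Qwen 3.6 Plus Preview score, GLM 5.1 score), from the tables.
SCORES = {
    "Terminal Bench 2.0": (61.60, 63.50),
    "Tool Decathlon": (39.80, 40.70),
    "GPQA Diamond": (90.40, 86.20),
    "HLE": (50.60, 52.30),
    "AIME 2026": (95.30, 95.30),
    "IMO-AnswerBench": (83.80, 83.80),
    "SWE-Bench Pro - Public": (56.60, 58.40),
}


def summarize(scores):
    """Count wins and ties, and compute the mean and largest difference."""
    # Round each diff to two decimals to avoid floating-point noise.
    diffs = {name: round(q - g, 2) for name, (q, g) in scores.items()}
    largest = max(diffs, key=lambda name: abs(diffs[name]))
    return {
        "qwen_wins": sum(d > 0 for d in diffs.values()),
        "glm_wins": sum(d < 0 for d in diffs.values()),
        "ties": sum(d == 0 for d in diffs.values()),
        "mean_diff": round(sum(diffs.values()) / len(diffs), 2),
        "largest_gap": (largest, diffs[largest]),
    }


if __name__ == "__main__":
    print(summarize(SCORES))
```

The individual diffs sum to -2.10, and dividing by 7 gives the -0.30 average reported above. The single Qwen win (GPQA Diamond, +4.20) is also the largest gap, which is why the average stays small even though GLM 5.1 leads on four benchmarks.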

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.