GLM-5.2 vs GLM 5.1

GLM-5.2 leads on all 6 shared benchmarks: GLM-5.2 wins 6, GLM 5.1 wins 0, with 0 ties and an average score difference of +7.42 in favor of GLM-5.2.

GLM-5.2: Zhipu AI · 2026-06-13 · Reasoning model

GLM 5.1: Zhipu AI · 2026-03-27 · Reasoning model

GLM-5.2: 6 wins (100%) · GLM 5.1: 0 wins (0%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 6 shared benchmarks.

General Knowledge (GLM-5.2 leads 2/2)

Benchmark	GLM-5.2	GLM 5.1	Diff
GPQA Diamond	91.20 · rank 15 / 179 · Thinking (No Tools)	86.20 · rank 43 / 179 · Thinking (No Tools)	+5
HLE	54.70 · rank 8 / 159 · Thinking (With Tools)	52.30 · rank 13 / 159 · Thinking (With Tools)	+2.40

Math and Reasoning (GLM-5.2 leads 2/2)

Benchmark	GLM-5.2	GLM 5.1	Diff
IMO-AnswerBench	91 · rank 1 / 20 · Thinking (No Tools)	83.80 · rank 11 / 20 · Thinking (No Tools)	+7.20
AIME 2026	99.20 · rank 1 / 15 · Thinking (No Tools)	95.30 · rank 3 / 15 · Thinking (No Tools)	+3.90

AI Agent - Tool Usage (GLM-5.2 leads 1/1)

Benchmark	GLM-5.2	GLM 5.1	Diff
TerminalBench 2.1	81 · rank 4 / 14 · Thinking High (With Tools)	58.70 · rank 12 / 14 · Thinking High (With Tools)	+22.30

Coding and Software Engineer (GLM-5.2 leads 1/1)

Benchmark	GLM-5.2	GLM 5.1	Diff
SWE-Bench Pro - Public	62.10 · rank 5 / 44 · Thinking (With Tools)	58.40 · rank 10 / 44 · Thinking (With Tools)	+3.70

Prices use DataLearner records when available; missing fields are not inferred.

GLM-5.2 leads in: General Knowledge (2/2), Math and Reasoning (2/2), AI Agent - Tool Usage (1/1), Coding and Software Engineer (1/1)

On average across the 6 shared benchmarks, GLM-5.2 scores 7.42 higher.

Largest single-benchmark gap: TerminalBench 2.1, where GLM-5.2 scores 81 vs 58.70 for GLM 5.1 (+22.30).
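
The summary figures can be rechecked from the per-benchmark scores. The sketch below is illustrative only: the names `SCORES` and `summarize` are hypothetical, and it assumes the average is an unweighted mean of the six per-benchmark differences, which is not stated on the page but does reproduce the reported +7.42.

```python
# Scores copied from the benchmark tables: (benchmark, GLM-5.2, GLM 5.1).
SCORES = [
    ("GPQA Diamond", 91.20, 86.20),
    ("HLE", 54.70, 52.30),
    ("IMO-AnswerBench", 91.00, 83.80),
    ("AIME 2026", 99.20, 95.30),
    ("TerminalBench 2.1", 81.00, 58.70),
    ("SWE-Bench Pro - Public", 62.10, 58.40),
]


def summarize(scores):
    """Return win counts, mean difference, and the largest-gap benchmark.

    Assumption: the mean is unweighted across benchmarks.
    """
    # Round each difference to two decimals to avoid float noise.
    diffs = {name: round(a - b, 2) for name, a, b in scores}
    return {
        "wins_first": sum(d > 0 for d in diffs.values()),
        "wins_second": sum(d < 0 for d in diffs.values()),
        "ties": sum(d == 0 for d in diffs.values()),
        "mean_diff": round(sum(diffs.values()) / len(diffs), 2),
        "largest_gap": max(diffs, key=lambda name: abs(diffs[name])),
        "diffs": diffs,
    }


if __name__ == "__main__":
    summary = summarize(SCORES)
    print(summary["wins_first"], summary["wins_second"], summary["ties"])
    print(summary["mean_diff"], summary["largest_gap"])
```

Under that assumption, the six differences sum to 44.50, and 44.50 / 6 rounds to 7.42, matching the figure reported above.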

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.