GLM-5 vs GLM-4.7

Across 11 shared benchmarks, GLM-5 leads overall: GLM-5 wins 9, GLM-4.7 wins 1, with 1 tie and an average score difference of +7.63 in GLM-5's favor.

GLM-5: Zhipu AI (智谱AI) · 2026-02-11 · Chat model

GLM-4.7: Zhipu AI (智谱AI) · 2025-12-22 · Chat model

GLM-5: 9 wins (82%) · Ties: 1 (9%) · GLM-4.7: 1 win

Benchmark scores

Grouped by capability, sorted by largest gap within each. 11 shared benchmarks.

General Knowledge · GLM-5 leads 3/3

Benchmark	GLM-5	GLM-4.7	Diff
LiveBench	68.85 · rank 43 / 115 · Normal (No Tools)	58.09 · rank 78 / 115 · Normal (No Tools)	+10.76
HLE	50.40 · rank 25 / 172	42.80 · rank 52 / 172	+7.60
GPQA Diamond	86 · rank 48 / 187 · Thinking (No Tools)	85.70 · rank 49 / 187	+0.30

Agent Level Benchmark · GLM-5 leads 2/2

Benchmark	GLM-5	GLM-4.7	Diff
Terminal Bench Hard	43 · rank 2 / 13	33.30 · rank 7 / 13	+9.70
τ²-Bench	89.70 · rank 4 / 43	87.40 · rank 6 / 43	+2.30

Math and Reasoning · GLM-4.7 leads 1/2

Benchmark	GLM-5	GLM-4.7	Diff
AIME 2026	92.70 · rank 9 / 18 · Thinking (No Tools)	92.90 · rank 8 / 18	-0.20
FrontierMath - Tier 4	2.10 · rank 56 / 80 · Normal (No Tools)	2.10 · rank 56 / 80 · Normal (No Tools)	0.00 (tie)

AI Agent - Information Search · GLM-5 leads 1/1

Benchmark	GLM-5	GLM-4.7	Diff
BrowseComp	75.90 · rank 24 / 53	52 · rank 41 / 53	+23.90

AI Agent - Tool Usage · GLM-5 leads 1/1

Benchmark	GLM-5	GLM-4.7	Diff
Terminal Bench 2.0	61.10 · rank 18 / 47	41 · rank 44 / 47	+20.10

Coding and Software Engineer · GLM-5 leads 1/1

Benchmark	GLM-5	GLM-4.7	Diff
SWE-bench Verified	77.80 · rank 25 / 112 · Thinking (No Tools)	73.80 · rank 43 / 112	+4.00

Commonsense Reasoning · GLM-5 leads 1/1

Benchmark	GLM-5	GLM-4.7	Diff
Simple Bench	53.20 · rank 23 / 63 · Normal (No Tools)	47.70 · rank 29 / 63 · Thinking (No Tools)	+5.50

Prices use DataLearner records when available; missing fields are not inferred.

GLM-5 leads in: General Knowledge (3/3), Agent Level Benchmark (2/2), AI Agent - Information Search (1/1), AI Agent - Tool Usage (1/1), Coding and Software Engineer (1/1), Commonsense Reasoning (1/1)
GLM-4.7 leads in: Math and Reasoning (1/2)

On average across the 11 shared benchmarks, GLM-5 scores 7.63 points higher than GLM-4.7.

Largest single-benchmark gap: BrowseComp (GLM-5 75.90 vs GLM-4.7 52, +23.90).
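
The summary figures can be re-derived from the per-benchmark scores. The following is a minimal Python sketch, not the page's actual generation logic: the `SCORES` list is transcribed by hand from the benchmark tables on this page, and the `summarize` helper and its output keys are hypothetical names introduced for illustration. It assumes a win is a strictly positive difference (GLM-5 minus GLM-4.7) and a tie is a difference of exactly zero after rounding to two decimals.

```python
from statistics import mean

# Scores transcribed from this page's tables: (benchmark, GLM-5, GLM-4.7).
SCORES = [
    ("LiveBench", 68.85, 58.09),
    ("HLE", 50.40, 42.80),
    ("GPQA Diamond", 86.00, 85.70),
    ("Terminal Bench Hard", 43.00, 33.30),
    ("τ²-Bench", 89.70, 87.40),
    ("AIME 2026", 92.70, 92.90),
    ("FrontierMath - Tier 4", 2.10, 2.10),
    ("BrowseComp", 75.90, 52.00),
    ("Terminal Bench 2.0", 61.10, 41.00),
    ("SWE-bench Verified", 77.80, 73.80),
    ("Simple Bench", 53.20, 47.70),
]


def summarize(scores):
    """Hypothetical helper: tally wins/ties and average the score differences."""
    # Difference is GLM-5 minus GLM-4.7, rounded to the page's two decimals.
    diffs = {name: round(a - b, 2) for name, a, b in scores}
    largest = max(diffs, key=lambda name: abs(diffs[name]))
    return {
        "glm5_wins": sum(d > 0 for d in diffs.values()),
        "glm47_wins": sum(d < 0 for d in diffs.values()),
        "ties": sum(d == 0 for d in diffs.values()),
        "avg_diff": round(mean(diffs.values()), 2),
        "largest_gap": (largest, diffs[largest]),
    }


summary = summarize(SCORES)
print(summary)
```

Under these assumptions the tallies come out to 9 wins, 1 loss and 1 tie for GLM-5, with a mean difference of 7.63 and BrowseComp as the largest gap, matching the figures stated on the page. Because the benchmarks use different scales and evaluation modes, the unweighted mean is only a rough aggregate.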

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.