GLM-5 vs GLM-4.6

Across 8 shared benchmarks, GLM-5 leads overall: GLM-5 wins 7, GLM-4.6 wins 0, with 1 tie and an average score difference of +22.32 in GLM-5's favor.

GLM-5: Zhipu AI (智谱AI) · 2026-02-11 · AI model

GLM-4.6: Zhipu AI (智谱AI) · 2025-09-30 · AI model

GLM-5: 7 wins (88%) · Ties: 1 · GLM-4.6: 0 wins (0%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 8 shared benchmarks.

GLM-5 2/2

Benchmark	GLM-5	GLM-4.6	Diff
τ²-Bench - Telecom	98 (5 / 35, thinking + tool use)	71 (27 / 35, thinking + tool use)	+27
τ²-Bench	89.70 (4 / 40, thinking + tool use)	75.90 (20 / 40, thinking + tool use)	+13.80

GLM-5 2/2

Benchmark	GLM-5	GLM-4.6	Diff
HLE	50.40 (15 / 149, thinking + tool use)	5.20 (142 / 149)	+45.20
GPQA Diamond	86 (40 / 175, thinking, no tools)	63 (132 / 175)	+23

Prices use DataLearner records when available; missing fields are not inferred.

GLM-5 leads in: Agent Level Benchmark (2/2), General Knowledge (2/2), AI Agent - Information Search (1/1), Coding and Software Engineer (1/1), Instruction Following (1/1)
Tied in: Math and Reasoning

On average across the 8 shared benchmarks, GLM-5 scores 22.32 higher.

Largest single-benchmark gap: HLE, where GLM-5 scores 50.40 and GLM-4.6 scores 5.20 (+45.20).

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.
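
The summary figures on this page (wins, ties, win share, average difference, largest gap) are simple arithmetic over the per-benchmark scores. The sketch below recomputes them for the four benchmarks whose rows are listed on this page. The names `Row`, `ROWS`, `summarize`, and `win_share` are hypothetical, and the round-half-up percentage is an assumption about how the page arrives at 88% from 7 of 8; the site's actual method is not documented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    benchmark: str
    glm5: float
    glm46: float


# Only the four benchmark rows listed on this page; the other four
# shared benchmarks are not shown, so they are not included here.
ROWS = [
    Row("τ²-Bench - Telecom", 98.0, 71.0),
    Row("τ²-Bench", 89.70, 75.90),
    Row("HLE", 50.40, 5.20),
    Row("GPQA Diamond", 86.0, 63.0),
]


def win_share(wins: int, total: int) -> int:
    """Percentage of benchmarks won, rounded half up (assumed rounding rule)."""
    return int(wins * 100 / total + 0.5)


def summarize(rows):
    """Count wins and ties, then compute the mean difference and the largest gap."""
    diffs = [round(r.glm5 - r.glm46, 2) for r in rows]
    largest = max(rows, key=lambda r: abs(r.glm5 - r.glm46))
    return {
        "glm5_wins": sum(d > 0 for d in diffs),
        "glm46_wins": sum(d < 0 for d in diffs),
        "ties": sum(d == 0 for d in diffs),
        "mean_diff": round(sum(diffs) / len(diffs), 2),
        "largest_gap": largest.benchmark,
    }


if __name__ == "__main__":
    print(summarize(ROWS))
    print(win_share(7, 8))
```

Over these four rows the mean difference is 27.25, not the page's +22.32. The page's figure is the mean over all 8 shared benchmarks, four of which are not listed here.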