GLM-5 vs GLM-4.7
Across 9 shared benchmarks, GLM-5 leads overall: GLM-5 wins 7, GLM-4.7 wins 1, and 1 is a tie, with an average score difference of +7.52 (GLM-5 minus GLM-4.7).
GLM-5: 7 wins (78%); ties: 1 (11%); GLM-4.7: 1 win
Benchmark scores
Grouped by capability, sorted by largest gap within each. 9 shared benchmarks.
Agent Level Benchmark
GLM-5 leads 2/2

| Benchmark | GLM-5 | GLM-4.7 | Diff |
|---|---|---|---|
| Terminal Bench Hard | 43 (rank 2 / 13) | 33.30 (rank 7 / 13) | +9.70 |
| τ²-Bench | 89.70 (rank 4 / 40) | 87.40 (rank 6 / 40) | +2.30 |
General Knowledge
GLM-5 leads 2/2

| Benchmark | GLM-5 | GLM-4.7 | Diff |
|---|---|---|---|
| HLE | 50.40 (rank 18 / 157) | 42.80 (rank 41 / 157) | +7.60 |
| GPQA Diamond | 86 (rank 43 / 178), Thinking (No Tools) | 85.70 (rank 44 / 178) | +0.30 |
Math and Reasoning
GLM-4.7 leads 1/2 (1 tie)

| Benchmark | GLM-5 | GLM-4.7 | Diff |
|---|---|---|---|
| AIME 2026 | 92.70 (rank 7 / 14), Thinking (No Tools) | 92.90 (rank 6 / 14) | -0.20 |
| FrontierMath - Tier 4 | 2.10 (rank 56 / 80), Normal (No Tools) | 2.10 (rank 56 / 80), Normal (No Tools) | 0.00 (tie) |
AI Agent - Information Search
GLM-5 leads 1/1

| Benchmark | GLM-5 | GLM-4.7 | Diff |
|---|---|---|---|
| BrowseComp | 75.90 (rank 19 / 45) | 52 (rank 34 / 45) | +23.90 |
AI Agent - Tool Usage
GLM-5 leads 1/1

| Benchmark | GLM-5 | GLM-4.7 | Diff |
|---|---|---|---|
| Terminal Bench 2.0 | 61.10 (rank 18 / 46) | 41 (rank 43 / 46) | +20.10 |
Coding and Software Engineer
GLM-5 leads 1/1

| Benchmark | GLM-5 | GLM-4.7 | Diff |
|---|---|---|---|
| SWE-bench Verified | 77.80 (rank 23 / 108), Thinking (No Tools) | 73.80 (rank 39 / 108) | +4.00 |
Specs
| Field | GLM-5 | GLM-4.7 |
|---|---|---|
| Publisher | Zhipu AI (智谱AI) | Zhipu AI (智谱AI) |
| Release date | 2026-02-11 | 2025-12-22 |
| Model type | Chat model | Chat model |
| Architecture | MoE | MoE |
| Parameters | 744B | 358B |
| Context length | 200K | 200K |
| Max output | 128K | 132072 |
API pricing
Prices use DataLearner records when available; missing fields are not inferred.
| Item | GLM-5 | GLM-4.7 |
|---|---|---|
| Text input | $1 / 1M tokens | Not public |
| Text output | $3.2 / 1M tokens | Not public |
| Cache write | $0.2 / 1M tokens | Not public |
GLM-4.7 has no public pricing on record, so the two models cannot be compared on price here.
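As a rough illustration of how the per-million-token prices above translate into a bill, the sketch below estimates the cost of a GLM-5 workload. The function name `estimate_glm5_cost` and the assumption that input, output, and cache-write tokens are billed separately and simply added together are hypothetical; the provider's actual billing rules may differ, and no GLM-4.7 figures are included because none are public.

```python
# USD per 1M tokens for GLM-5, copied from the pricing table on this page.
GLM5_PRICES_PER_MILLION = {
    "input": 1.0,
    "output": 3.2,
    "cache_write": 0.2,
}


def estimate_glm5_cost(input_tokens, output_tokens, cache_write_tokens=0):
    """Estimate a GLM-5 bill in USD.

    Assumption (not confirmed by the source): each token class is billed
    independently at its listed rate and the charges are summed.
    """
    usage = {
        "input": input_tokens,
        "output": output_tokens,
        "cache_write": cache_write_tokens,
    }
    return sum(
        tokens * GLM5_PRICES_PER_MILLION[kind] / 1_000_000
        for kind, tokens in usage.items()
    )


if __name__ == "__main__":
    # Example workload: 2M input tokens and 500K output tokens, no caching.
    cost = estimate_glm5_cost(2_000_000, 500_000)
    print(f"Estimated cost: ${cost:.2f}")
```

Under these assumptions, output tokens dominate the bill for generation-heavy workloads, since they cost 3.2 times as much per token as input.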
Summary
- GLM-5 leads in: Agent Level Benchmark (2/2), General Knowledge (2/2), AI Agent - Information Search (1/1), AI Agent - Tool Usage (1/1), Coding and Software Engineer (1/1)
- GLM-4.7 leads in: Math and Reasoning (1/2)
On average across the 9 shared benchmarks, GLM-5 scores 7.52 higher.
Largest single-benchmark gap: BrowseComp, where GLM-5 scores 75.90 vs 52 for GLM-4.7 (+23.90).
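The summary figures can be recomputed from the benchmark tables. The sketch below is one way to do that, not the page's actual generator: the `SCORES` list is transcribed by hand from the tables, and the helper name `summarize` is hypothetical. A tie is taken to mean a difference of exactly zero after rounding to two decimals.

```python
# (benchmark, GLM-5 score, GLM-4.7 score), transcribed from the tables on this page.
SCORES = [
    ("Terminal Bench Hard", 43.0, 33.3),
    ("τ²-Bench", 89.7, 87.4),
    ("HLE", 50.4, 42.8),
    ("GPQA Diamond", 86.0, 85.7),
    ("AIME 2026", 92.7, 92.9),
    ("FrontierMath - Tier 4", 2.1, 2.1),
    ("BrowseComp", 75.9, 52.0),
    ("Terminal Bench 2.0", 61.1, 41.0),
    ("SWE-bench Verified", 77.8, 73.8),
]


def summarize(scores):
    """Return win counts, ties, mean difference, and the largest gap."""
    # Round each difference to two decimals to avoid float noise such as 9.700000000000003.
    diffs = [(name, round(a - b, 2)) for name, a, b in scores]
    return {
        "glm5_wins": sum(1 for _, d in diffs if d > 0),
        "glm47_wins": sum(1 for _, d in diffs if d < 0),
        "ties": sum(1 for _, d in diffs if d == 0),
        "mean_diff": round(sum(d for _, d in diffs) / len(diffs), 2),
        "largest_gap": max(diffs, key=lambda item: abs(item[1])),
    }


if __name__ == "__main__":
    for key, value in summarize(SCORES).items():
        print(f"{key}: {value}")
```

Run on the transcribed scores, this reproduces the 7 / 1 / 1 split, the +7.52 average, and BrowseComp as the largest gap.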
Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.