GLM-5.2 vs GLM-4.7

Across 4 shared benchmarks, GLM-5.2 leads overall: GLM-5.2 wins 4, GLM-4.7 wins 0, with 0 ties and an average score difference of +11.30.

GLM-5.2

智谱AI · 2026-06-13 · Reasoning model

GLM-4.7

智谱AI · 2025-12-22 · Chat model

GLM-5.2: 4 wins (100%) · GLM-4.7: 0 wins (0%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 4 shared benchmarks.

General Knowledge

GLM-5.2 leads 2/2

Benchmark | GLM-5.2 | Rank | Mode | GLM-4.7 | Rank | Diff
HLE | 54.70 | 8 / 159 | Thinking (With Tools) | 42.80 | 42 / 159 | +11.90
GPQA Diamond | 91.20 | 15 / 179 | Thinking (No Tools) | 85.70 | 45 / 179 | +5.50

Coding and Software Engineering

GLM-5.2 leads 1/1

Benchmark | GLM-5.2 | Rank | Mode | GLM-4.7 | Rank | Diff
SWE-Bench Pro - Public | 62.10 | 5 / 44 | Thinking (With Tools) | 40.60 | 40 / 44 | +21.50

Math and Reasoning

GLM-5.2 leads 1/1

Benchmark | GLM-5.2 | Rank | Mode | GLM-4.7 | Rank | Diff
AIME 2026 | 99.20 | 1 / 15 | Thinking (No Tools) | 92.90 | 7 / 15 | +6.30

Specs

Field | GLM-5.2 | GLM-4.7
Publisher | 智谱AI | 智谱AI
Release date | 2026-06-13 | 2025-12-22
Model type | Reasoning model | Chat model
Architecture | MoE | MoE
Parameters | 753.33B | 358B
Context length | 1M | 200K
Max output | 128K | 132072

API pricing

Prices use DataLearner records when available; missing fields are not inferred.

Item | GLM-5.2 | GLM-4.7
Text input | $1.4 / 1M tokens | Not public
Text output | $4.4 / 1M tokens | Not public
Cache read | $0.26 / 1M tokens | Not public

One or both models have incomplete public pricing.
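
As a rough illustration of how the listed GLM-5.2 prices translate into a per-request cost, here is a minimal sketch. The function name `estimate_cost` and the billing rule (cached input tokens charged at the cache-read rate instead of the text-input rate) are assumptions made for this example, not something this page states. GLM-4.7 is left out because its prices are not public.

```python
# GLM-5.2 prices in USD per 1M tokens, copied from the pricing table on this page.
PRICES_PER_MILLION = {"input": 1.4, "output": 4.4, "cache_read": 0.26}


def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one request (hypothetical helper).

    Assumption: cached_tokens is the part of input_tokens served from cache,
    and it is billed at the cache-read rate rather than the input rate.
    """
    if min(input_tokens, output_tokens, cached_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")

    uncached_tokens = input_tokens - cached_tokens
    total = (
        uncached_tokens * PRICES_PER_MILLION["input"]
        + cached_tokens * PRICES_PER_MILLION["cache_read"]
        + output_tokens * PRICES_PER_MILLION["output"]
    )
    return total / 1_000_000


if __name__ == "__main__":
    # 200K input tokens (150K of them cached) and 4K output tokens.
    cost = estimate_cost(200_000, 4_000, cached_tokens=150_000)
    print(f"${cost:.4f}")  # prints "$0.1266"
```

Under these assumptions, the example request costs 0.07 (uncached input) + 0.039 (cache read) + 0.0176 (output) = 0.1266 USD.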

Summary

  • GLM-5.2 leads in: General Knowledge (2/2), Coding and Software Engineering (1/1), Math and Reasoning (1/1)

On average across the 4 shared benchmarks, GLM-5.2 scores 11.30 higher.

Largest single-benchmark gap: SWE-Bench Pro - Public — GLM-5.2 62.10 vs GLM-4.7 40.60 (+21.50).
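
The summary figures can be re-derived from the four score pairs listed on this page. The sketch below is only an illustration of that arithmetic; the `summarize` helper and its output keys are hypothetical names, and the page's own generation logic is not known.

```python
# (GLM-5.2, GLM-4.7) scores copied from the benchmark tables on this page.
SCORES = {
    "HLE": (54.70, 42.80),
    "GPQA Diamond": (91.20, 85.70),
    "SWE-Bench Pro - Public": (62.10, 40.60),
    "AIME 2026": (99.20, 92.90),
}


def summarize(scores: dict[str, tuple[float, float]]) -> dict:
    """Compute win counts, mean difference, and largest gap (hypothetical helper)."""
    # Round to 2 decimals to avoid floating-point noise in the differences.
    diffs = {name: round(a - b, 2) for name, (a, b) in scores.items()}
    largest = max(diffs, key=lambda name: abs(diffs[name]))
    return {
        "wins_first": sum(d > 0 for d in diffs.values()),
        "wins_second": sum(d < 0 for d in diffs.values()),
        "ties": sum(d == 0 for d in diffs.values()),
        "mean_diff": round(sum(diffs.values()) / len(diffs), 2),
        "largest_gap": (largest, diffs[largest]),
    }


if __name__ == "__main__":
    print(summarize(SCORES))
```

With the scores above, the differences are 11.90, 5.50, 21.50 and 6.30, which sum to 45.20 and average to 11.30, matching the figures reported in the summary.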

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.