DeepSeek-V4-Pro vs GLM 5.1

Across 6 shared benchmarks, GLM 5.1 leads overall: DeepSeek-V4-Pro wins 1 and GLM 5.1 wins 5, with 0 ties. The average score difference (DeepSeek-V4-Pro minus GLM 5.1) is -18.83.

DeepSeek-V4-Pro: DeepSeek-AI · 2026-04-24 · Reasoning model

GLM 5.1: Zhipu AI · 2026-03-27 · Reasoning model

DeepSeek-V4-Pro: 1 win (17%) · GLM 5.1: 5 wins (83%)
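The win percentages follow directly from the win counts. A minimal sketch, assuming ties count toward the total and each share is rounded to the nearest whole percent; the function name `win_shares` is illustrative and not part of the page's tooling:

```python
def win_shares(wins_a: int, wins_b: int, ties: int = 0) -> tuple[int, int]:
    """Return each side's share of all shared benchmarks, in whole percent."""
    total = wins_a + wins_b + ties
    if total == 0:
        raise ValueError("no shared benchmarks to compare")
    # Assumption: shares are rounded with Python's built-in round().
    return round(100 * wins_a / total), round(100 * wins_b / total)


# Counts from the summary: DeepSeek-V4-Pro 1 win, GLM 5.1 5 wins, 0 ties.
deepseek_share, glm_share = win_shares(1, 5, 0)
print(deepseek_share, glm_share)  # 17 83
```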

Benchmark scores

Grouped by capability, sorted by largest gap within each. 6 shared benchmarks.

General Knowledge: GLM 5.1 leads 2/2

Benchmark	DeepSeek-V4-Pro (score, rank, mode)	GLM 5.1 (score, rank, mode)	Diff
HLE	7.70, rank 133 / 149, Normal (No Tools)	52.30, rank 9 / 149, Thinking (With Tools)	-44.60
GPQA Diamond	72.90, rank 99 / 175, Normal (No Tools)	86.20, rank 39 / 175, Thinking (No Tools)	-13.30
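The Diff values in this table are consistent with DeepSeek-V4-Pro's score minus GLM 5.1's score, so a negative value favors GLM 5.1. A small sketch of that arithmetic, using `Decimal` to keep the two-decimal scores exact; the names `SCORES` and `score_diffs` are illustrative:

```python
from decimal import Decimal

# Scores copied from the two rows above, as (DeepSeek-V4-Pro, GLM 5.1).
SCORES = {
    "HLE": (Decimal("7.70"), Decimal("52.30")),
    "GPQA Diamond": (Decimal("72.90"), Decimal("86.20")),
}


def score_diffs(scores):
    """Return {benchmark: first score minus second score}."""
    return {name: a - b for name, (a, b) in scores.items()}


diffs = score_diffs(SCORES)
# List the largest absolute gap first, matching the table's ordering.
for name, diff in sorted(diffs.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{name}: {diff}")
```

Note that the two models were evaluated in different modes on HLE (Normal without tools vs Thinking with tools), so that gap is not a like-for-like comparison.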

AI Agent - Information Search: DeepSeek-V4-Pro leads 1/1

Prices use DataLearner records when available; missing fields are not inferred.

DeepSeek-V4-Pro leads in: AI Agent - Information Search (1/1)
GLM 5.1 leads in: General Knowledge (2/2), AI Agent - Tool Usage (1/1), Coding and Software Engineer (1/1), Math and Reasoning (1/1)

On average across the 6 shared benchmarks, GLM 5.1 scores 18.83 higher.

Largest single-benchmark gap: IMO-AnswerBench, with DeepSeek-V4-Pro at 35.30 vs GLM 5.1 at 83.80 (-48.50).
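Three of the six per-benchmark differences are stated on this page (HLE, GPQA Diamond, IMO-AnswerBench). As a rough consistency check, the sketch below derives what the other three differences would have to sum to, assuming the reported average is a plain unweighted mean of the six Diff values. Because -18.83 is itself rounded, the result is approximate. All variable names are illustrative.

```python
from decimal import Decimal

REPORTED_MEAN = Decimal("-18.83")  # average Diff across all shared benchmarks
N_BENCHMARKS = 6

# Diffs stated on this page (DeepSeek-V4-Pro minus GLM 5.1):
# HLE, GPQA Diamond, IMO-AnswerBench.
KNOWN_DIFFS = [Decimal("-44.60"), Decimal("-13.30"), Decimal("-48.50")]

# Assumption: mean = (sum of all six diffs) / 6, with no weighting.
implied_total = REPORTED_MEAN * N_BENCHMARKS
remaining_sum = implied_total - sum(KNOWN_DIFFS)
print(remaining_sum)  # -6.58
```

Under that assumption, the three benchmarks listed here account for most of the total gap, and the remaining three differences sum to roughly -6.58.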

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.