GLM-5 vs GLM-4.6

Across 8 shared benchmarks, GLM-5 leads overall: GLM-5 is ahead on 7, GLM-4.6 is ahead on 0, and 1 is tied, for an average score difference of +16.69.

GLM-5: Zhipu AI · 2026-02-11 · Chat LLM

GLM-4.6: Zhipu AI · 2025-09-30 · Chat LLM

GLM-5 ahead: 7 (88%) · Tied: 1 · GLM-4.6 ahead: 0 (0%)

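Benchmark Scores

Benchmarks are grouped by capability category and, within each group, sorted by the size of the score difference; 8 in total.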

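Agent Level Benchmark: GLM-5 leads 2/2

Benchmark	GLM-5 (score · rank)	GLM-4.6 (score · rank)	Difference
τ²-Bench - Telecom	98 · rank 5 / 35	71 · rank 27 / 35	+27
τ²-Bench	89.70 · rank 4 / 40	75.90 · rank 20 / 40	+13.80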

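General Knowledge: GLM-5 leads 2/2

Benchmark	GLM-5 (score · rank)	GLM-4.6 (score · rank)	Difference
HLE	50.40 · rank 18 / 157	30.40 · rank 74 / 157	+20
GPQA Diamond	86 · rank 43 / 178 · Thinking (No Tools)	82.90 · rank 61 / 178	+3.10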

AI Agent - Information Search: GLM-5 leads 1/1

Benchmark	GLM-5 (score · rank)	GLM-4.6 (score · rank)	Difference
BrowseComp	75.90 · rank 19 / 45	45.10 · rank 38 / 45	+30.80

Coding and Software Engineer: GLM-5 leads 1/1

Benchmark	GLM-5 (score · rank)	GLM-4.6 (score · rank)	Difference
SWE-bench Verified	77.80 · rank 23 / 108 · Thinking (No Tools)	68 · rank 65 / 108	+9.80

Instruction Following: GLM-5 leads 1/1

Benchmark	GLM-5 (score · rank)	GLM-4.6 (score · rank)	Difference
IF Bench	72 · rank 10 / 29	43 · rank 29 / 29	+29

Math and Reasoning: tied 1/1

Benchmark	GLM-5 (score · rank)	GLM-4.6 (score · rank)	Difference
FrontierMath - Tier 4	2.10 · rank 56 / 80 · Normal (No Tools)	2.10 · rank 56 / 80 · Normal (No Tools)	Tied
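
The grouping and ordering used for these tables (group by capability category, then sort each group by the size of the score difference) can be sketched as follows. This is a minimal illustration, not the site's actual code: `ROWS` and `group_and_sort` are hypothetical names, the differences are copied from the tables, and the category assigned to each benchmark is inferred from the order of the category summary on this page.

```python
from collections import defaultdict

# (category, benchmark, GLM-5 score minus GLM-4.6 score), copied from this page.
# The category of each row is an inference from the page's category summary.
ROWS = [
    ("Agent Level Benchmark", "τ²-Bench", 13.80),
    ("Agent Level Benchmark", "τ²-Bench - Telecom", 27.0),
    ("General Knowledge", "GPQA Diamond", 3.10),
    ("General Knowledge", "HLE", 20.0),
    ("AI Agent - Information Search", "BrowseComp", 30.80),
    ("Coding and Software Engineer", "SWE-bench Verified", 9.80),
    ("Instruction Following", "IF Bench", 29.0),
    ("Math and Reasoning", "FrontierMath - Tier 4", 0.0),
]


def group_and_sort(rows):
    """Group rows by category, then order each group by descending gap size."""
    groups = defaultdict(list)
    for category, benchmark, diff in rows:
        groups[category].append((benchmark, diff))
    for entries in groups.values():
        entries.sort(key=lambda item: abs(item[1]), reverse=True)
    return dict(groups)  # categories keep their first-seen order


for category, entries in group_and_sort(ROWS).items():
    print(category, [name for name, _ in entries])
```

The input rows are deliberately out of order within the first two categories, so the sort step is what puts τ²-Bench - Telecom and HLE first in their groups.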

Prices are taken first from the API records configured on DataLearner; missing entries are not estimated.

Public pricing for some models is incomplete; missing fields are shown as "No public price available".

GLM-5 leads in the following categories: Agent Level Benchmark (2/2), General Knowledge (2/2), AI Agent - Information Search (1/1), Coding and Software Engineer (1/1), Instruction Following (1/1)
Tied categories: Math and Reasoning

Across the 8 shared benchmarks, GLM-5 scores 16.69 points higher on average.

Largest single-benchmark gap: BrowseComp, where GLM-5 scores 75.90 and GLM-4.6 scores 45.10 (difference +30.80).
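
The summary figures on this page (7 leads, 1 tie, an average difference of +16.69, and BrowseComp as the largest gap) can be reproduced from the per-benchmark scores. The sketch below is illustrative only: `SCORES` and `summarize` are hypothetical names, and rounding the average half-up to two decimals is an assumption chosen because it matches the published 16.69 (the unrounded mean is 16.6875).

```python
from decimal import Decimal, ROUND_HALF_UP

# benchmark -> (GLM-5 score, GLM-4.6 score), copied from this page as strings
# so that Decimal arithmetic stays exact.
SCORES = {
    "τ²-Bench - Telecom": ("98", "71"),
    "τ²-Bench": ("89.70", "75.90"),
    "HLE": ("50.40", "30.40"),
    "GPQA Diamond": ("86", "82.90"),
    "BrowseComp": ("75.90", "45.10"),
    "SWE-bench Verified": ("77.80", "68"),
    "IF Bench": ("72", "43"),
    "FrontierMath - Tier 4": ("2.10", "2.10"),
}


def summarize(scores):
    """Return win/tie/loss counts, the mean difference, and the largest gap."""
    deltas = {name: Decimal(a) - Decimal(b) for name, (a, b) in scores.items()}
    mean = sum(deltas.values()) / len(deltas)
    return {
        "wins": sum(d > 0 for d in deltas.values()),
        "ties": sum(d == 0 for d in deltas.values()),
        "losses": sum(d < 0 for d in deltas.values()),
        # Assumption: half-up rounding to two decimals, as the page appears to use.
        "mean_delta": mean.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP),
        "largest_gap": max(deltas, key=lambda name: abs(deltas[name])),
    }


summary = summarize(SCORES)
print(summary)
```

Using `Decimal` rather than `float` avoids binary rounding noise in differences such as 86 minus 82.90, and makes the half-up rounding of 16.6875 explicit.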

The text on this page is generated from structured model, pricing, and benchmark data; it is not written by an LLM in real time.