GLM-5 vs Kimi K2.5

Across 17 shared benchmarks, GLM-5 leads overall: GLM-5 wins 10, Kimi K2.5 wins 7, with 0 ties and an average score difference of +1.06.

GLM-5: Zhipu AI · 2026-02-11 · Chat model

Kimi K2.5: Moonshot AI · 2026-01-27 · Multimodal model

GLM-5: 10 wins (59%) · Kimi K2.5: 7 wins (41%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 17 shared benchmarks.

General Knowledge · Kimi K2.5 leads 4/5

Benchmark	GLM-5 (score · rank · mode)	Kimi K2.5 (score · rank · mode)	Diff (GLM-5 − Kimi K2.5)
ARC-AGI	44.70 · rank 47/68 · Thinking (No Tools)	65.30 · rank 34/68 · Thinking (No Tools)	-20.60
ARC-AGI-2	4.90 · rank 47/62 · Thinking (No Tools)	11.80 · rank 39/62 · Thinking (No Tools)	-6.90
GPQA Diamond	86 · rank 48/187 · Thinking (No Tools)	87.60 · rank 37/187 · Thinking (No Tools)	-1.60
LiveBench	68.85 · rank 43/115 · Normal (No Tools)	69.07 · rank 42/115 · Thinking (No Tools)	-0.22
HLE	50.40 · rank 25/172	50.20 · rank 27/172 · Thinking (With Tools)	+0.20

Math and Reasoning · GLM-5 leads 2/3

Benchmark	GLM-5 (score · rank · mode)	Kimi K2.5 (score · rank · mode)	Diff (GLM-5 − Kimi K2.5)
FrontierMath - Tier 4	2.10 · rank 56/80 · Normal (No Tools)	4.20 · rank 40/80 · Normal (No Tools)	-2.10
IMO-AnswerBench	82.50 · rank 15/21 · Thinking (No Tools)	81.80 · rank 16/21 · Thinking (No Tools)	+0.70
AIME 2026	92.70 · rank 9/18 · Thinking (No Tools)	92.50 · rank 12/18 · Thinking (No Tools)	+0.20

Claw-style Agent Evaluation · GLM-5 leads 2/2

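Benchmark	GLM-5 (score · rank · mode)	Kimi K2.5 (score · rank · mode)	Diff (GLM-5 − Kimi K2.5)
Claw Bench	91.70 · rank 5/29 · Thinking (With Tools)	81.70 · rank 18/29 · Thinking (With Tools)	+10
Pinch Bench	86.40 · rank 12/37 · Thinking (With Tools)	84.80 · rank 17/37 · Thinking (With Tools)	+1.60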

Long Context · Kimi K2.5 leads 2/2

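Benchmark	GLM-5 (score · rank · mode)	Kimi K2.5 (score · rank · mode)	Diff (GLM-5 − Kimi K2.5)
AA-LCR	63 · rank 14/15 · Thinking (No Tools)	65 · rank 12/15 · Thinking (No Tools)	-2
LongBench v2	60.80 · rank 6/11 · Normal (No Tools)	61 · rank 5/11 · Normal (No Tools)	-0.20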

AI Agent - Information Search · GLM-5 leads 1/1

Benchmark	GLM-5 (score · rank · mode)	Kimi K2.5 (score · rank · mode)	Diff (GLM-5 − Kimi K2.5)
BrowseComp	75.90 · rank 24/53	60.60 · rank 36/53 · Thinking (With Tools + Internet)	+15.30

AI Agent - Tool Usage · GLM-5 leads 1/1

Benchmark	GLM-5 (score · rank · mode)	Kimi K2.5 (score · rank · mode)	Diff (GLM-5 − Kimi K2.5)
Terminal Bench 2.0	61.10 · rank 18/47	50.80 · rank 34/47 · Thinking (With Tools)	+10.30

Coding and Software Engineer · GLM-5 leads 1/1

Benchmark	GLM-5 (score · rank · mode)	Kimi K2.5 (score · rank · mode)	Diff (GLM-5 − Kimi K2.5)
SWE-bench Verified	77.80 · rank 25/112 · Thinking (No Tools)	76.80 · rank 30/112 · Thinking (With Tools)	+1

Commonsense Reasoning · GLM-5 leads 1/1

Benchmark	GLM-5 (score · rank · mode)	Kimi K2.5 (score · rank · mode)	Diff (GLM-5 − Kimi K2.5)
Simple Bench	53.20 · rank 23/63 · Normal (No Tools)	46.80 · rank 30/63 · Thinking (No Tools)	+6.40

Productivity Knowledge · GLM-5 leads 1/1

Benchmark	GLM-5 (score · rank · mode)	Kimi K2.5 (score · rank · mode)	Diff (GLM-5 − Kimi K2.5)
GDPval-AA	46 · rank 14/21 · Thinking (No Tools)	40 · rank 15/21 · Thinking (No Tools)	+6

Prices use DataLearner records when available; missing fields are not inferred.

GLM-5 leads in: Math and Reasoning (2/3), Claw-style Agent Evaluation (2/2), AI Agent - Information Search (1/1), AI Agent - Tool Usage (1/1), Coding and Software Engineer (1/1), Commonsense Reasoning (1/1), Productivity Knowledge (1/1)
Kimi K2.5 leads in: General Knowledge (4/5), Long Context (2/2)
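
The per-capability leads above can be re-derived from the Diff column. The following is a minimal sketch, not the page's actual generator: `DIFFS` and `category_leads` are hypothetical names, and the assignment of each benchmark to a capability group is an assumption based on the tables appearing in the same order as the groups in the two summary lines.

```python
from collections import defaultdict

# (capability group, Diff = GLM-5 score minus Kimi K2.5 score), copied from the tables.
# Assumption: each table's group is inferred from the order of the summary lines.
DIFFS = [
    ("General Knowledge", -20.60), ("General Knowledge", -6.90),
    ("General Knowledge", -1.60), ("General Knowledge", -0.22),
    ("General Knowledge", 0.20),
    ("Math and Reasoning", -2.10), ("Math and Reasoning", 0.70),
    ("Math and Reasoning", 0.20),
    ("Claw-style Agent Evaluation", 10.0), ("Claw-style Agent Evaluation", 1.60),
    ("Long Context", -2.0), ("Long Context", -0.20),
    ("AI Agent - Information Search", 15.30),
    ("AI Agent - Tool Usage", 10.30),
    ("Coding and Software Engineer", 1.0),
    ("Commonsense Reasoning", 6.40),
    ("Productivity Knowledge", 6.0),
]

def category_leads(diffs):
    """Map each capability group to (leader, leader's wins, benchmarks in group)."""
    tally = defaultdict(lambda: {"GLM-5": 0, "Kimi K2.5": 0, "total": 0})
    for category, diff in diffs:
        tally[category]["total"] += 1
        if diff > 0:          # positive Diff: GLM-5 scored higher
            tally[category]["GLM-5"] += 1
        elif diff < 0:        # negative Diff: Kimi K2.5 scored higher
            tally[category]["Kimi K2.5"] += 1
    leads = {}
    for category, t in tally.items():
        if t["GLM-5"] == t["Kimi K2.5"]:
            leads[category] = ("Tie", t["GLM-5"], t["total"])
        else:
            leader = "GLM-5" if t["GLM-5"] > t["Kimi K2.5"] else "Kimi K2.5"
            leads[category] = (leader, t[leader], t["total"])
    return leads

leads = category_leads(DIFFS)
for category, (leader, wins, total) in leads.items():
    print(f"{category}: {leader} {wins}/{total}")
```

A group is credited to whichever model wins more of its benchmarks, so a single large gap (such as ARC-AGI) counts the same as a narrow one.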

On average across the 17 shared benchmarks, GLM-5 scores 1.06 points higher than Kimi K2.5.

Largest single-benchmark gap: ARC-AGI, where GLM-5 scores 44.70 vs Kimi K2.5's 65.30 (-20.60, in Kimi K2.5's favor).
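
The headline figures (win counts, win shares, average difference, largest gap) follow from simple arithmetic over the 17 score pairs. Here is a minimal sketch for checking them, assuming the scores are read from the tables as listed; `SCORES` and `summarize` are hypothetical names, not part of the page's tooling.

```python
# (benchmark, GLM-5 score, Kimi K2.5 score), copied from the tables above.
SCORES = [
    ("ARC-AGI", 44.70, 65.30),
    ("ARC-AGI-2", 4.90, 11.80),
    ("GPQA Diamond", 86.00, 87.60),
    ("LiveBench", 68.85, 69.07),
    ("HLE", 50.40, 50.20),
    ("FrontierMath - Tier 4", 2.10, 4.20),
    ("IMO-AnswerBench", 82.50, 81.80),
    ("AIME 2026", 92.70, 92.50),
    ("Claw Bench", 91.70, 81.70),
    ("Pinch Bench", 86.40, 84.80),
    ("AA-LCR", 63.00, 65.00),
    ("LongBench v2", 60.80, 61.00),
    ("BrowseComp", 75.90, 60.60),
    ("Terminal Bench 2.0", 61.10, 50.80),
    ("SWE-bench Verified", 77.80, 76.80),
    ("Simple Bench", 53.20, 46.80),
    ("GDPval-AA", 46.00, 40.00),
]

def summarize(scores):
    """Recompute the headline statistics from per-benchmark score pairs."""
    # Diff is GLM-5 minus Kimi K2.5, rounded to the two decimals the page shows.
    diffs = [(name, round(glm - kimi, 2)) for name, glm, kimi in scores]
    n = len(diffs)
    glm_wins = sum(d > 0 for _, d in diffs)
    kimi_wins = sum(d < 0 for _, d in diffs)
    return {
        "benchmarks": n,
        "glm_wins": glm_wins,
        "kimi_wins": kimi_wins,
        "ties": sum(d == 0 for _, d in diffs),
        "glm_win_pct": round(100 * glm_wins / n),
        "kimi_win_pct": round(100 * kimi_wins / n),
        "mean_diff": round(sum(d for _, d in diffs) / n, 2),
        # Largest gap is measured by absolute value, regardless of who leads.
        "largest_gap": max(diffs, key=lambda item: abs(item[1])),
    }

summary = summarize(SCORES)
print(summary)
```

Because the average is an unweighted mean of raw score differences, benchmarks with wide score ranges (ARC-AGI, BrowseComp) move it more than benchmarks where both models score within a point of each other.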

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.