GLM-5 vs Kimi K2.5

Across 14 shared benchmarks, GLM-5 leads overall with 9 wins to Kimi K2.5's 5 and no ties. The average score difference (GLM-5 minus Kimi K2.5) is +0.86.

GLM-5

Zhipu AI · 2026-02-11 · Chat model

Kimi K2.5

Moonshot AI · 2026-01-27 · Multimodal model

GLM-5: 9 wins (64%) · Kimi K2.5: 5 wins (36%)
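The win shares above follow directly from the win counts. A minimal sketch of that arithmetic, assuming the percentages are rounded to the nearest whole number (the `wins` dictionary and `win_share` helper are illustrative names, not part of the page's data model):

```python
# Win counts as reported on this page (0 ties).
wins = {"GLM-5": 9, "Kimi K2.5": 5}
total = sum(wins.values())  # 14 shared benchmarks


def win_share(model: str) -> int:
    """Return the model's share of wins as a whole-number percentage."""
    return round(100 * wins[model] / total)


for model, count in wins.items():
    print(f"{model}: {count} wins ({win_share(model)}%)")
```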

Benchmark scores

Grouped by capability, sorted by largest gap within each. 14 shared benchmarks.

General Knowledge

Kimi K2.5 wins 3 of 4

| Benchmark | GLM-5 | Kimi K2.5 | Diff |
| --- | --- | --- | --- |
| ARC-AGI | 44.70 · 44 / 65 · Thinking (No Tools) | 65.30 · 31 / 65 · Thinking (No Tools) | -20.60 |
| ARC-AGI-2 | 4.90 · 44 / 59 · Thinking (No Tools) | 11.80 · 36 / 59 · Thinking (No Tools) | -6.90 |
| GPQA Diamond | 86 · 43 / 178 · Thinking (No Tools) | 87.60 · 34 / 178 · Thinking (No Tools) | -1.60 |
| HLE | 50.40 · 18 / 157 | 50.20 · 20 / 157 · Thinking (With Tools) | +0.20 |

Math and Reasoning

GLM-5 wins 2 of 3

| Benchmark | GLM-5 | Kimi K2.5 | Diff |
| --- | --- | --- | --- |
| FrontierMath - Tier 4 | 2.10 · 56 / 80 · Normal (No Tools) | 4.20 · 40 / 80 · Normal (No Tools) | -2.10 |
| IMO-AnswerBench | 82.50 · 13 / 19 · Thinking (No Tools) | 81.80 · 14 / 19 · Thinking (No Tools) | +0.70 |
| AIME 2026 | 92.70 · 7 / 14 · Thinking (No Tools) | 92.50 · 10 / 14 · Thinking (No Tools) | +0.20 |

Claw-style Agent Evaluation

GLM-5 wins 2 of 2

| Benchmark | GLM-5 | Kimi K2.5 | Diff |
| --- | --- | --- | --- |
| Claw Bench | 91.70 · 5 / 29 · Thinking (With Tools) | 81.70 · 18 / 29 · Thinking (With Tools) | +10 |
| Pinch Bench | 86.40 · 12 / 37 · Thinking (With Tools) | 84.80 · 17 / 37 · Thinking (With Tools) | +1.60 |

AI Agent - Information Search

GLM-5 wins 1 of 1

| Benchmark | GLM-5 | Kimi K2.5 | Diff |
| --- | --- | --- | --- |
| BrowseComp | 75.90 · 19 / 45 | 60.60 · 29 / 45 · Thinking (With Tools + Internet) | +15.30 |

AI Agent - Tool Usage

GLM-5 wins 1 of 1

| Benchmark | GLM-5 | Kimi K2.5 | Diff |
| --- | --- | --- | --- |
| Terminal Bench 2.0 | 61.10 · 18 / 46 | 50.80 · 33 / 46 · Thinking (With Tools) | +10.30 |

Coding and Software Engineer

GLM-5 wins 1 of 1

| Benchmark | GLM-5 | Kimi K2.5 | Diff |
| --- | --- | --- | --- |
| SWE-bench Verified | 77.80 · 23 / 108 · Thinking (No Tools) | 76.80 · 27 / 108 · Thinking (With Tools) | +1 |

Long Context

Kimi K2.5 wins 1 of 1

| Benchmark | GLM-5 | Kimi K2.5 | Diff |
| --- | --- | --- | --- |
| AA-LCR | 63 · 12 / 13 · Thinking (No Tools) | 65 · 10 / 13 · Thinking (No Tools) | -2 |

Productivity Knowledge

GLM-5 wins 1 of 1

| Benchmark | GLM-5 | Kimi K2.5 | Diff |
| --- | --- | --- | --- |
| GDPval-AA | 46 · 14 / 21 · Thinking (No Tools) | 40 · 15 / 21 · Thinking (No Tools) | +6 |

Specs

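| Field | GLM-5 | Kimi K2.5 |
| --- | --- | --- |
| Publisher | Zhipu AI | Moonshot AI |
| Release date | 2026-02-11 | 2026-01-27 |
| Model type | Chat model | Multimodal model |
| Architecture | MoE | MoE |
| Parameters | 744B | 1T |
| Context length | 200K | 256K |
| Max output | 128K | 16K |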

API pricing

Prices use DataLearner records when available; missing fields are not inferred.

| Item | GLM-5 | Kimi K2.5 |
| --- | --- | --- |
| Text input | $1 / 1M tokens | Not public |
| Text output | $3.2 / 1M tokens | Not public |
| Cache write | $0.2 / 1M tokens | Not public |

One or both models have incomplete public pricing.
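To show how the per-million-token prices translate into a request cost, here is a rough sketch. It assumes cost is simply input tokens times the input price plus output tokens times the output price. Cache-write billing is left out because the page does not say how it combines with input pricing. The `PRICES` table and `estimate_cost` function are hypothetical names, and a missing price yields `None` rather than a guess, in keeping with the note that missing fields are not inferred.

```python
from typing import Optional

# USD per 1M tokens, copied from the pricing table; None means "Not public".
PRICES = {
    "GLM-5": {"input": 1.0, "output": 3.2},
    "Kimi K2.5": {"input": None, "output": None},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Optional[float]:
    """Estimate the cost of one request in USD, or None if a price is not public."""
    price = PRICES[model]
    if price["input"] is None or price["output"] is None:
        return None  # do not infer missing prices
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Example: 200K input tokens and 50K output tokens.
print(estimate_cost("GLM-5", 200_000, 50_000))
print(estimate_cost("Kimi K2.5", 200_000, 50_000))
```

Under these assumptions the GLM-5 example comes to about $0.36, while the Kimi K2.5 call returns `None` because its prices are not public.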

Summary

  • GLM-5 leads in: Math and Reasoning (2/3), Claw-style Agent Evaluation (2/2), AI Agent - Information Search (1/1), AI Agent - Tool Usage (1/1), Coding and Software Engineer (1/1), Productivity Knowledge (1/1)
  • Kimi K2.5 leads in: General Knowledge (3/4), Long Context (1/1)
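The per-category leads listed above can be derived from the Diff column of the benchmark tables. The following is a sketch of one way to tally them, assuming Diff means GLM-5 minus Kimi K2.5 (which matches the scores shown) and that the model with more wins in a category leads it. The `category_leaders` helper and the "tie" fallback are illustrative choices; the page itself shows no tied category.

```python
# (category, diff) pairs from the benchmark tables; diff = GLM-5 minus Kimi K2.5.
rows = [
    ("General Knowledge", -20.60), ("General Knowledge", -6.90),
    ("General Knowledge", -1.60), ("General Knowledge", 0.20),
    ("Math and Reasoning", -2.10), ("Math and Reasoning", 0.70),
    ("Math and Reasoning", 0.20),
    ("Claw-style Agent Evaluation", 10.0), ("Claw-style Agent Evaluation", 1.60),
    ("AI Agent - Information Search", 15.30),
    ("AI Agent - Tool Usage", 10.30),
    ("Coding and Software Engineer", 1.0),
    ("Long Context", -2.0),
    ("Productivity Knowledge", 6.0),
]


def category_leaders(rows):
    """Map each category to (leader, leader_wins, benchmarks_in_category)."""
    tallies = {}  # category -> [glm_wins, kimi_wins, total]
    for category, diff in rows:
        tally = tallies.setdefault(category, [0, 0, 0])
        if diff > 0:
            tally[0] += 1
        elif diff < 0:
            tally[1] += 1
        tally[2] += 1
    leaders = {}
    for category, (glm, kimi, total) in tallies.items():
        if glm > kimi:
            leaders[category] = ("GLM-5", glm, total)
        elif kimi > glm:
            leaders[category] = ("Kimi K2.5", kimi, total)
        else:
            leaders[category] = ("tie", glm, total)  # assumed handling
    return leaders


for category, (leader, won, total) in category_leaders(rows).items():
    print(f"{category}: {leader} ({won}/{total})")
```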

On average across the 14 shared benchmarks, GLM-5 scores 0.86 higher.

Largest single-benchmark gap: ARC-AGI, where Kimi K2.5 leads with 65.30 vs GLM-5's 44.70 (diff -20.60).
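The average difference and the largest gap can both be reproduced from the 14 Diff values. This is a minimal sketch, assuming a plain unweighted mean of the diffs and "largest gap" meaning the largest absolute difference. The variable names are illustrative.

```python
# Diff column (GLM-5 minus Kimi K2.5) for the 14 shared benchmarks.
diffs = {
    "ARC-AGI": -20.60, "ARC-AGI-2": -6.90, "GPQA Diamond": -1.60, "HLE": 0.20,
    "FrontierMath - Tier 4": -2.10, "IMO-AnswerBench": 0.70, "AIME 2026": 0.20,
    "Claw Bench": 10.0, "Pinch Bench": 1.60, "BrowseComp": 15.30,
    "Terminal Bench 2.0": 10.30, "SWE-bench Verified": 1.0,
    "AA-LCR": -2.0, "GDPval-AA": 6.0,
}

# Unweighted mean of the per-benchmark differences.
mean_diff = sum(diffs.values()) / len(diffs)

# Benchmark with the largest absolute difference, regardless of which model leads.
largest_gap = max(diffs, key=lambda name: abs(diffs[name]))

# Win counts follow from the sign of each diff.
glm_wins = sum(d > 0 for d in diffs.values())
kimi_wins = sum(d < 0 for d in diffs.values())

print(f"mean diff: {mean_diff:+.2f}")
print(f"largest gap: {largest_gap} ({diffs[largest_gap]:+.2f})")
print(f"wins: GLM-5 {glm_wins}, Kimi K2.5 {kimi_wins}")
```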

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.