GLM-4.6 Benchmark Details

GLM-4.6's headline benchmark results are AIME2025 (score 98.60, rank 15 of 106), LiveCodeBench (score 84.50, rank 18 of 120), and MMLU Pro (score 83, rank 43 of 126).
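Ranks are reported against pools of different sizes, so a "rank / total" pair is easier to compare across benchmarks once it is converted to a fraction. The following is a minimal sketch of that arithmetic, not the page's own ranking method; the helper name `relative_rank` and the `headline` dictionary are hypothetical and introduced only for illustration.

```python
def relative_rank(rank_total: str) -> float:
    """Parse a 'rank / total' string and return rank divided by total.

    Smaller is better: 0.15 means the model sits in the top 15% of the pool.
    """
    rank_str, total_str = rank_total.split("/")
    rank, total = int(rank_str), int(total_str)  # int() tolerates surrounding spaces
    if not 1 <= rank <= total:
        raise ValueError(f"invalid rank/total pair: {rank_total!r}")
    return rank / total


# Headline results quoted above (hypothetical container, values from the page).
headline = {
    "AIME2025": "15 / 106",
    "LiveCodeBench": "18 / 120",
    "MMLU Pro": "43 / 126",
}

for name, rank_total in headline.items():
    print(f"{name}: top {relative_rank(rank_total):.1%} of ranked models")
```

A rank of 29 / 29 maps to 1.0 (last place) under this convention, while 18 / 120 maps to 0.15.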

Benchmark Results

General Knowledge

9 evaluations

Benchmark / mode                        Score   Rank/total
MMLU Pro                                83      43 / 126
MMLU Pro                                -       69 / 126
GPQA Diamond                            82.90   62 / 179
GPQA Diamond                            -       70 / 179
GPQA Diamond                            -       136 / 179
LiveBench (Standard Mode)               55.19   81 / 115
HLE                                     30.40   76 / 159
HLE                                     17.20   118 / 159
HLE                                     5.20    152 / 159

Coding and Software Engineering

5 evaluations

Benchmark / mode                        Score   Rank/total
LiveCodeBench                           84.50   18 / 120
LiveCodeBench                           82.80   24 / 120
LiveCodeBench                           -       79 / 120
SWE-bench Verified                      -       65 / 108
SWE-bench Verified                      -       65 / 108

Math and Reasoning

4 evaluations

Benchmark / mode                        Score   Rank/total
AIME2025                                98.60   15 / 106
AIME2025                                98.60   15 / 106
AIME2025                                -       92 / 106
FrontierMath - Tier 4 (Standard Mode)   2.10    56 / 80

AI Agent - Tool Usage

1 evaluation

Benchmark / mode                        Score   Rank/total
Terminal-Bench                          40.50   12 / 35

Agent Level Benchmark

2 evaluations

Benchmark / mode                        Score   Rank/total
τ²-Bench                                75.90   20 / 40
τ²-Bench - Telecom                      -       27 / 35

Instruction Following

1 evaluation

Benchmark / mode                        Score   Rank/total
IF Bench                                -       29 / 29

AI Agent - Information Search

1 evaluation

Benchmark / mode                        Score   Rank/total
BrowseComp                              45.10   38 / 45
