Kimi K2.5 Benchmark Details

Below are Kimi K2.5's benchmark scores and model comparisons. In-depth analysis is being prepared.

Benchmark Results

Kimi K2.5

Thinking

Comprehensive Evaluation

4 evaluations

Benchmark               Mode     Score  Rank/total
GPQA Diamond            default  87.60  21 / 158
MMLU Pro                default  78.50  54 / 114
HLE                     default  50.20  44 / 111
HLE                     default  30.10  44 / 111

Coding and Software Engineering

3 evaluations

Benchmark               Mode     Score  Rank/total
LiveCodeBench           default  85     8 / 105
SWE-bench Verified      default  76.80  13 / 90
SWE-Bench Pro - Public  default  50.70  7 / 16

Mathematical Reasoning

3 evaluations

Benchmark               Mode     Score  Rank/total
AIME 2025               default  96.10  20 / 106
AIME 2026               default  92.50  5 / 7
IMO-AnswerBench         default  81.80  5 / 7

Commonsense Reasoning

1 evaluation

Benchmark               Mode     Score  Rank/total
Simple Bench            default  46.80  13 / 27

AI Agent - Information Gathering

2 evaluations

Benchmark               Mode     Score  Rank/total
BrowseComp              default  74.90  18 / 33
BrowseComp              default  60.60  18 / 33

AI Agent - Tool Use

1 evaluation

Benchmark               Mode     Score  Rank/total
Terminal Bench 2.0      default  50.80  14 / 22

Productivity Knowledge

1 evaluation

Benchmark               Mode     Score  Rank/total
GDPval-AA               default  40     8 / 14

Long-Context Capability

1 evaluation

Benchmark               Mode     Score  Rank/total
AA-LCR                  default  65     8 / 11
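
The Rank/total values listed above are hard to compare directly because each benchmark ranks a different number of models. As a minimal sketch, assuming rank 1 is the best score and that rank divided by total is an acceptable summary, the Python snippet below converts a few of the listed entries into a "top X%" figure. The helper name `top_share` is hypothetical and is not something the DataLearner site provides.

```python
def top_share(rank_total: str) -> float:
    """Convert a 'rank / total' string into the top-percentage position.

    Assumption: rank 1 is best, so a smaller result means a stronger placement.
    """
    rank_text, total_text = rank_total.split("/")
    rank, total = int(rank_text), int(total_text)
    if not 1 <= rank <= total:
        raise ValueError(f"rank must be between 1 and total, got {rank_total!r}")
    return 100.0 * rank / total


# Rank/total values copied from the benchmark listing above.
LISTED_RANKS = {
    "GPQA Diamond": "21 / 158",
    "LiveCodeBench": "8 / 105",
    "AIME 2026": "5 / 7",
}

if __name__ == "__main__":
    for benchmark, rank_total in LISTED_RANKS.items():
        print(f"{benchmark}: top {top_share(rank_total):.1f}% ({rank_total})")
```

Figures from small pools, such as the 7 models listed for AIME 2026, are coarse and should be read with more caution than those from pools of a hundred or more models.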

Comparison with Other Models