A knowledge platform focused on LLM benchmarking, datasets, and practical instruction with continuously updated capability maps.

© 2026 DataLearner AI. DataLearner curates industry data and case studies so researchers, enterprises, and developers can rely on trustworthy intelligence.


LLM Agent Capability Leaderboard

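This page provides an LLM agent capability leaderboard covering mainstream agent benchmarks, including Aider-Polyglot, τ²-Bench, Terminal Bench 2.0, Tool Decathlon, and OSWorld-Verified. It compares the tool use, task planning, and autonomous execution capabilities of models such as GPT, Claude, Qwen, and DeepSeek.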

Updated on 2026-04-28 13:02:03

As of 2026-04, this leaderboard covers Aider-Polyglot, τ²-Bench, Terminal Bench 2.0, Tool Decathlon, and related agent benchmarks, so models can be compared within the same task family.

Click any model name to check context length, licensing, and pricing on its detail page. See Data Methodology for scoring details.

Benchmarks
Agent capability evaluation: Aider-Polyglot, τ²-Bench
AI Agent (tool use): Terminal Bench 2.0, Tool Decathlon, OSWorld-Verified

LLM Performance Results

Data source: DataLearnerAI
| Rank | Vendor | Model | Aider-Polyglot | τ²-Bench | Terminal Bench 2.0 | Tool Decathlon | OSWorld-Verified | License |
|---|---|---|---|---|---|---|---|---|
| 1 | Alibaba | Qwen3.5-397B-A17B | — | 86.70 | 52.50 | 38.30 | 62.20 | Free commercial |
| 2 | Alibaba | Qwen3.6-35B-A3B | — | — | 51.50 | 26.90 | — | Free commercial |
| 3 | Alibaba | Qwen3-32B | 40.00 | — | — | — | — | Free commercial |
| 4 | Zhipu AI | GLM-4.7-Flash | — | 79.50 | — | — | — | Free commercial |
| 5 | Alibaba | Qwen3.5-27B | — | 79.00 | 41.60 | — | 56.20 | Free commercial |
| 6 | Alibaba | Qwen3-30B-A3B-2507 | — | 49.00 | — | — | — | Free commercial |
| 7 | OpenAI | GPT OSS 20B | — | 47.70 | — | — | — | Free commercial |
| 8 | Alibaba | Qwen3.6-27B | — | — | 59.30 | — | — | Free commercial |

A dash marks a benchmark for which no score is listed.
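
Because most models have scores on only some benchmarks, comparing within one task family means ranking on a single column and skipping models with no score there. The following is a minimal Python sketch of that idea, not the site's own sorting logic. The scores are copied from the results listed above; the names `BENCHMARKS`, `SCORES`, and `rank_by` are hypothetical and used only for illustration.

```python
from typing import Optional

# Column order as listed on the leaderboard.
BENCHMARKS = [
    "Aider-Polyglot",
    "τ²-Bench",
    "Terminal Bench 2.0",
    "Tool Decathlon",
    "OSWorld-Verified",
]

# Scores copied from the leaderboard; None marks a benchmark with no listed score.
SCORES: dict[str, list[Optional[float]]] = {
    "Qwen3.5-397B-A17B": [None, 86.70, 52.50, 38.30, 62.20],
    "Qwen3.6-35B-A3B": [None, None, 51.50, 26.90, None],
    "Qwen3-32B": [40.00, None, None, None, None],
    "GLM-4.7-Flash": [None, 79.50, None, None, None],
    "Qwen3.5-27B": [None, 79.00, 41.60, None, 56.20],
    "Qwen3-30B-A3B-2507": [None, 49.00, None, None, None],
    "GPT OSS 20B": [None, 47.70, None, None, None],
    "Qwen3.6-27B": [None, None, 59.30, None, None],
}


def rank_by(benchmark: str) -> list[tuple[str, float]]:
    """Return (model, score) pairs for one benchmark, highest score first.

    Models without a score on that benchmark are left out rather than
    treated as zero, so a missing result is not read as a poor result.
    """
    col = BENCHMARKS.index(benchmark)
    scored = [
        (model, row[col])
        for model, row in SCORES.items()
        if row[col] is not None
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for model, score in rank_by("Terminal Bench 2.0"):
        print(f"{model:<20} {score:.2f}")
```

Dropping unscored models instead of filling in a default keeps the ranking honest: on Terminal Bench 2.0, for example, only four of the eight listed models have a score and only those four appear in the result.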