LLM Agent Benchmark Leaderboard

Name: LLM Agent Benchmark Leaderboard
Creator: DataLearner
License: https://creativecommons.org/licenses/by/4.0/

This page provides the LLM Agent benchmark leaderboard, covering Aider-Polyglot, τ²-Bench, Terminal Bench 2.0, Tool Decathlon, and OSWorld-Verified. Compare GPT, Claude, Qwen, and DeepSeek on tool use, task planning, and autonomous execution.

Updated on 2026-07-28 08:43:41

As of 2026-07, the leaderboard covers Aider-Polyglot, τ²-Bench, Terminal Bench 2.0, Tool Decathlon, and related benchmarks, so models can be compared within the same task family.

Click any model name to check context length, licensing, and pricing on its detail page. See Data Methodology for scoring details.

Benchmark

Agent capability evaluation: Aider-Polyglot, τ²-Bench

AI Agent - Tool use: Terminal Bench 2.0, Tool Decathlon, OSWorld-Verified

Top picks

Ranked by Aider-Polyglot

Current SOTA: Qwen3-32B (Alibaba), 40.00 on Aider-Polyglot

Best Open-Source: Qwen3-32B (Alibaba), 40.00 on Aider-Polyglot

Best China-Made: Qwen3-32B (Alibaba), 40.00 on Aider-Polyglot

LLM Performance Results

Data source: DataLearnerAI

Rank	Model	Developer	Aider-Polyglot	τ²-Bench	Terminal Bench 2.0	Tool Decathlon	OSWorld-Verified	License
1	Qwen3-32B	Alibaba	40.00	—	—	—	—	Free commercial
2	QwQ-32B	Alibaba	20.90	—	—	—	—	Free commercial
3	Qwen2.5-Coder-32B-Instruct	Alibaba	16.40	—	—	—	—	Free commercial
4	Gemma 3 - 27B (IT)	Google DeepMind	4.90	—	—	—	—	Free commercial
5	Qwen3.5-397B-A17B	Alibaba	—	86.70	52.50	38.30	62.20	Free commercial
6	GLM-4.7-Flash	Zhipu AI	—	79.50	—	—	—	Free commercial
7	Qwen3.5-27B	Alibaba	—	79.00	41.60	—	56.20	Free commercial
8	Gemma 4 31B	DeepMind	—	76.90	—	—	—	Free commercial
9	Gemma 4 26B A4B	DeepMind	—	68.20	—	—	—	Free commercial
10	Qwen3-30B-A3B-2507	Alibaba	—	49.00	—	—	—	Free commercial
11	GPT OSS 20B	OpenAI	—	47.70	—	—	—	Free commercial
12	Qwen3.6-27B	Alibaba	—	—	59.30	—	—	Free commercial
13	Qwen3.6-35B-A3B	Alibaba	—	—	51.50	26.90	—	Free commercial
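
The row order above is consistent with sorting by Aider-Polyglot first and falling back to the next benchmark when a score is missing. The sketch below reproduces that order for a few rows taken from the table. The fallback rule is an assumption inferred from the observed ranks, not a documented DataLearner method, and the names `BENCHMARKS`, `ROWS`, `sort_key`, and `rank` are hypothetical.

```python
# Benchmarks in the order the page lists them (assumed to match the score columns).
BENCHMARKS = [
    "Aider-Polyglot",
    "τ²-Bench",
    "Terminal Bench 2.0",
    "Tool Decathlon",
    "OSWorld-Verified",
]

# (model, scores) pairs copied from the table, deliberately shuffled.
# None stands for a missing score ("—" in the table).
ROWS = [
    ("Qwen3.6-35B-A3B", [None, None, 51.50, 26.90, None]),
    ("GLM-4.7-Flash", [None, 79.50, None, None, None]),
    ("Qwen3-32B", [40.00, None, None, None, None]),
    ("Qwen3.6-27B", [None, None, 59.30, None, None]),
    ("Gemma 3 - 27B (IT)", [4.90, None, None, None, None]),
    ("Qwen3.5-397B-A17B", [None, 86.70, 52.50, 38.30, 62.20]),
]


def sort_key(row):
    """Build a tuple of scores, treating a missing score as the lowest value."""
    _, scores = row
    return tuple(float("-inf") if s is None else s for s in scores)


def rank(rows):
    """Return model names ordered by descending scores, compared column by column."""
    return [name for name, _ in sorted(rows, key=sort_key, reverse=True)]


ranked = rank(ROWS)
for position, name in enumerate(ranked, start=1):
    print(position, name)
```

Under this assumed rule, any model with an Aider-Polyglot score ranks above every model without one, regardless of how it scores on the other benchmarks. Ranks in the table are therefore not a cross-benchmark quality ordering.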