Pinch Bench
Updated Apr 28, 2026·1,516 views
- Problem Count: 23
- Institution: Kilo Code
- Category: OpenClaw Agent Evaluation
- Metrics: Accuracy
- Language: English
- Difficulty: Medium
Overview
Pinch Bench is a 23-problem benchmark from Kilo Code for evaluating AI models in the OpenClaw agent evaluation category, scored by accuracy. This page on DataLearnerAI covers its overview, metrics, official resources, and model leaderboard results.
Related resources
Latest Pinch Bench model rankings and full benchmark leaderboard
Browse the latest scores, model modes, release dates, and parameter sizes for Pinch Bench.
Source: DataLearnerAI
Data sourced primarily from official releases (GitHub, Hugging Face, papers), then benchmark leaderboards, then third-party evaluators.
Pinch Bench Rank
| Rank | Model | Mode | Score | Release date | Parameters | License |
|---|---|---|---|---|---|---|
| 1 | GPT-5.4 | Thinking Enabled, Tools | 90.50 | 2026-03-05 | Unknown | Closed |
| 2 | Qwen3.5-27B | Thinking Enabled, Tools | 90.00 | 2026-02-25 | 27B | Free Commercial |
| 3 | Qwen3.5-397B-A17B | Thinking Enabled, Tools | 89.10 | 2026-02-16 | 39.7B | Free Commercial |
| 4 | Claude Sonnet 4.5 | Thinking Enabled, Tools | 88.20 | 2025-09-30 | Unknown | Closed |
| 5 | Claude Sonnet 4.6 | Thinking Enabled, Tools | 88.00 | 2026-02-17 | Unknown | Closed |
| 6 | MiniMax M2.5 | Thinking Enabled, Tools | 87.80 | 2026-02-12 | 229B | Free Commercial |
| 7 | Claude Opus 4.6 | Thinking Enabled, Tools | 87.40 | 2026-02-05 | Unknown | Closed |
| 8 | Opus 4.5 | Extended Thinking, Tools | 87.20 | 2025-11-25 | Unknown | Closed |
| 9 | MiniMax-M2.7 | Thinking Enabled, Tools | 87.10 | 2026-03-18 | 229B | Non-Commercial |
| 10 | Gemini 3.1 Pro Preview | Thinking Enabled, Tools | 86.70 | 2026-02-20 | Unknown | Closed |
| 11 | GLM-5-Turbo | Thinking Enabled, Tools | 86.50 | 2026-03-15 | Unknown | Closed |
| 12 | GLM-5 | Thinking Enabled, Tools | 86.40 | 2026-02-11 | 744B | Free Commercial |
| 13 | GLM-4.5-Air | Thinking Enabled, Tools | 85.70 | 2025-07-28 | 106B | Free Commercial |
| 14 | Qwen3.5-122B-A10B | Thinking Enabled, Tools | 85.50 | 2026-02-25 | 122B | Free Commercial |
| 15 | Step 3.5 Flash | Thinking Enabled, Tools | 85.30 | 2026-02-02 | 196B | Free Commercial |
| 16 | Gemini 3.0 Flash | Thinking Enabled, Tools | 85.20 | 2025-12-17 | Unknown | Closed |
| 17 | Kimi K2.5 | Thinking Enabled, Tools | 84.80 | 2026-01-27 | 1000B | Free Commercial |
| 18 | DeepSeek V3.2 | Thinking Enabled, Tools | 84.30 | 2025-12-01 | 671B | Free Commercial |
| 19 | M2.1 | Thinking Enabled, Tools | 84.30 | 2025-12-23 | 230B | Free Commercial |
| 20 | Grok 4.1 Fast | Thinking Enabled, Tools | 82.40 | 2025-11-19 | Unknown | Closed |
| 21 | Haiku 4.5 | Thinking Enabled, Tools | 82.00 | 2025-10-15 | Unknown | Closed |
| 22 | Claude Sonnet 4 | Thinking Enabled, Tools | 80.50 | 2025-05-23 | Unknown | Closed |
| 23 | GPT-5-mini | Thinking Enabled, Tools | 80.30 | 2025-08-07 | Unknown | Closed |
| 24 | Qwen3-Max-Thinking | Thinking Enabled, Tools | 80.30 | 2026-01-26 | 1000B | Closed |
| 25 | Qwen3-Coder-Next | Thinking Enabled, Tools | 79.10 | 2026-02-03 | 8B | Free Commercial |
| 26 | Qwen3.5-35B-A3B | Thinking Enabled, Tools | 78.40 | 2026-02-25 | 35B | Free Commercial |
| 27 | GPT-4o mini | Thinking Enabled, Tools | 75.00 | 2024-07-18 | Unknown | Closed |
| 28 | Mistral Large 3 | Thinking Enabled, Tools | 72.20 | 2025-12-02 | 675B | Free Commercial |
| 29 | Gemini 2.5 Pro Experimental 03-25 | Thinking Enabled, Tools | 71.90 | 2025-03-25 | Unknown | Closed |
| 30 | GPT-4o | Thinking Enabled, Tools | 71.10 | 2024-05-13 | Unknown | Closed |
The 30 top-ranked models are listed here; 7 more entries are not shown.