

Pinch Bench

Updated Apr 28, 2026
Current SOTA: GPT-5.4 (OpenAI), score 90.50
Problem count: 23
Institution: Kilo Code
Category: OpenClaw Agent Evaluation
Metric: Accuracy
Language: English
Difficulty: Medium

Overview

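Pinch Bench is a 23-problem, English-language benchmark from Kilo Code in the OpenClaw Agent Evaluation category. Models are scored by accuracy, and the benchmark is rated medium difficulty. This page lists its metrics and the model leaderboard results collected by DataLearnerAI.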


Latest Pinch Bench model rankings and full benchmark leaderboard

Browse the latest scores, model modes, release dates, and parameter sizes for Pinch Bench.

Source: DataLearnerAI

Data is sourced primarily from official releases (GitHub, Hugging Face, papers), then from benchmark leaderboards, then from third-party evaluators.
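The sourcing order above amounts to a priority fallback: use the official figure when one exists, otherwise fall back to the next tier. A minimal sketch of that idea follows. The tier labels and the `pick_score` helper are hypothetical names chosen for illustration, not part of DataLearnerAI's actual pipeline.

```python
from typing import Optional

# Source tiers in the order the sourcing note lists them (highest priority first).
# These labels are hypothetical, not identifiers used by DataLearnerAI.
SOURCE_PRIORITY = ["official_release", "benchmark_leaderboard", "third_party"]


def pick_score(reported: dict[str, float]) -> Optional[tuple[str, float]]:
    """Return (tier, score) from the highest-priority tier that reported a score."""
    for tier in SOURCE_PRIORITY:
        if tier in reported:
            return tier, reported[tier]
    # No tier reported a score for this model.
    return None
```

Under these assumptions, a model with only leaderboard and third-party figures would take the leaderboard figure, and a model with no reported figure would be left without a score.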


Pinch Bench Rank

Rank  Organization     Model                              Mode                      Score  Released    Parameters  License
1     OpenAI           GPT-5.4                            Thinking Enabled, Tools   90.50  2026-03-05  Unknown     Closed
2     Alibaba          Qwen3.5-27B                        Thinking Enabled, Tools   90.00  2026-02-25  27B         Free Commercial
3     Alibaba          Qwen3.5-397B-A17B                  Thinking Enabled, Tools   89.10  2026-02-16  39.7B       Free Commercial
4     Anthropic        Claude Sonnet 4.5                  Thinking Enabled, Tools   88.20  2025-09-30  Unknown     Closed
5     Anthropic        Claude Sonnet 4.6                  Thinking Enabled, Tools   88.00  2026-02-17  Unknown     Closed
6     MiniMaxAI        MiniMax M2.5                       Thinking Enabled, Tools   87.80  2026-02-12  229B        Free Commercial
7     Anthropic        Claude Opus 4.6                    Thinking Enabled, Tools   87.40  2026-02-05  Unknown     Closed
8     Anthropic        Opus 4.5                           Extended Thinking, Tools  87.20  2025-11-25  Unknown     Closed
9     MiniMaxAI        MiniMax-M2.7                       Thinking Enabled, Tools   87.10  2026-03-18  229B        Non-Commercial
10    Google DeepMind  Gemini 3.1 Pro Preview             Thinking Enabled, Tools   86.70  2026-02-20  Unknown     Closed
11    Zhipu AI         GLM-5-Turbo                        Thinking Enabled, Tools   86.50  2026-03-15  Unknown     Closed
12    Zhipu AI         GLM-5                              Thinking Enabled, Tools   86.40  2026-02-11  744B        Free Commercial
13    Zhipu AI         GLM-4.5-Air                        Thinking Enabled, Tools   85.70  2025-07-28  106B        Free Commercial
14    Alibaba          Qwen3.5-122B-A10B                  Thinking Enabled, Tools   85.50  2026-02-25  122B        Free Commercial
15    StepFunAI        Step 3.5 Flash                     Thinking Enabled, Tools   85.30  2026-02-02  196B        Free Commercial
16    Google DeepMind  Gemini 3.0 Flash                   Thinking Enabled, Tools   85.20  2025-12-17  Unknown     Closed
17    Moonshot AI      Kimi K2.5                          Thinking Enabled, Tools   84.80  2026-01-27  1000B       Free Commercial
18    DeepSeek-AI      DeepSeek V3.2                      Thinking Enabled, Tools   84.30  2025-12-01  671B        Free Commercial
19    MiniMaxAI        M2.1                               Thinking Enabled, Tools   84.30  2025-12-23  230B        Free Commercial
20    xAI              Grok 4.1 Fast                      Thinking Enabled, Tools   82.40  2025-11-19  Unknown     Closed
21    Anthropic        Haiku 4.5                          Thinking Enabled, Tools   82.00  2025-10-15  Unknown     Closed
22    Anthropic        Claude Sonnet 4                    Thinking Enabled, Tools   80.50  2025-05-23  Unknown     Closed
23    OpenAI           GPT-5-mini                         Thinking Enabled, Tools   80.30  2025-08-07  Unknown     Closed
24    Alibaba          Qwen3-Max-Thinking                 Thinking Enabled, Tools   80.30  2026-01-26  1000B       Closed
25    Alibaba          Qwen3-Coder-Next                   Thinking Enabled, Tools   79.10  2026-02-03  8B          Free Commercial
26    Alibaba          Qwen3.5-35B-A3B                    Thinking Enabled, Tools   78.40  2026-02-25  35B         Free Commercial
27    OpenAI           GPT-4o mini                        Thinking Enabled, Tools   75.00  2024-07-18  Unknown     Closed
28    MistralAI        Mistral Large 3                    Thinking Enabled, Tools   72.20  2025-12-02  675B        Free Commercial
29    Google DeepMind  Gemini 2.5 Pro Experimental 03-25  Thinking Enabled, Tools   71.90  2025-03-25  Unknown     Closed
30    OpenAI           GPT-4o                             Thinking Enabled, Tools   71.10  2024-05-13  Unknown     Closed