SWE-bench Verified
Updated Apr 28, 2026
- Problem Count: 500
- Institution: OpenAI
- Category: Coding and Software Engineering
- Metrics: Accuracy
- Language: English
- Difficulty: Mixed
Overview
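SWE-bench Verified is a coding and software engineering benchmark of 500 human-validated problems, released by OpenAI, that measures how accurately a model resolves real-world GitHub issues. This page summarizes its metrics, official resources, and model leaderboard results collected by DataLearnerAI.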
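To make the Accuracy metric concrete, here is a minimal sketch that converts between a leaderboard score and the number of solved problems. It assumes the score is simply solved problems divided by the 500-problem total. The page does not spell out the calculation, and the function names are hypothetical, chosen only for illustration.

```python
PROBLEM_COUNT = 500  # from the "Problem Count" field on this page


def implied_resolved(accuracy_pct: float, total: int = PROBLEM_COUNT) -> float:
    """Number of solved problems implied by an accuracy percentage.

    Assumes accuracy = solved / total * 100 (an assumption, not stated on the page).
    """
    return round(accuracy_pct * total / 100.0, 6)


def accuracy_from_resolved(resolved: int, total: int = PROBLEM_COUNT) -> float:
    """Accuracy percentage for a given number of solved problems."""
    return round(100.0 * resolved / total, 2)


def is_whole_problem_count(accuracy_pct: float, total: int = PROBLEM_COUNT) -> bool:
    """True if the score corresponds to a whole number of solved problems."""
    return float(implied_resolved(accuracy_pct, total)).is_integer()


if __name__ == "__main__":
    for score in (80.60, 80.84):
        print(score, implied_resolved(score), is_whole_problem_count(score))
```

Under this assumption each problem is worth 0.2 points, so a score that is not a multiple of 0.2 cannot come from a single pass over all 500 problems. Such scores may reflect a different reporting method, which the page does not specify.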
Source: DataLearnerAI
Data is sourced primarily from official releases (GitHub, Hugging Face, papers), then from benchmark leaderboards, then from third-party evaluators.
Note: 3 parallel-mode results are hidden from this ranking.
SWE-bench Verified Rank
| Rank | Model | Mode | Score (%) | Release date | Parameters | License |
|---|---|---|---|---|---|---|
| 1 | Claude Mythos Preview | Extended Thinking, Tools | 93.90 | 2026-04-07 | Unknown | Closed |
| 2 | Opus 4.7 | Extended Thinking, Tools | 87.60 | 2026-04-16 | Unknown | Closed |
| 3 | Opus 4.5 | Extended Thinking, Tools | 80.90 | 2025-11-25 | Unknown | Closed |
| 4 | Claude Opus 4.6 | Extended Thinking, Tools | 80.84 | 2026-02-05 | Unknown | Closed |
| 5 | Gemini 3.1 Pro Preview | Thinking Level · High, Tools | 80.60 | 2026-02-20 | Unknown | Closed |
| 6 | DeepSeek-V4-Pro | Thinking Level · Extra High, Tools | 80.60 | 2026-04-24 | 1600B | Free Commercial |
| 7 | MiniMax M2.5 | Thinking Enabled, Tools | 80.20 | 2026-02-12 | 229B | Free Commercial |
| 8 | Kimi K2.6 | Thinking Enabled, Tools | 80.20 | 2026-04-20 | 1000B | Free Commercial |
| 9 | GPT-5.2 | Thinking Level · Extra High, Tools | 80.00 | 2025-12-11 | Unknown | Closed |
| 10 | Claude Sonnet 4.6 | Thinking Enabled | 79.60 | 2026-02-17 | Unknown | Closed |
| 11 | DeepSeek-V4-Pro | Thinking Level · High, Tools | 79.40 | 2026-04-24 | 1600B | Free Commercial |
| 12 | DeepSeek-V4-Flash | Thinking Level · Extra High, Tools | 79.00 | 2026-04-24 | 284B | Free Commercial |
| 13 | Qwen 3.6 Plus Preview | Thinking Enabled, Tools | 78.80 | 2026-03-31 | Unknown | Closed |
| 14 | DeepSeek-V4-Flash | Thinking Level · High, Tools | 78.60 | 2026-04-24 | 284B | Free Commercial |
| 15 | GLM-5 | Thinking Enabled | 77.80 | 2026-02-11 | 744B | Free Commercial |
| 16 | Muse Spark | Thinking Enabled, Tools | 77.40 | 2026-04-08 | Unknown | Closed |
| 17 | Claude Sonnet 4.5 | Thinking Enabled, Tools | 77.20 | 2025-09-30 | Unknown | Closed |
| 18 | Qwen3.6-27B | Thinking Enabled, Tools | 77.20 | 2026-04-22 | 27B | Free Commercial |
| 19 | GPT-5.1-Codex-Max | Thinking Level · High, Tools | 76.80 | 2025-11-19 | Unknown | Closed |
| 20 | Kimi K2.5 | Thinking Enabled, Tools | 76.80 | 2026-01-27 | 1000B | Free Commercial |
| 21 | Qwen3.5-397B-A17B | Thinking Enabled, Tools | 76.40 | 2026-02-16 | 397B | Free Commercial |
| 22 | GPT-5.1 | Thinking Level · High | 76.30 | 2025-11-12 | Unknown | Closed |
| 23 | GPT-5.1 | Thinking Level · High, Tools | 76.30 | 2025-11-12 | Unknown | Closed |
| 24 | Gemini 3.0 Pro (Preview 11-2025) | Thinking Enabled | 76.20 | 2025-11-18 | Unknown | Closed |
| 25 | Qwen3-Max-Thinking | Thinking Enabled | 75.30 | 2026-01-26 | 1000B | Closed |
| 26 | o3-pro | Thinking Level · High | 75.00 | 2025-06-10 | Unknown | Closed |
| 27 | M2.1 | Thinking Enabled | 74.80 | 2025-12-23 | 230B | Free Commercial |
73 more entries are not shown here.








