

© 2026 DataLearner AI. DataLearner curates industry data and case studies so researchers, enterprises, and developers can rely on trustworthy intelligence.


SWE-bench Verified

Updated Apr 28, 2026
Current SOTA: Claude Mythos Preview (Anthropic), score 93.90
Problem count: 500
Institution: OpenAI
Category: Coding and Software Engineering
Metric: Accuracy
Language: English
Difficulty: Mixed

Overview

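SWE-bench Verified is a human-validated subset of 500 problems from SWE-bench, released by OpenAI, that measures how well a model can resolve real GitHub issues in open-source Python repositories. For each problem the model must produce a patch; the problem counts as resolved if the patch makes the issue's target tests pass without breaking tests that passed before. The reported score is the percentage of problems resolved.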
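
The resolution rule can be sketched in a few lines. This is a minimal illustration, not the official SWE-bench harness: it skips applying patches and running tests entirely and assumes the per-test outcomes are already known. SWE-bench records the two test groups under the fields FAIL_TO_PASS and PASS_TO_PASS; the `TaskResult` record and the helper names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Hypothetical record of test outcomes after applying a model's patch."""
    instance_id: str
    fail_to_pass: dict[str, bool]  # tests the fix must make pass -> passed?
    pass_to_pass: dict[str, bool]  # tests that passed before and must keep passing


def is_resolved(result: TaskResult) -> bool:
    # Resolved only if every target test now passes and nothing regressed.
    return all(result.fail_to_pass.values()) and all(result.pass_to_pass.values())


def percent_resolved(results: list[TaskResult]) -> float:
    # Score = share of tasks resolved, as a percentage.
    if not results:
        raise ValueError("no results")
    resolved = sum(is_resolved(r) for r in results)
    return 100.0 * resolved / len(results)


# Illustrative, made-up outcomes for four tasks.
results = [
    TaskResult("task-1", {"test_bug": True}, {"test_old": True}),
    TaskResult("task-2", {"test_bug": True}, {"test_old": False}),  # regression
    TaskResult("task-3", {"test_bug": False}, {"test_old": True}),  # fix incomplete
    TaskResult("task-4", {"test_bug": True}, {}),  # no regression tests to check
]
print(percent_resolved(results))  # 50.0
```

Note that a regression in a previously passing test fails the task just as surely as an incomplete fix does.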


Latest SWE-bench Verified model rankings and full benchmark leaderboard

Browse the latest scores, model modes, release dates, and parameter sizes for SWE-bench Verified.

Source: DataLearnerAI

Scores are taken primarily from official releases (GitHub, Hugging Face, papers); when those are unavailable, from benchmark leaderboards, and otherwise from third-party evaluators.
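
The source precedence described above amounts to choosing, for each model, the score reported by the most authoritative source available. The sketch below is only an illustration: the source labels, the rule that the first report wins at equal priority, and the example scores are all assumptions, since DataLearner does not publish code for this step.

```python
from typing import Optional

# Lower number = higher priority, following the stated order (labels are hypothetical).
SOURCE_PRIORITY = {
    "official": 0,     # GitHub, Hugging Face, papers
    "leaderboard": 1,  # benchmark leaderboards
    "third_party": 2,  # third-party evaluators
}


def pick_score(reports: list[tuple[str, float]]) -> Optional[float]:
    """Return the score from the highest-priority known source, or None."""
    known = [(SOURCE_PRIORITY[src], score) for src, score in reports if src in SOURCE_PRIORITY]
    if not known:
        return None
    # min() returns the first minimal element, so earlier reports win ties.
    return min(known, key=lambda pair: pair[0])[1]


# Made-up scores, for illustration only.
print(pick_score([("third_party", 68.5), ("official", 70.0)]))     # 70.0
print(pick_score([("third_party", 68.5), ("leaderboard", 69.0)]))  # 69.0
print(pick_score([]))                                              # None
```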


Three results obtained in parallel mode are hidden from this ranking.

SWE-bench Verified ranking

| Rank | Model | Organization | Mode | Score | Release date | Parameters | License |
|---|---|---|---|---|---|---|---|
| 1 | Claude Mythos Preview | Anthropic | Extended Thinking + Tools | 93.90 | 2026-04-07 | Unknown | Closed |
| 2 | Claude Opus 4.7 | Anthropic | Extended Thinking + Tools | 87.60 | 2026-04-16 | Unknown | Closed |
| 3 | Claude Opus 4.5 | Anthropic | Extended Thinking + Tools | 80.90 | 2025-11-25 | Unknown | Closed |
| 4 | Claude Opus 4.6 | Anthropic | Extended Thinking + Tools | 80.84 | 2026-02-05 | Unknown | Closed |
| 5 | Gemini 3.1 Pro Preview | Google DeepMind | Thinking Level · High + Tools | 80.60 | 2026-02-20 | Unknown | Closed |
| 6 | DeepSeek-V4-Pro | DeepSeek-AI | Thinking Level · Extra High + Tools | 80.60 | 2026-04-24 | 1600B | Free Commercial |
| 7 | MiniMax M2.5 | MiniMaxAI | Thinking Enabled + Tools | 80.20 | 2026-02-12 | 229B | Free Commercial |
| 8 | Kimi K2.6 | Moonshot AI | Thinking Enabled + Tools | 80.20 | 2026-04-20 | 1000B | Free Commercial |
| 9 | GPT-5.2 | OpenAI | Thinking Level · Extra High + Tools | 80.00 | 2025-12-11 | Unknown | Closed |
| 10 | Claude Sonnet 4.6 | Anthropic | Thinking Enabled | 79.60 | 2026-02-17 | Unknown | Closed |
| 11 | DeepSeek-V4-Pro | DeepSeek-AI | Thinking Level · High + Tools | 79.40 | 2026-04-24 | 1600B | Free Commercial |
| 12 | DeepSeek-V4-Flash | DeepSeek-AI | Thinking Level · Extra High + Tools | 79.00 | 2026-04-24 | 284B | Free Commercial |
| 13 | Qwen 3.6 Plus Preview | Alibaba | Thinking Enabled + Tools | 78.80 | 2026-03-31 | Unknown | Closed |
| 14 | DeepSeek-V4-Flash | DeepSeek-AI | Thinking Level · High + Tools | 78.60 | 2026-04-24 | 284B | Free Commercial |
| 15 | GLM-5 | Zhipu AI | Thinking Enabled | 77.80 | 2026-02-11 | 744B | Free Commercial |
| 16 | Muse Spark | Facebook AI Research Lab | Thinking Enabled + Tools | 77.40 | 2026-04-08 | Unknown | Closed |
| 17 | Claude Sonnet 4.5 | Anthropic | Thinking Enabled + Tools | 77.20 | 2025-09-30 | Unknown | Closed |
| 18 | Qwen3.6-27B | Alibaba | Thinking Enabled + Tools | 77.20 | 2026-04-22 | 27B | Free Commercial |
| 19 | GPT-5.1-Codex-Max | OpenAI | Thinking Level · High + Tools | 76.80 | 2025-11-19 | Unknown | Closed |
| 20 | Kimi K2.5 | Moonshot AI | Thinking Enabled + Tools | 76.80 | 2026-01-27 | 1000B | Free Commercial |
| 21 | Qwen3.5-397B-A17B | Alibaba | Thinking Enabled + Tools | 76.40 | 2026-02-16 | 397B | Free Commercial |
| 22 | GPT-5.1 | OpenAI | Thinking Level · High | 76.30 | 2025-11-12 | Unknown | Closed |
| 23 | GPT-5.1 | OpenAI | Thinking Level · High + Tools | 76.30 | 2025-11-12 | Unknown | Closed |
| 24 | Gemini 3.0 Pro (Preview 11-2025) | Google DeepMind | Thinking Enabled | 76.20 | 2025-11-18 | Unknown | Closed |
| 25 | Qwen3-Max-Thinking | Alibaba | Thinking Enabled | 75.30 | 2026-01-26 | 1000B | Closed |
| 26 | o3-pro | OpenAI | Thinking Level · High | 75.00 | 2025-06-10 | Unknown | Closed |
| 27 | MiniMax M2.1 | MiniMaxAI | Thinking Enabled | 74.80 | 2025-12-23 | 230B | Free Commercial |
73 further entries are not shown here.
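
The leaderboard orders entries by score and gives tied scores distinct ranks (for example, 80.60 at ranks 5 and 6); the page does not say how ties are ordered. The sketch below reproduces that sequential numbering with Python's stable sort, which keeps tied rows in input order, and, as an assumed alternative not used by the page, also computes competition ("1224") ranks in which tied scores share a rank. The rows are four entries from the leaderboard; the function name is hypothetical.

```python
def rank(rows: list[tuple[str, float]]) -> list[tuple[int, int, str, float]]:
    """Return (position, shared_rank, model, score), highest score first."""
    # sorted() stays stable with reverse=True, so equal scores keep input order.
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    ranked: list[tuple[int, int, str, float]] = []
    for position, (model, score) in enumerate(ordered, start=1):
        # Competition ranking: a tie reuses the rank of the first tied row.
        if ranked and score == ranked[-1][3]:
            shared = ranked[-1][1]
        else:
            shared = position
        ranked.append((position, shared, model, score))
    return ranked


rows = [
    ("Gemini 3.1 Pro Preview", 80.60),
    ("Claude Opus 4.6", 80.84),
    ("DeepSeek-V4-Pro", 80.60),
    ("Claude Opus 4.5", 80.90),
]
for entry in rank(rows):
    print(entry)
# (1, 1, 'Claude Opus 4.5', 80.9)
# (2, 2, 'Claude Opus 4.6', 80.84)
# (3, 3, 'Gemini 3.1 Pro Preview', 80.6)
# (4, 3, 'DeepSeek-V4-Pro', 80.6)
```

Sequential positions match the page's style but make tie order depend on input order; shared ranks make ties explicit.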