Simple Bench

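As large language models (LLMs) advance rapidly, evaluating their capabilities accurately and comprehensively has become increasingly important. Among the many benchmarks, Simple Bench stands out for its distinctive focus: it tests everyday human reasoning, an area where today's most advanced models often still fall short of ordinary people. This article introduces the Simple Bench benchmark and covers its background, design philosophy, evaluation process, and the performance of current mainstream models.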

Updated: 2026-04-19
Views: 1,084
Number of questions: 200
Publisher: Individual
Category: Commonsense reasoning
Metric: Accuracy
Supported language: English
Difficulty: High
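
The metric listed above is Accuracy over a set of 200 questions. As a minimal sketch of how such a score could be computed, the snippet below compares predicted answers with reference answers and reports the percentage that match. The function name `accuracy`, the letter-style answers, and the case-insensitive matching rule are assumptions made for illustration; they are not taken from the Simple Bench evaluation code.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Return the percentage of predictions that match the reference answers.

    Matching is case-insensitive and ignores surrounding whitespace
    (an assumption for this sketch, not a documented Simple Bench rule).
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("at least one question is required")
    correct = sum(
        p.strip().upper() == a.strip().upper()
        for p, a in zip(predictions, answers)
    )
    return 100.0 * correct / len(answers)


# Hypothetical example: 4 questions, 3 answered correctly.
score = accuracy(["A", "c", "B", "D"], ["A", "C", "B", "E"])
print(f"{score:.2f}")  # 75.00
```

The same function applies unchanged to a full run: pass 200 predictions and 200 reference answers.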

Introduction

A benchmark for evaluating the commonsense ability of large language models.


Simple Bench Model Score Leaderboard

Source: DataLearnerAI

Data sourced primarily from official releases (GitHub, Hugging Face, papers), then benchmark leaderboards, then third-party evaluators.
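
The source priority described above (official releases first, then benchmark leaderboards, then third-party evaluators) can be sketched as a simple fallback lookup. This is only an illustration of the stated ordering; the tier labels, the `pick_score` function, and the sample numbers are hypothetical and do not describe DataLearnerAI's actual pipeline.

```python
from typing import Optional

# Priority order taken from the sentence above; the tier labels are hypothetical.
SOURCE_PRIORITY = ["official", "leaderboard", "third_party"]


def pick_score(reported: dict[str, float]) -> Optional[tuple[str, float]]:
    """Return (source, score) from the highest-priority source that reported one."""
    for source in SOURCE_PRIORITY:
        if source in reported:
            return source, reported[source]
    return None  # no known source reported a score


# Hypothetical numbers for illustration only.
print(pick_score({"third_party": 41.2, "leaderboard": 40.8}))  # ('leaderboard', 40.8)
```

Because the list is scanned in order, a third-party number is used only when neither an official release nor a benchmark leaderboard reports a score.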

Mode legend: normal, thinking, low, medium, high, deeper thinking, parallel_thinking

Latest Simple Bench model rankings and full benchmark leaderboard

Browse the latest scores, model modes, release dates, and parameter sizes for Simple Bench.

Simple Bench detailed ranking table

| Rank | Model | Mode | Score (%) | Release date | Parameters (as listed) |
|---|---|---|---|---|---|
| 1 | Gemini 3.0 Pro (Preview 11-2025) | Thinking Enabled | 76.40 | 2025-11-18 | Unknown |
| 2 | Gemini 2.5-Pro | Thinking Enabled | 62.40 | 2025-06-05 | Unknown |
| 3 | Opus 4.5 | Extended Thinking | 62.00 | 2025-11-25 | Unknown |
| 4 | GPT-5-Pro | Thinking Enabled | 61.60 | 2025-08-07 | Unknown |
| 5 | Grok 4 | Thinking Enabled | 60.50 | 2025-07-10 | Unknown |
| 6 | Opus 4.1 | Extended Thinking | 60.00 | 2025-08-06 | Unknown |
| 7 | Claude Opus 4 | Thinking Enabled | 58.80 | 2025-05-23 | Unknown |
| 8 | GPT-5 | Thinking Level · High | 56.70 | 2025-08-07 | Unknown |
| 9 | Claude Sonnet 4.5 | Standard Mode | 54.30 | 2025-09-30 | Unknown |
| 10 | GPT-5.1 | Thinking Level · High | 53.20 | 2025-11-12 | Unknown |
| 11 | OpenAI o3 | Thinking Level · High | 53.10 | 2025-04-16 | Unknown |
| 12 | GLM-4.7 | Thinking Enabled | 47.70 | 2025-12-22 | 3580 |
| 13 | Kimi K2.5 | Thinking Enabled | 46.80 | 2026-01-27 | 10000 |
| 14 | Claude Sonnet 3.7 | Thinking Enabled | 46.40 | 2025-02-25 | Unknown |
| 15 | Claude Sonnet 4 | Thinking Enabled | 45.50 | 2025-05-23 | Unknown |
| 16 | Claude Sonnet 3.7 | Standard Mode | 44.90 | 2025-02-25 | Unknown |
| 17 | DeepSeek-R1-0528 | Thinking Enabled | 40.80 | 2025-05-28 | 6710 |
| 18 | OpenAI o1 | Thinking Level · High | 40.10 | 2024-12-05 | Unknown |
| 19 | OpenAI o4-mini | Thinking Enabled | 38.70 | 2025-04-16 | Unknown |
| 20 | GPT-4.5 | Standard Mode | 34.50 | 2025-02-28 | Unknown |
| 21 | Qwen3-235B-A22B | Thinking Enabled | 31.00 | 2025-04-28 | 2350 |
| 22 | DeepSeek-V3-0324 | Standard Mode | 27.20 | 2025-03-24 | 6710 |
| 23 | GPT-4.1 | Standard Mode | 27.00 | 2025-04-14 | Unknown |
| 24 | Kimi K2 | Standard Mode | 26.30 | 2025-07-11 | 10000 |
| 25 | OpenAI o3-mini | Thinking Enabled | 22.80 | 2025-01-31 | Unknown |
| 26 | GPT OSS 120B | Thinking Enabled | 22.10 | 2025-08-06 | 117 |
| 27 | DeepSeek-V3 | Standard Mode | 18.90 | 2024-12-26 | 6810 |