DataLearner AI

Gemini 3.0 Pro (Preview 11-2025) Benchmark Analysis

Google DeepMind · Updated 2/22/2026 · 17 views

In-depth Analysis

The most capable model in Google's Gemini 3.0 series.

Benchmark Results

Comprehensive Evaluation

13 evaluations

| Benchmark | Mode | Score | Rank / total |
| --- | --- | --- | --- |
| GPQA Diamond | Parallel thinking | 93.80 | 2 / 153 |
| GPQA Diamond | Thinking | 91.90 | 5 / 153 |
| GPQA Diamond | Thinking·High | 91 | 7 / 153 |
| MMLU Pro | Thinking | 90 | 2 / 112 |
| ARC-AGI | Parallel thinking | 87.50 | 5 / 42 |
| ARC-AGI | Thinking | 75 | 9 / 42 |
| LiveBench | Thinking | 74.14 | 9 / 52 |
| HLE | Thinking·High + With tools | 45.80 | 11 / 105 |
| ARC-AGI-2 | Parallel thinking | 45.10 | 10 / 34 |
| HLE | Parallel thinking | 41 | 20 / 105 |
| HLE | Thinking | 37.50 | 24 / 105 |
| HLE | Thinking·High | 37.20 | 25 / 105 |
| ARC-AGI-2 | Thinking | 31.10 | 13 / 34 |

General Knowledge Q&A

1 evaluation

| Benchmark | Mode | Score | Rank / total |
| --- | --- | --- | --- |
| SimpleQA | Thinking | 72.10 | 5 / 44 |

Coding and Software Engineering

2 evaluations

| Benchmark | Mode | Score | Rank / total |
| --- | --- | --- | --- |
| LiveCodeBench | Thinking | 92 | 1 / 103 |
| SWE-bench Verified | Thinking | 76.20 | 17 / 87 |

Mathematical Reasoning

4 evaluations

| Benchmark | Mode | Score | Rank / total |
| --- | --- | --- | --- |
| AIME 2025 | Thinking | 95 | 24 / 105 |
| AIME 2026 | Thinking | 90.60 | 7 / 7 |
| FrontierMath | Thinking | 38 | 2 / 52 |
| FrontierMath - Tier 4 | Thinking | 18.80 | 2 / 32 |

Commonsense Reasoning

1 evaluation

| Benchmark | Mode | Score | Rank / total |
| --- | --- | --- | --- |
| Simple Bench | Thinking | 76.40 | 1 / 27 |

Agent Capability Evaluation

4 evaluations

| Benchmark | Mode | Score | Rank / total |
| --- | --- | --- | --- |
| τ²-Bench - Telecom | Thinking·High + With tools | 98 | 4 / 29 |
| τ²-Bench | Thinking + With tools | 85.40 | 6 / 34 |
| Terminal Bench Hard | Thinking·High + With tools | 42 | 6 / 14 |
| Terminal Bench Hard | Thinking + With tools | 39 | 7 / 14 |

Instruction Following

2 evaluations

| Benchmark | Mode | Score | Rank / total |
| --- | --- | --- | --- |
| IF Bench | Thinking·High + With tools | 70 | 6 / 25 |
| IF Bench | Thinking | 70 | 6 / 25 |

AI Agent - Information Gathering

1 evaluation

| Benchmark | Mode | Score | Rank / total |
| --- | --- | --- | --- |
| BrowseComp | Thinking·High + With tools | 59.20 | 15 / 27 |

AI Agent - Tool Use

2 evaluations

| Benchmark | Mode | Score | Rank / total |
| --- | --- | --- | --- |
| Terminal Bench 2.0 | Thinking·High + With tools | 56.90 | 7 / 20 |
| Terminal Bench 2.0 | Thinking + With tools | 54.20 | 8 / 20 |

Productivity Knowledge

1 evaluation

| Benchmark | Mode | Score | Rank / total |
| --- | --- | --- | --- |
| GDPval-AA | Thinking·High | 35 | 8 / 11 |

Long-Context Capability

1 evaluation

| Benchmark | Mode | Score | Rank / total |
| --- | --- | --- | --- |
| AA-LCR | Thinking·High | 71 | 1 / 12 |

References

artificialanalysis.ai