DataLearner AI

A knowledge platform focused on LLM benchmarking, datasets, and practical instruction with continuously updated capability maps.

© 2026 DataLearner AI. DataLearner curates industry data and case studies so researchers, enterprises, and developers can rely on trustworthy intelligence.


LLM Benchmark Comparison Results

See key specs and per-benchmark scores for each model and mode. This comparison covers the benchmark results and core parameters of 3 models.

Models compared: StepFun Flash 3.5, Kimi K2.5, Qwen3-Max-Thinking

Specification comparison

StepFun Flash 3.5 (StepFunAI)
Release: 2026-02-02
Context length: 256K
Parameters: 196B
Modes: Non-Thinking Mode, Thinking Mode

Kimi K2.5 (Moonshot AI)
Release: 2026-01-27
Context length: 256K
Parameters: 1T
Modes: Non-Thinking Mode, Thinking Mode

Qwen3-Max-Thinking (Alibaba)
Release: 2026-01-26
Context length: 1000K
Parameters: 1T
Modes: Non-Thinking Mode, Thinking Mode

Performance benchmarks

Compare benchmark results across thinking modes and tool usage.

Data are sourced primarily from official releases (GitHub, Hugging Face, papers), then from benchmark leaderboards, and finally from third-party evaluators.


Best overall: Qwen3-Max-Thinking (81.80)

Best single score: StepFun Flash 3.5, AIME2025 (99.80)


Benchmark scores

Higher is usually better; “—” means no score.


Benchmark score table

Complete scores for each model/mode across selected benchmarks.

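Each cell lists the model's two per-mode scores as "first / second".

Benchmark | StepFun Flash 3.5 (StepFunAI) | Kimi K2.5 (Moonshot AI) | Qwen3-Max-Thinking (Alibaba)

Programming & Software Engineering
LiveCodeBench | 86.40 / — | 85.00 / — | 85.90 / —
SWE-bench Verified | 74.40 / — | 76.80 / — | 75.30 / —

Mathematical Reasoning
AIME2025 | 97.30 / 99.80 | 96.10 / — | — / —
IMO-AnswerBench | 85.40 / 86.70 | 81.80 / — | 83.90 / —

Agent Capability
τ²-Bench | — / 88.20 | — / — | — / 82.10

AI Agent: Information Gathering
BrowseComp | — / 69.00 | 60.60 / 74.90 | — / —

AI Agent: Tool Use
Terminal Bench 2.0 | — / 51.00 | 50.80 / — | — / —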

Feature compare

Detailed feature breakdown

Licensing, MoE architecture, and multi-modality support.

Features & specs: StepFun Flash 3.5 (StepFunAI) | Kimi K2.5 (Moonshot AI) | Qwen3-Max-Thinking (Alibaba)

Model snapshots

Organization: StepFunAI | Moonshot AI | Alibaba
Full model name: StepFun Flash 3.5 | Kimi K2.5 | Qwen3-Max-Thinking
Model description: Not provided | Not provided | Not provided
Model type: Chat LLM | Multimodal LLM | Reasoning LLM
Model code name: stepfun-flash-3-5 | kimi-k2-5 | qwen3-max
Release: 2026-02-02 | 2026-01-27 | 2026-01-26
MoE: Yes | Yes | Yes

Specs & performance

Context length: 256K | 256K | 1000K
Parameters (total): 196B | 1T | 1T
Active parameters: 11B | 32B | Not provided
Model scale: 100B | 100B | 100B
Model size: 38GB | 595GB | Not provided
Inference speed: — | — | —
Reasoning tier: — | — | —
Max output tokens: 16384 | 16384 | 32768
Supported modes: Non-Thinking, Thinking | Non-Thinking, Thinking | Non-Thinking, Thinking
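All three models are listed as MoE, so only part of the weights is used for each token. A minimal Python sketch follows, assuming DataLearner's raw parameter figures (1960 and 110 for StepFun Flash 3.5, 10000 and 320 for Kimi K2.5) are expressed in units of 100 million; it converts them to billions and computes the active share per token. `RAW_PARAMS`, `to_billions` and `active_fraction` are hypothetical names for illustration.

```python
from typing import Optional

# Assumption: DataLearner lists parameter counts in units of 100 million (1e8).
RAW_UNIT = 100_000_000

# (total, active) as listed on the page; None means "Not provided".
RAW_PARAMS = {
    "StepFun Flash 3.5": (1960, 110),
    "Kimi K2.5": (10000, 320),
    "Qwen3-Max-Thinking": (10000, None),
}


def to_billions(raw: float) -> float:
    """Convert a raw listed figure into billions of parameters (relies on RAW_UNIT)."""
    return raw * RAW_UNIT / 1e9


def active_fraction(total: float, active: Optional[float]) -> Optional[float]:
    """Fraction of all weights an MoE model activates per token, or None if unknown."""
    if active is None:
        return None
    return active / total


if __name__ == "__main__":
    for model, (total, active) in RAW_PARAMS.items():
        frac = active_fraction(total, active)
        share = "unknown" if frac is None else f"{frac:.1%}"
        active_b = "n/a" if active is None else f"{to_billions(active):.0f}B"
        print(f"{model}: {to_billions(total):.0f}B total, {active_b} active, {share} per token")
```

The active share is a ratio of two figures in the same unit, so it holds even if the unit assumption is wrong; only the conversion to billions depends on it.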

Open source & licensing

Code open source: Not provided | Not provided | Not provided
Weights open source: Not provided | Not provided | Not provided
Commercial use: Free commercial license | Free commercial license | Not open source


API details

Text pricing
StepFun Flash 3.5: input 0.0; output 0.0; cache 0.0; input (extended) 0.0
Kimi K2.5: input $0.6 / 1M tokens; output $3 / 1M tokens; cache $0.1 / 1M tokens
Qwen3-Max-Thinking: input $1.2 / 1M tokens; output $6 / 1M tokens; input (extended) $2.4 / 1M tokens; output (extended) $12 / 1M tokens; threshold 32K

Image API pricing: StepFun Flash 3.5 not provided; Kimi K2.5 input $0.6 / 1M tokens, cache $0.1 / 1M tokens; Qwen3-Max-Thinking not provided
Audio API pricing: Not provided | Not provided | Not provided
Video API pricing: Not provided | Not provided | Not provided
Embedding API pricing: Not provided | Not provided | Not provided
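To turn the per-million-token prices into a per-request estimate, a minimal Python sketch follows. It assumes prices are USD per 1M tokens as listed; that Qwen3-Max-Thinking's "threshold 32K" means 32,000 input tokens and that exceeding it bills the whole request at the extended rates; and that Kimi K2.5's cache price applies to cached input tokens counted inside the input total. These billing rules are assumptions, not the providers' documented behavior. StepFun Flash 3.5 is left out because its listed prices are 0.0 with no unit. `TextPricing` and `request_cost` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TextPricing:
    """Per-model text API prices in USD per 1M tokens, as listed on the page."""
    input_per_m: float
    output_per_m: float
    cache_per_m: Optional[float] = None       # price for cached input tokens, if listed
    input_ext_per_m: Optional[float] = None   # "extended" tier input price, if listed
    output_ext_per_m: Optional[float] = None  # "extended" tier output price, if listed
    threshold_tokens: Optional[int] = None    # input length that triggers the extended tier


PRICING = {
    "Kimi K2.5": TextPricing(0.6, 3.0, cache_per_m=0.1),
    # Assumption: "32K" is read as 32,000 input tokens.
    "Qwen3-Max-Thinking": TextPricing(1.2, 6.0, input_ext_per_m=2.4,
                                      output_ext_per_m=12.0, threshold_tokens=32_000),
}


def request_cost(p: TextPricing, input_tokens: int, output_tokens: int,
                 cached_tokens: int = 0) -> float:
    """Estimated USD cost of one request under the assumptions stated above."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    in_rate, out_rate = p.input_per_m, p.output_per_m
    # Assumption: once the input exceeds the threshold, the whole request
    # is billed at the extended rates.
    if p.threshold_tokens is not None and input_tokens > p.threshold_tokens:
        if p.input_ext_per_m is not None:
            in_rate = p.input_ext_per_m
        if p.output_ext_per_m is not None:
            out_rate = p.output_ext_per_m
    # Assumption: cached tokens are part of the input and billed at the cache rate.
    cache_rate = p.cache_per_m if p.cache_per_m is not None else in_rate
    uncached = input_tokens - cached_tokens
    total = uncached * in_rate + cached_tokens * cache_rate + output_tokens * out_rate
    return total / 1_000_000


if __name__ == "__main__":
    print(f"{request_cost(PRICING['Kimi K2.5'], 100_000, 5_000, cached_tokens=20_000):.4f}")
    print(f"{request_cost(PRICING['Qwen3-Max-Thinking'], 10_000, 1_000):.4f}")
    print(f"{request_cost(PRICING['Qwen3-Max-Thinking'], 50_000, 2_000):.4f}")
```

Under these assumptions the three example calls cost about $0.065, $0.018 and $0.144. The last Qwen3-Max-Thinking call would be $0.072 at the standard rates; the doubling comes entirely from crossing the assumed 32K threshold.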

Resources

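GitHub: Repo | Repo | Not provided
Hugging Face: Model page | Model page | Not provided
Official page: Not provided | Not provided | Not provided
Guides: Not provided | Not provided | Not provided
Papers: Step 3.5 Flash: The Open Source 'Light Cavalry' for Agents | Kimi K2.5: Visual Agentic Intelligence | Qwen3-Max-Thinking: Pushing the Limits of Reasoning via Test-Time Scaling
DataLearnerAI: Not provided | Big news! Kimi K2.5 is released and still free and open source! A native multimodal MoE architecture, one of the largest open-source models in the world by parameter count, with official benchmark results rivaling many closed-source models, and able to drive 100 sub-agents! | Not provided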

API pricing

API price comparison

Side-by-side input/output token pricing
