
A knowledge platform focused on LLM benchmarking, datasets, and practical instruction with continuously updated capability maps.

© 2026 DataLearner AI. DataLearner curates industry data and case studies so researchers, enterprises, and developers can rely on trustworthy intelligence.


Benchmark Results

Qwen3.7-Max-Preview

Comprehensive Evaluation

4 evaluations

| Benchmark | Mode | Score | Rank / total |
| --- | --- | --- | --- |
| GPQA Diamond | Thinking Level · Max | 92.40 | 10 / 177 |
| MMLU Pro | Thinking Level · Max | 89.60 | 4 / 126 |
| HLE | Thinking Mode + Tools | 53.50 | 8 / 154 |
| HLE | Thinking Level · Max | 41.40 | 44 / 154 |

Coding and Software Engineering

4 evaluations

| Benchmark | Mode | Score | Rank / total |
| --- | --- | --- | --- |
| LiveCodeBench | Thinking Level · Max | 91.60 | 4 / 120 |
| SWE-bench Verified | Thinking Mode + Tools | 80.40 | 9 / 105 |
| SWE-bench Multilingual | Thinking Mode + Tools | 78.30 | 3 / 20 |
| SWE-Bench Pro - Public | Thinking Mode + Tools | 60.60 | 3 / 40 |

Instruction Following

1 evaluation

| Benchmark | Mode | Score | Rank / total |
| --- | --- | --- | --- |
| IF Bench | Thinking Level · Max | 79.10 | 2 / 29 |

AI Agent - Tool Use

1 evaluation

| Benchmark | Mode | Score | Rank / total |
| --- | --- | --- | --- |
| Terminal Bench 2.0 | Thinking Mode + Tools | 69.70 | 5 / 46 |

Mathematical Reasoning

1 evaluation

| Benchmark | Mode | Score | Rank / total |
| --- | --- | --- | --- |
| IMO-AnswerBench | Thinking Level · Max | 90.00 | 1 / 19 |
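
Because each benchmark has a different number of ranked models, raw ranks are hard to compare across rows. One way to read the Rank / total column is as a share of the field. The sketch below assumes that a simple `rank / total` ratio is a fair normalization; the helper name `top_percent` is illustrative and not something the site defines.

```python
def top_percent(rank: int, total: int) -> float:
    """Return the rank as a percentage of the ranked field (smaller is better)."""
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return 100.0 * rank / total

# (rank, total) pairs copied from the Rank / total column above
placements = {
    "GPQA Diamond": (10, 177),
    "MMLU Pro": (4, 126),
    "LiveCodeBench": (4, 120),
    "IF Bench": (2, 29),
    "IMO-AnswerBench": (1, 19),
}

# Sort from the smallest share of the field to the largest
ordered = sorted(placements.items(), key=lambda kv: top_percent(*kv[1]))
for name, (rank, total) in ordered:
    print(f"{name}: top {top_percent(rank, total):.1f}% ({rank} / {total})")
```

Under this normalization, rank 4 of 126 on MMLU Pro is a smaller share of the field than rank 2 of 29 on IF Bench, even though the raw rank is lower on IF Bench.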

Competitor Comparison

Benchmark scores for Qwen3.7-Max-Preview compared against top models in its class

Models compared: Qwen3.7-Max-Preview, Kimi K2.6, DeepSeek-V4-Pro, GLM 5.1.

The chart shows each model’s highest score per benchmark within the current filter. See the table below for per-mode details.

9 benchmarks with comparable scores. Each model shows its best score; mode label is displayed below.

| Benchmark | Category | Qwen3.7-Max-Preview (current) | Kimi K2.6 | DeepSeek-V4-Pro | GLM 5.1 |
| --- | --- | --- | --- | --- | --- |
| GPQA Diamond | Comprehensive Evaluation | 92.40 (Thinking Level · High) | 90.50 (Thinking Enabled) | 90.10 (Thinking Level · High) | 86.20 (Thinking Enabled) |
| HLE | Comprehensive Evaluation | 53.50 (Thinking Enabled + Tools) | 54.00 (Thinking Enabled + Tools) | 48.20 (Thinking Level · Extra High + Tools) | 52.30 (Thinking Enabled + Tools) |
| MMLU Pro | Comprehensive Evaluation | 89.60 (Thinking Level · High) | -- | 87.50 (Thinking Level · High) | -- |
| LiveCodeBench | Coding and Software Engineering | 91.60 (Thinking Level · High) | 89.60 (Thinking Enabled) | 93.50 (Thinking Level · High) | -- |
| SWE-bench Multilingual | Coding and Software Engineering | 78.30 (Thinking Enabled + Tools) | 76.70 (Thinking Enabled + Tools) | 76.20 (Thinking Level · Extra High + Tools) | -- |
| SWE-Bench Pro - Public | Coding and Software Engineering | 60.60 (Thinking Enabled + Tools) | 58.60 (Thinking Enabled + Tools) | 55.40 (Thinking Level · Extra High + Tools) | 58.40 (Thinking Enabled + Tools) |
| SWE-bench Verified | Coding and Software Engineering | 80.40 (Thinking Enabled + Tools) | 80.20 (Thinking Enabled + Tools) | 80.60 (Thinking Level · Extra High + Tools) | -- |
| Terminal Bench 2.0 | AI Agent - Tool Use | 69.70 (Thinking Enabled + Tools) | 66.70 (Thinking Enabled + Tools) | 67.90 (Thinking Level · Extra High + Tools) | 63.50 (Thinking Enabled + Tools) |
| IMO-AnswerBench | Mathematical Reasoning | 90.00 (Thinking Level · High) | 86.00 (Thinking Enabled) | 89.80 (Thinking Level · High) | 83.80 (Thinking Enabled) |
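
The comparison keeps only each model's best score per benchmark, which matters when a benchmark was run in more than one mode (HLE appears twice in the per-mode results for Qwen3.7-Max-Preview). A minimal sketch of that selection follows; the function name `best_per_benchmark` and the tuple layout are assumptions made for illustration, not the site's actual implementation.

```python
from typing import Dict, List, Tuple

# (benchmark, mode, score) rows for Qwen3.7-Max-Preview, taken from the
# per-mode results; the mode strings are written out here for readability.
rows: List[Tuple[str, str, float]] = [
    ("GPQA Diamond", "Thinking Level · Max", 92.40),
    ("HLE", "Thinking Level · Max", 41.40),
    ("HLE", "Thinking Mode + Tools", 53.50),
]

def best_per_benchmark(
    rows: List[Tuple[str, str, float]],
) -> Dict[str, Tuple[str, float]]:
    """Keep the highest-scoring (mode, score) pair for each benchmark."""
    best: Dict[str, Tuple[str, float]] = {}
    for benchmark, mode, score in rows:
        if benchmark not in best or score > best[benchmark][1]:
            best[benchmark] = (mode, score)
    return best

best = best_per_benchmark(rows)
for benchmark, (mode, score) in best.items():
    print(f"{benchmark}: {score:.2f} ({mode})")
```

With these rows, HLE resolves to the tool-assisted run at 53.50, which is the value shown in the comparison table, rather than the 41.40 scored without tools.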

Standard API Pricing: Qwen3.7-Max-Preview vs. Peer Models

Shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.

Source: DataLearnerAI. Standard text prices shown here use the default supplier. · USD / 1M tokens

| Model | Supplier | Standard input | Standard output | Base price applies to |
| --- | --- | --- | --- | --- |
| Qwen3.7-Max-Preview | Alibaba | $2.5 / 1M tokens | $7.5 / 1M tokens | -- |
| Kimi K2.6 | Moonshot AI | $0.95 / 1M tokens | $4 / 1M tokens | -- |
| DeepSeek-V4-Pro | DeepSeek-AI | $1.74 / 1M tokens | $3.48 / 1M tokens | -- |
| GLM 5.1 | Zhipu AI | $1.4 / 1M tokens | $4.4 / 1M tokens | -- |
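
To turn the per-million-token rates into a cost for a concrete workload, multiply each token count by its rate and divide by one million. The sketch below uses the standard prices listed above; the workload of 200,000 input tokens and 50,000 output tokens is a hypothetical example, and the function name `request_cost` is illustrative. It ignores any extended-context surcharge, caching discount, or tiered pricing a supplier may apply.

```python
# Standard (input, output) prices in USD per 1M tokens, from the table above
PRICES_USD_PER_1M = {
    "Qwen3.7-Max-Preview": (2.5, 7.5),
    "Kimi K2.6": (0.95, 4.0),
    "DeepSeek-V4-Pro": (1.74, 3.48),
    "GLM 5.1": (1.4, 4.4),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a workload at the standard base rates."""
    input_price, output_price = PRICES_USD_PER_1M[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# Hypothetical workload: 200,000 input tokens and 50,000 output tokens
for model in PRICES_USD_PER_1M:
    print(f"{model}: ${request_cost(model, 200_000, 50_000):.3f}")
```

For Qwen3.7-Max-Preview this works out to 0.2 × $2.5 + 0.05 × $7.5 = $0.875 for the assumed workload.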

Version History

How each version of the Qwen3.7-Max-Preview series stacks up on benchmark tests

Versions compared: Qwen3.7-Max-Preview, Qwen3.6-Max-Preview, Qwen3-Max-Thinking.

The chart shows each model’s highest score per benchmark within the current filter. See the table below for per-mode details.

10 benchmarks with comparable scores. Each model shows its best score; mode label is displayed below.

| Benchmark | Category | Qwen3.7-Max-Preview (current) | Qwen3.6-Max-Preview | Qwen3-Max-Thinking |
| --- | --- | --- | --- | --- |
| GPQA Diamond | Comprehensive Evaluation | 92.40 (Thinking Level · High) | 90.40 (Thinking Level · High) | 87.40 (Thinking Enabled) |
| HLE | Comprehensive Evaluation | 53.50 (Thinking Enabled + Tools) | 50.20 (Thinking Enabled + Tools) | 49.80 (Thinking Enabled + Tools) |
| MMLU Pro | Comprehensive Evaluation | 89.60 (Thinking Level · High) | 88.50 (Thinking Level · High) | 85.70 (Thinking Enabled) |
| LiveCodeBench | Coding and Software Engineering | 91.60 (Thinking Level · High) | 87.10 (Thinking Level · High) | 85.90 (Thinking Enabled) |
| SWE-bench Multilingual | Coding and Software Engineering | 78.30 (Thinking Enabled + Tools) | 73.80 (Thinking Enabled + Tools) | -- |
| SWE-Bench Pro - Public | Coding and Software Engineering | 60.60 (Thinking Enabled + Tools) | 56.60 (Thinking Enabled + Tools) | -- |
| SWE-bench Verified | Coding and Software Engineering | 80.40 (Thinking Enabled + Tools) | 78.80 (Thinking Enabled + Tools) | 75.30 (Thinking Enabled) |
| IF Bench | Instruction Following | 79.10 (Thinking Level · High) | 74.20 (Thinking Level · High) | 70.90 (Thinking Enabled + Tools) |
| Terminal Bench 2.0 | AI Agent - Tool Use | 69.70 (Thinking Enabled + Tools) | 65.40 (Deep Thinking Mode + Tools) | -- |
| IMO-AnswerBench | Mathematical Reasoning | 90.00 (Thinking Level · High) | 83.80 (Thinking Level · High) | 83.90 (Thinking Enabled) |
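
The generation-over-generation change is easier to see as a per-benchmark difference. The sketch below subtracts the Qwen3.6-Max-Preview score from the Qwen3.7-Max-Preview score for the ten benchmarks listed above. Treat the results as indicative only: it assumes the best-score entries are directly comparable, although some rows pair different modes (Terminal Bench 2.0, for example, compares a Thinking Enabled run with a Deep Thinking Mode run). The variable names are illustrative.

```python
# (Qwen3.7-Max-Preview, Qwen3.6-Max-Preview) best scores from the table above
scores = {
    "GPQA Diamond": (92.40, 90.40),
    "HLE": (53.50, 50.20),
    "MMLU Pro": (89.60, 88.50),
    "LiveCodeBench": (91.60, 87.10),
    "SWE-bench Multilingual": (78.30, 73.80),
    "SWE-Bench Pro - Public": (60.60, 56.60),
    "SWE-bench Verified": (80.40, 78.80),
    "IF Bench": (79.10, 74.20),
    "Terminal Bench 2.0": (69.70, 65.40),
    "IMO-AnswerBench": (90.00, 83.80),
}

# Point gain of the newer version, rounded to hide floating-point noise
deltas = {name: round(new - old, 2) for name, (new, old) in scores.items()}

# List benchmarks from the largest gain to the smallest
for name, delta in sorted(deltas.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {delta:+.2f}")

largest_gain = max(deltas, key=deltas.get)
smallest_gain = min(deltas, key=deltas.get)
```

On these figures the largest gain is on IMO-AnswerBench and the smallest is on MMLU Pro, with every benchmark improving over the previous preview.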

Single-Benchmark Version Trend

Viewing: GPQA Diamond · Comprehensive Evaluation

Modes plotted: Normal, Normal + Tools, Thinking, Thinking + Tools, Deep, Deep + Tools.

The X-axis shows model and release date and the Y-axis shows score. Solid lines connect the same mode across versions, while dotted guides align modes within the same generation.

Standard API Pricing Across the Qwen3.7-Max-Preview Series

Shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.

Source: DataLearnerAI. Standard text prices shown here use the default supplier. · USD / 1M tokens

When a context threshold exists, the charted base price only applies within these limits:

Qwen3.6-Max-Preview: Base price applies to <= 128
Qwen3-Max-Thinking: Base price applies to <= 32K
| Model | Supplier | Standard input | Standard output | Base price applies to |
| --- | --- | --- | --- | --- |
| Qwen3.7-Max-Preview | Alibaba | $2.5 / 1M tokens | $7.5 / 1M tokens | -- |
| Qwen3.6-Max-Preview | Alibaba | $1.3 / 1M tokens | $7.8 / 1M tokens | <= 128 |
| Qwen3-Max-Thinking | -- | $1.2 / 1M tokens | $6 / 1M tokens | <= 32K |

Qwen3.7-Max-Preview Benchmark Details

Qwen3.7-Max-Preview's current benchmark results are led by MMLU Pro (rank 4 / 126, score 89.60), LiveCodeBench (rank 4 / 120, score 91.60), and HLE (rank 8 / 154, score 53.50). This page also compares it with three competitor models and two predecessor or same-series models, including performance and pricing views where available. One source link is attached for reference.

Sources

qwen.ai