Qwen3.7-Max-Preview Benchmark Details

Highlights of the current benchmark results for Qwen3.7-Max-Preview include MMLU Pro (rank 4 of 126, score 89.60), LiveCodeBench (rank 4 of 120, score 91.60), and GPQA Diamond (rank 11 of 179, score 92.40). This page also compares it with three competitor models and two predecessor or same-series models, including performance and pricing views where available. One source link is attached for reference.

Benchmark Results

General Knowledge

4 evaluations

Benchmark	Mode	Score	Rank / total
GPQA Diamond	Thinking Level · Max	92.40	11 / 179
MMLU Pro	Thinking Level · Max	89.60	4 / 126
HLE	Thinking Mode ｜ Tools	53.50	12 / 161
HLE	Thinking Level · Max	41.40	50 / 161

Coding and Software Engineering

4 evaluations

Benchmark	Mode	Score	Rank / total
LiveCodeBench	Thinking Level · Max	91.60	4 / 120
SWE-bench Verified	Thinking Mode ｜ Tools	80.40	12 / 108
SWE-bench Multilingual	Thinking Mode ｜ Tools	78.30	3 / 20
SWE-Bench Pro - Public	Thinking Mode ｜ Tools	60.60	6 / 44

Instruction Following

1 evaluation

Benchmark	Mode	Score	Rank / total
IF Bench	Thinking Level · Max	79.10	2 / 29

AI Agent - Tool Usage

1 evaluation

Benchmark	Mode	Score	Rank / total
Terminal Bench 2.0	Thinking Mode ｜ Tools	69.70	5 / 46

Math and Reasoning

1 evaluation

Benchmark	Mode	Score	Rank / total
IMO-AnswerBench	Thinking Level · Max	90.00	2 / 20

Compare with other models

Competitor Comparison

Benchmark scores for Qwen3.7-Max-Preview compared against top models in its class

Models compared: Qwen3.7-Max-Preview, Kimi K2.6, DeepSeek-V4-Pro, GLM 5.1

The chart shows each model’s highest score per benchmark within the current filter. Out-of-100 benchmarks use raw heights; out-of-range benchmarks are scaled within that benchmark while labels keep the original scores.
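The scaling rule described above can be sketched roughly as follows. This is only an illustration: the page does not state the exact formula used for out-of-range benchmarks, so normalizing by the largest score in the benchmark is an assumption, and `bar_heights` and `full_scale` are hypothetical names.

```python
def bar_heights(scores, full_scale=100.0):
    """Return (height, label) pairs for the bars of one benchmark.

    scores: one entry per model; None marks a missing ("--") score.
    Scores are assumed to be positive.
    """
    present = [s for s in scores if s is not None]
    if not present:
        return [(None, "--") for _ in scores]

    # A benchmark counts as "out-of-100" when every score fits in [0, full_scale].
    in_range = all(0.0 <= s <= full_scale for s in present)
    # Assumption: out-of-range benchmarks are normalized by their largest score.
    peak = max(present)

    bars = []
    for s in scores:
        if s is None:
            bars.append((None, "--"))
        elif in_range:
            bars.append((s, f"{s:.2f}"))  # raw height
        else:
            bars.append((s / peak * full_scale, f"{s:.2f}"))  # scaled height, original label
    return bars
```

For a row such as HLE (53.50, 54.00, 48.20, 52.30) the heights equal the scores. For a hypothetical benchmark scored out of several thousand, the heights are rescaled while the labels still show the original values.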

9 benchmarks have comparable scores. Each model shows its best score, with the mode used for that score given alongside it.

Benchmark	Category	Qwen3.7-Max-Preview (current)	Kimi K2.6	DeepSeek-V4-Pro	GLM 5.1
GPQA Diamond	General Knowledge	92.40 (Thinking Level · High)	--	90.10 (Thinking Level · High)	86.20 (Thinking Enabled)
HLE	General Knowledge	53.50 (Thinking Enabled ｜ Tools)	54.00 (Thinking Enabled ｜ Tools)	48.20 (Thinking Level · Extra High ｜ Tools)	52.30 (Thinking Enabled ｜ Tools)
MMLU Pro	General Knowledge	89.60 (Thinking Level · High)	--	87.50 (Thinking Level · High)	--
LiveCodeBench	Coding and Software Engineering	91.60 (Thinking Level · High)	--	93.50 (Thinking Level · High)	--
SWE-bench Multilingual	Coding and Software Engineering	78.30 (Thinking Enabled ｜ Tools)	76.70 (Thinking Enabled ｜ Tools)	76.20 (Thinking Level · Extra High ｜ Tools)	--
SWE-Bench Pro - Public	Coding and Software Engineering	60.60 (Thinking Enabled ｜ Tools)	58.60 (Thinking Enabled ｜ Tools)	55.40 (Thinking Level · Extra High ｜ Tools)	58.40 (Thinking Enabled ｜ Tools)
SWE-bench Verified	Coding and Software Engineering	80.40 (Thinking Enabled ｜ Tools)	80.20 (Thinking Enabled ｜ Tools)	80.60 (Thinking Level · Extra High ｜ Tools)	--
Terminal Bench 2.0	AI Agent - Tool Usage	69.70 (Thinking Enabled ｜ Tools)	66.70 (Thinking Enabled ｜ Tools)	67.90 (Thinking Level · Extra High ｜ Tools)	63.50 (Thinking Enabled ｜ Tools)
IMO-AnswerBench	Math and Reasoning	90.00 (Thinking Level · High)	--	89.80 (Thinking Level · High)	83.80 (Thinking Enabled)
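
As a reading aid, the comparison table can be tallied by which model has the highest reported score on each benchmark. The sketch below copies the scores from the table and treats `--` as missing. The names `MODELS`, `SCORES`, `leader`, and `count_leads` are illustrative, and because the modes differ between models, the tally is not a like-for-like ranking.

```python
# Best scores copied from the comparison table; None marks a missing ("--") entry.
MODELS = ["Qwen3.7-Max-Preview", "Kimi K2.6", "DeepSeek-V4-Pro", "GLM 5.1"]
SCORES = {
    "GPQA Diamond": [92.40, None, 90.10, 86.20],
    "HLE": [53.50, 54.00, 48.20, 52.30],
    "MMLU Pro": [89.60, None, 87.50, None],
    "LiveCodeBench": [91.60, None, 93.50, None],
    "SWE-bench Multilingual": [78.30, 76.70, 76.20, None],
    "SWE-Bench Pro - Public": [60.60, 58.60, 55.40, 58.40],
    "SWE-bench Verified": [80.40, 80.20, 80.60, None],
    "Terminal Bench 2.0": [69.70, 66.70, 67.90, 63.50],
    "IMO-AnswerBench": [90.00, None, 89.80, 83.80],
}


def leader(row):
    """Return the model with the highest reported score in one table row."""
    reported = [(score, model) for score, model in zip(row, MODELS) if score is not None]
    return max(reported)[1]


def count_leads(scores):
    """Count, per model, the benchmarks on which it has the top reported score."""
    tally = {model: 0 for model in MODELS}
    for row in scores.values():
        tally[leader(row)] += 1
    return tally


if __name__ == "__main__":
    for name, row in SCORES.items():
        print(f"{name}: {leader(row)}")
    print(count_leads(SCORES))
```

On these figures Qwen3.7-Max-Preview has the top reported score on 6 of the 9 benchmarks, DeepSeek-V4-Pro on 2 (LiveCodeBench and SWE-bench Verified), and Kimi K2.6 on 1 (HLE).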

Standard API Pricing: Qwen3.7-Max-Preview vs. Peer Models

Shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.

Source: DataLearnerAI. Standard text prices shown here use the default supplier. Prices are in USD per 1M tokens.

Model	Supplier	Standard input	Standard output	Base price applies to
Qwen3.7-Max-Preview	Alibaba	$2.5 / 1M tokens	$7.5 / 1M tokens	--
Kimi K2.6	Moonshot AI	$0.95 / 1M tokens	$4 / 1M tokens	--
DeepSeek-V4-Pro	DeepSeek-AI	$0.435 / 1M tokens	$0.87 / 1M tokens	--
GLM 5.1	Zhipu AI	$1.4 / 1M tokens	$4.4 / 1M tokens	--
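
To turn the per-1M-token prices into a cost for a given request, the arithmetic is input tokens times the input price plus output tokens times the output price, divided by one million. The sketch below uses the standard prices from the table. The function name `request_cost` and the token counts in the usage note are hypothetical, and any extended-context pricing is ignored.

```python
# Standard prices in USD per 1M tokens (input, output), copied from the pricing table.
PRICES = {
    "Qwen3.7-Max-Preview": (2.5, 7.5),
    "Kimi K2.6": (0.95, 4.0),
    "DeepSeek-V4-Pro": (0.435, 0.87),
    "GLM 5.1": (1.4, 4.4),
}


def request_cost(model, input_tokens, output_tokens):
    """Return the cost in USD of one request at the standard (base) rate."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 200,000 input tokens and 50,000 output tokens.
    for model in PRICES:
        print(f"{model}: ${request_cost(model, 200_000, 50_000):.4f}")
```

For the hypothetical workload of 200,000 input tokens and 50,000 output tokens, Qwen3.7-Max-Preview comes to $0.875 (0.2 × 2.5 + 0.05 × 7.5), while DeepSeek-V4-Pro comes to about $0.1305.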

Version History

How each version of the Qwen3.7-Max-Preview series stacks up on benchmark tests

Models compared: Qwen3.7-Max-Preview, Qwen3.6-Max-Preview, Qwen3-Max-Thinking

10 benchmarks have comparable scores. Each model shows its best score, with the mode used for that score given alongside it.

Benchmark	Category	Qwen3.7-Max-Preview (current)	Qwen3.6-Max-Preview	Qwen3-Max-Thinking
GPQA Diamond	General Knowledge	92.40 (Thinking Level · High)	90.40 (Thinking Level · High)	87.40 (Thinking Enabled)
HLE	General Knowledge	53.50 (Thinking Enabled ｜ Tools)	50.20 (Thinking Enabled ｜ Tools)	49.80 (Thinking Enabled ｜ Tools)
MMLU Pro	General Knowledge	89.60 (Thinking Level · High)	88.50 (Thinking Level · High)	85.70 (Thinking Enabled)
LiveCodeBench	Coding and Software Engineering	91.60 (Thinking Level · High)	87.10 (Thinking Level · High)	85.90 (Thinking Enabled)
SWE-bench Multilingual	Coding and Software Engineering	78.30 (Thinking Enabled ｜ Tools)	73.80 (Thinking Enabled ｜ Tools)	--
SWE-Bench Pro - Public	Coding and Software Engineering	60.60 (Thinking Enabled ｜ Tools)	56.60 (Thinking Enabled ｜ Tools)	--
SWE-bench Verified	Coding and Software Engineering	80.40 (Thinking Enabled ｜ Tools)	78.80 (Thinking Enabled ｜ Tools)	75.30 (Thinking Enabled)
IF Bench	Instruction Following	79.10 (Thinking Level · High)	74.20 (Thinking Level · High)	70.90 (Thinking Enabled ｜ Tools)
Terminal Bench 2.0	AI Agent - Tool Usage	69.70 (Thinking Enabled ｜ Tools)	65.40 (Deep Thinking Mode ｜ Tools)	--
IMO-AnswerBench	Math and Reasoning	90.00 (Thinking Level · High)	83.80 (Thinking Level · High)	83.90 (Thinking Enabled)
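
The version-to-version change is easier to see as a difference per benchmark. The sketch below subtracts the Qwen3.6-Max-Preview score from the Qwen3.7-Max-Preview score for each row of the table. The names `VERSION_SCORES` and `gains` are illustrative, and the comparison is only approximate where the modes differ between versions (for example, Terminal Bench 2.0).

```python
# (Qwen3.7-Max-Preview, Qwen3.6-Max-Preview) best scores, copied from the table.
VERSION_SCORES = {
    "GPQA Diamond": (92.40, 90.40),
    "HLE": (53.50, 50.20),
    "MMLU Pro": (89.60, 88.50),
    "LiveCodeBench": (91.60, 87.10),
    "SWE-bench Multilingual": (78.30, 73.80),
    "SWE-Bench Pro - Public": (60.60, 56.60),
    "SWE-bench Verified": (80.40, 78.80),
    "IF Bench": (79.10, 74.20),
    "Terminal Bench 2.0": (69.70, 65.40),
    "IMO-AnswerBench": (90.00, 83.80),
}


def gains(scores):
    """Return the point gain from the older to the newer version per benchmark."""
    # Rounding to 2 decimals hides floating-point noise in the subtraction.
    return {name: round(new - old, 2) for name, (new, old) in scores.items()}


if __name__ == "__main__":
    deltas = gains(VERSION_SCORES)
    for name, delta in sorted(deltas.items(), key=lambda item: item[1], reverse=True):
        print(f"{name}: +{delta:.2f}")
    print(f"mean gain: {sum(deltas.values()) / len(deltas):.2f}")
```

On these figures the largest gain is on IMO-AnswerBench (6.2 points) and the smallest is on MMLU Pro (1.1 points), with a mean gain of about 3.64 points across the 10 benchmarks.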

Single-Benchmark Version Trend

Chart: GPQA Diamond (General Knowledge) scores across versions. Modes plotted: Normal, Normal + Tools, Thinking, Thinking + Tools, Deep, Deep + Tools.

X-axis shows model and release date; Y-axis shows score. Solid lines connect the same mode across versions, while dotted guides align modes within the same generation.

Standard API Pricing Across the Qwen3.7-Max-Preview Series

Shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.

Source: DataLearnerAI. Standard text prices shown here use the default supplier. Prices are in USD per 1M tokens.

When a context threshold exists, the charted base price only applies within these limits:

Qwen3.6-Max-Preview: Base price applies to <= 128

Model	Supplier	Standard input	Standard output	Base price applies to
Qwen3.7-Max-Preview	Alibaba	$2.5 / 1M tokens	$7.5 / 1M tokens	--
Qwen3.6-Max-Preview	Alibaba	$1.3 / 1M tokens	$7.8 / 1M tokens	<= 128

Sources

qwen.ai