© 2026 DataLearner AI. DataLearner curates industry data and case studies so researchers, enterprises, and developers can rely on trustworthy intelligence.


Qwen3-235B-A22B-2507 Benchmark Details

Qwen3-235B-A22B-2507's strongest benchmark results by rank are Creative Writing (rank 3 of 23, score 87.50), SimpleQA (rank 9 of 45, score 54.30), and MMLU Pro (rank 43 of 126, score 83). This page also compares the model against one predecessor or same-series model. One source link is attached for reference.
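
The page does not say how it picks the three highlighted benchmarks. One reading that matches the listed numbers is ordering by relative rank, meaning rank divided by the number of ranked models. The sketch below works under that assumption. The function names `relative_rank` and `top_by_relative_rank` are illustrative and not part of any DataLearner API, and the data is copied from the results on this page.

```python
# (benchmark, score, rank, total) as listed in the benchmark results on this page.
RESULTS = [
    ("MMLU Pro", 83.0, 43, 126),
    ("GPQA Diamond", 77.5, 84, 177),
    ("LiveBench", 65.18, 31, 52),
    ("ARC-AGI", 11.0, 58, 65),
    ("ARC-AGI-2", 1.3, 52, 59),
    ("SimpleQA", 54.3, 9, 45),
    ("LiveCodeBench", 51.8, 88, 120),
    ("AIME2025", 70.3, 72, 106),
    ("Creative Writing", 87.5, 3, 23),
]


def relative_rank(rank: int, total: int) -> float:
    """Rank as a fraction of the leaderboard size (lower is better)."""
    return rank / total


def top_by_relative_rank(results, n=3):
    """Return the names of the n benchmarks with the best relative rank.

    Assumption: this ordering is a guess at how the summary is built,
    not a documented rule of the site.
    """
    ordered = sorted(results, key=lambda r: relative_rank(r[2], r[3]))
    return [name for name, _score, _rank, _total in ordered[:n]]


if __name__ == "__main__":
    print(top_by_relative_rank(RESULTS))
    # → ['Creative Writing', 'SimpleQA', 'MMLU Pro']
```

Relative rank is used instead of raw score because scores on different benchmarks are not on a shared scale, while rank positions within each leaderboard are at least comparable in kind.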

Benchmark Results


Comprehensive Evaluation

5 evaluations

| Benchmark | Mode | Score | Rank / total |
|---|---|---|---|
| MMLU Pro | Standard Mode | 83 | 43 / 126 |
| GPQA Diamond | Standard Mode | 77.50 | 84 / 177 |
| LiveBench | Standard Mode | 65.18 | 31 / 52 |
| ARC-AGI | Standard Mode | 11 | 58 / 65 |
| ARC-AGI-2 | Standard Mode | 1.30 | 52 / 59 |

Common-Sense Q&A

1 evaluation

| Benchmark | Mode | Score | Rank / total |
|---|---|---|---|
| SimpleQA | Standard Mode | 54.30 | 9 / 45 |

Coding and Software Engineering

1 evaluation

| Benchmark | Mode | Score | Rank / total |
|---|---|---|---|
| LiveCodeBench | Standard Mode | 51.80 | 88 / 120 |

Mathematical Reasoning

1 evaluation

| Benchmark | Mode | Score | Rank / total |
|---|---|---|---|
| AIME2025 | Standard Mode | 70.30 | 72 / 106 |

Writing and Creative Work

1 evaluation

| Benchmark | Mode | Score | Rank / total |
|---|---|---|---|
| Creative Writing | Standard Mode | 87.50 | 3 / 23 |

Version History

How each version of the Qwen3-235B-A22B-2507 series stacks up on benchmark tests

Models compared: Qwen3-235B-A22B-2507, Qwen2.5-72B
The chart shows each model's highest score per benchmark under the current filter. Benchmarks scored out of 100 are drawn at their raw values; benchmarks on other scales are rescaled within that benchmark, while the labels keep the original scores.
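
The page does not specify how scores on other scales are rescaled. As a minimal sketch, the code below assumes the tallest bar in such a benchmark is drawn at height 100 and the others in proportion to it. The function name `bar_heights`, the `out_of_100` flag, and the 1200/900 example scores are hypothetical and chosen only for illustration.

```python
def bar_heights(scores: dict[str, float], out_of_100: bool) -> dict[str, float]:
    """Map each model's score to a bar height in the range 0 to 100.

    Assumption: benchmarks not scored out of 100 are scaled so that the best
    model in that benchmark reaches 100. The real chart may scale differently.
    """
    if out_of_100:
        # Raw score is already a height on a 0-100 axis.
        return dict(scores)
    top = max(scores.values())
    if top <= 0:
        # Nothing sensible to scale against; draw empty bars.
        return {model: 0.0 for model in scores}
    return {model: 100.0 * score / top for model, score in scores.items()}


if __name__ == "__main__":
    # Scores from this page, already on a 0-100 scale: heights equal scores.
    print(bar_heights({"Qwen3-235B-A22B-2507": 77.5, "Qwen2.5-72B": 45.9}, True))
    # Hypothetical rating-style scores on a different scale.
    print(bar_heights({"model_a": 1200.0, "model_b": 900.0}, False))
    # → {'model_a': 100.0, 'model_b': 75.0}
```

Only the bar height changes under this scheme; the label attached to each bar would still show the original score, as the chart description states.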

Two benchmarks have comparable scores. Each model shows its best score, with the mode label next to it.

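| Benchmark | Category | Qwen3-235B-A22B-2507 (current) | Qwen2.5-72B |
|---|---|---|---|
| GPQA Diamond | Comprehensive Evaluation | 77.50 (Standard Mode) | 45.90 (Standard Mode) |
| MMLU Pro | Comprehensive Evaluation | 83.00 (Standard Mode) | 58.10 (Standard Mode) |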
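
The gap between the two models in the comparison above can be computed directly from the listed scores. This is a small sketch: the `score_gains` helper is illustrative, and the only data used is the four scores shown on this page.

```python
# Best scores per benchmark for the two compared models, as listed on this page.
COMPARISON = {
    "GPQA Diamond": {"Qwen3-235B-A22B-2507": 77.50, "Qwen2.5-72B": 45.90},
    "MMLU Pro": {"Qwen3-235B-A22B-2507": 83.00, "Qwen2.5-72B": 58.10},
}


def score_gains(comparison, current, previous):
    """Return the point difference (current minus previous) per benchmark."""
    return {
        benchmark: round(scores[current] - scores[previous], 2)
        for benchmark, scores in comparison.items()
    }


if __name__ == "__main__":
    gains = score_gains(COMPARISON, "Qwen3-235B-A22B-2507", "Qwen2.5-72B")
    for benchmark, gain in gains.items():
        print(f"{benchmark}: +{gain:.2f} points")
    # → GPQA Diamond: +31.60 points
    # → MMLU Pro: +24.90 points
```

The differences are in raw benchmark points. Both rows are Standard Mode results, so no mode mismatch is hidden in the subtraction.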

Single-Benchmark Version Trend

Viewing: GPQA Diamond (Comprehensive Evaluation)

Modes: Normal, Normal + Tools, Thinking, Thinking + Tools, Deep, Deep + Tools

X-axis shows model and release date, Y-axis shows score; solid lines connect the same mode across versions, while dotted guides align modes within the same generation.

Standard API Pricing Across the Qwen3-235B-A22B-2507 Series

Shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.

Source: DataLearnerAI. Standard text prices shown here use the default supplier.

These models use different currencies or billing units, so the page falls back to raw price values instead of a shared bar chart.

| Model | Supplier | Standard input | Standard output | Base price applies to |
|---|---|---|---|---|
| Qwen3-235B-A22B-2507 | not listed | 0.7 USD per 1 million tokens | 2.8 USD per 1 million tokens | not listed |
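
To make the listed rates concrete, the sketch below estimates the cost of a single request at the standard prices on this page (0.7 USD per 1 million input tokens, 2.8 USD per 1 million output tokens). The function name `request_cost_usd` and the token counts in the example are hypothetical. Any extended-context pricing or supplier-specific rates are not modeled, since the page does not list them.

```python
# Standard rates as listed on this page, in USD per 1 million tokens.
INPUT_USD_PER_MILLION = 0.7
OUTPUT_USD_PER_MILLION = 2.8


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request at the standard rates.

    Assumption: flat per-token billing with no context-length tiers,
    caching discounts, or minimum charges.
    """
    total = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 10,000 prompt tokens and 2,000 generated tokens.
    print(f"{request_cost_usd(10_000, 2_000):.4f} USD")
    # → 0.0126 USD
```

Output tokens cost four times as much as input tokens at these rates, so generation length tends to dominate the bill for long responses.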

Sources

arcprize.org