Qwen 3.6 Plus Preview vs Kimi K2.5

Across 11 shared benchmarks, Qwen 3.6 Plus Preview leads overall: it wins 11 and Kimi K2.5 wins 0, with 0 ties and an average score difference of +3.90 in Qwen 3.6 Plus Preview's favor.

Alibaba
Qwen 3.6 Plus Preview

Alibaba · 2026-03-31 · Chat model

Moonshot AI
Kimi K2.5

Moonshot AI · 2026-01-27 · Multimodal model

Qwen 3.6 Plus Preview: 11 wins (100%) · Kimi K2.5: 0 wins (0%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 11 shared benchmarks.
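The grouping and ordering described above can be sketched as follows. This is a minimal illustration, not the page's actual generator: it assumes "gap" means the absolute score difference, uses only four rows copied from the tables on this page, and the names `ROWS` and `group_and_sort` are hypothetical.

```python
from collections import defaultdict

# (capability, benchmark, Qwen 3.6 Plus Preview score, Kimi K2.5 score)
# A subset of rows copied from this page, deliberately listed out of order.
ROWS = [
    ("Coding and Software Engineer", "SWE-bench Verified", 78.80, 76.80),
    ("Coding and Software Engineer", "SWE-Bench Pro - Public", 56.60, 50.70),
    ("Math and Reasoning", "IMO-AnswerBench", 83.80, 81.80),
    ("Math and Reasoning", "AIME 2026", 95.30, 92.50),
]


def group_and_sort(rows):
    """Group rows by capability, then sort each group by largest gap first."""
    groups = defaultdict(list)
    for capability, benchmark, score_a, score_b in rows:
        # Round to two decimals to match the precision shown in the tables.
        groups[capability].append((benchmark, round(score_a - score_b, 2)))
    for items in groups.values():
        # Assumption: "gap" is the absolute difference, whichever model leads.
        items.sort(key=lambda item: abs(item[1]), reverse=True)
    return dict(groups)


grouped = group_and_sort(ROWS)
for capability, items in grouped.items():
    print(capability, items)
```

Sorting on the absolute difference keeps the ordering meaningful even if the second model were to lead on some benchmark.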

Coding and Software Engineer

Qwen 3.6 Plus Preview 4/4
| Benchmark | Qwen 3.6 Plus Preview | Kimi K2.5 | Diff |
| --- | --- | --- | --- |
| SWE-Bench Pro - Public | 56.60 · 13 / 43 · Thinking (With Tools) | 50.70 · 31 / 43 · Thinking (With Tools) | +5.90 |
| LiveCodeBench | 87.10 · 10 / 120 · Thinking (No Tools) | 85 · 16 / 120 · Thinking (No Tools) | +2.10 |
| SWE-bench Verified | 78.80 · 20 / 108 · Thinking (With Tools) | 76.80 · 27 / 108 · Thinking (With Tools) | +2 |
| SWE-bench Multilingual | 73.80 · 7 / 20 · Thinking (No Tools) | 73 · 11 / 20 · Thinking (No Tools) | +0.80 |

General Knowledge

Qwen 3.6 Plus Preview 3/3
| Benchmark | Qwen 3.6 Plus Preview | Kimi K2.5 | Diff |
| --- | --- | --- | --- |
| MMLU Pro | 88.50 · 5 / 126 · Thinking (No Tools) | 78.50 · 66 / 126 · Thinking (No Tools) | +10 |
| GPQA Diamond | 90.40 · 17 / 178 · Thinking (No Tools) | 87.60 · 34 / 178 · Thinking (No Tools) | +2.80 |
| HLE | 50.60 · 17 / 157 · Thinking (With Tools) | 50.20 · 20 / 157 · Thinking (With Tools) | +0.40 |

Math and Reasoning

Qwen 3.6 Plus Preview 2/2
| Benchmark | Qwen 3.6 Plus Preview | Kimi K2.5 | Diff |
| --- | --- | --- | --- |
| AIME 2026 | 95.30 · 2 / 14 · Thinking (No Tools) | 92.50 · 10 / 14 · Thinking (No Tools) | +2.80 |
| IMO-AnswerBench | 83.80 · 10 / 19 · Thinking (No Tools) | 81.80 · 14 / 19 · Thinking (No Tools) | +2 |

AI Agent - Tool Usage

Qwen 3.6 Plus Preview 1/1
| Benchmark | Qwen 3.6 Plus Preview | Kimi K2.5 | Diff |
| --- | --- | --- | --- |
| Terminal Bench 2.0 | 61.60 · 16 / 46 · Thinking (With Tools) | 50.80 · 33 / 46 · Thinking (With Tools) | +10.80 |

Long Context

Qwen 3.6 Plus Preview 1/1
| Benchmark | Qwen 3.6 Plus Preview | Kimi K2.5 | Diff |
| --- | --- | --- | --- |
| AA-LCR | 68.30 · 6 / 13 · Thinking (No Tools) | 65 · 10 / 13 · Thinking (No Tools) | +3.30 |

Specs

| Field | Qwen 3.6 Plus Preview | Kimi K2.5 |
| --- | --- | --- |
| Publisher | Alibaba | Moonshot AI |
| Release date | 2026-03-31 | 2026-01-27 |
| Model type | Chat model | Multimodal model |
| Architecture | Dense | MoE |
| Parameters | Not available | 1T |
| Context length | 1M | 256K |
| Max output | 64K | 16K |

API pricing

Prices use DataLearner records when available; missing fields are not inferred.

| Item | Qwen 3.6 Plus Preview | Kimi K2.5 |
| --- | --- | --- |
| Text input | $0.5 / 1M tokens | Not public |
| Text output | $3 / 1M tokens | Not public |
| Cache read | $0.05 / 1M tokens | Not public |
| Cache write | $0.625 / 1M tokens | Not public |

One or both models have incomplete public pricing.
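To make the per-million-token prices concrete, here is a rough cost sketch. It assumes each token category is billed linearly and independently, which this page does not confirm. The token counts in the example are made up, and `estimate_cost` is a hypothetical helper. Following the page's rule that missing fields are not inferred, a model without public pricing yields `None` rather than a guess.

```python
# USD per 1M tokens, copied from the pricing table on this page.
QWEN_PRICES = {
    "input": 0.5,
    "output": 3.0,
    "cache_read": 0.05,
    "cache_write": 0.625,
}
KIMI_PRICES = None  # listed as "Not public", so nothing is inferred


def estimate_cost(prices, input_tokens=0, output_tokens=0,
                  cache_read_tokens=0, cache_write_tokens=0):
    """Return an estimated USD cost, or None when pricing is unavailable."""
    if prices is None:
        return None
    usage = {
        "input": input_tokens,
        "output": output_tokens,
        "cache_read": cache_read_tokens,
        "cache_write": cache_write_tokens,
    }
    # Assumption: cost is price-per-million times tokens, summed per category.
    return sum(prices[kind] * count for kind, count in usage.items()) / 1_000_000


# Hypothetical workload: 200K input tokens and 20K output tokens.
qwen_cost = estimate_cost(QWEN_PRICES, input_tokens=200_000, output_tokens=20_000)
kimi_cost = estimate_cost(KIMI_PRICES, input_tokens=200_000, output_tokens=20_000)
print(qwen_cost, kimi_cost)
```

Under these assumptions the example workload costs about $0.16 on Qwen 3.6 Plus Preview ($0.10 for input plus $0.06 for output), while no figure can be produced for Kimi K2.5.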

Summary

  • Qwen 3.6 Plus Preview leads in: Coding and Software Engineer (4/4), General Knowledge (3/3), Math and Reasoning (2/2), AI Agent - Tool Usage (1/1), Long Context (1/1)

On average across the 11 shared benchmarks, Qwen 3.6 Plus Preview scores 3.90 higher.

Largest single-benchmark gap: Terminal Bench 2.0 — Qwen 3.6 Plus Preview 61.60 vs Kimi K2.5 50.80 (+10.80).
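The summary figures can be re-derived from the 11 score pairs listed on this page. The sketch below is a cross-check under simple assumptions, not the page's own pipeline: a win is any positive difference, the average is the plain mean of the per-benchmark differences, and `SCORES` and `summarize` are hypothetical names.

```python
# (benchmark, Qwen 3.6 Plus Preview score, Kimi K2.5 score), copied from this page.
SCORES = [
    ("SWE-Bench Pro - Public", 56.60, 50.70),
    ("LiveCodeBench", 87.10, 85.00),
    ("SWE-bench Verified", 78.80, 76.80),
    ("SWE-bench Multilingual", 73.80, 73.00),
    ("MMLU Pro", 88.50, 78.50),
    ("GPQA Diamond", 90.40, 87.60),
    ("HLE", 50.60, 50.20),
    ("AIME 2026", 95.30, 92.50),
    ("IMO-AnswerBench", 83.80, 81.80),
    ("Terminal Bench 2.0", 61.60, 50.80),
    ("AA-LCR", 68.30, 65.00),
]


def summarize(scores):
    """Compute win counts, mean difference, and the largest single gap."""
    # Round each difference to two decimals to avoid floating-point noise.
    diffs = [(name, round(a - b, 2)) for name, a, b in scores]
    return {
        "wins_a": sum(1 for _, d in diffs if d > 0),
        "wins_b": sum(1 for _, d in diffs if d < 0),
        "ties": sum(1 for _, d in diffs if d == 0),
        "avg_diff": round(sum(d for _, d in diffs) / len(diffs), 2),
        "largest": max(diffs, key=lambda item: abs(item[1])),
    }


summary = summarize(SCORES)
print(summary)
```

The differences sum to 42.90, and 42.90 / 11 = 3.90, which matches the reported average. The largest gap comes out as Terminal Bench 2.0 at +10.80, just ahead of MMLU Pro at +10.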

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.