Qwen3.6-27B vs GPT-5.4 mini

Across 5 shared benchmarks, GPT-5.4 mini leads overall: Qwen3.6-27B wins 0, GPT-5.4 mini wins 5, with 0 ties and an average score difference of -4.44 (Qwen3.6-27B minus GPT-5.4 mini).

Alibaba
Qwen3.6-27B

Alibaba · 2026-04-22 · Reasoning model

OpenAI
GPT-5.4 mini

OpenAI · 2026-03-17 · Reasoning model

Qwen3.6-27B: 0 wins (0%) · GPT-5.4 mini: 5 wins (100%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 5 shared benchmarks.

General Knowledge

GPT-5.4 mini 2/2
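Benchmark | Qwen3.6-27B | GPT-5.4 mini | Diff
HLE | 24 · 92 / 157 · Thinking (No Tools) | 41.50 · 46 / 157 · Extra-high-effort thinking (With Tools) | -17.50
GPQA Diamond | 87.80 · 33 / 178 · Thinking (No Tools) | 88 · 32 / 178 · Extra-high-effort thinking (No Tools) | -0.20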

AI Agent - Tool Usage

GPT-5.4 mini 1/1
Benchmark | Qwen3.6-27B | GPT-5.4 mini | Diff
Terminal Bench 2.0 | 59.30 · 20 / 46 · Thinking (With Tools) | 60 · 19 / 46 · Extra-high-effort thinking (With Tools) | -0.70

Claw-style Agent Evaluation

GPT-5.4 mini 1/1
Benchmark | Qwen3.6-27B | GPT-5.4 mini | Diff
Claw Bench | 72.40 · 27 / 29 · Thinking (With Tools) | 75.30 · 25 / 29 · Thinking (With Tools) | -2.90

Coding and Software Engineer

GPT-5.4 mini 1/1
Benchmark | Qwen3.6-27B | GPT-5.4 mini | Diff
SWE-Bench Pro - Public | 53.50 · 24 / 43 · Thinking (With Tools) | 54.40 · 21 / 43 · Extra-high-effort thinking (With Tools) | -0.90

Specs

Field | Qwen3.6-27B | GPT-5.4 mini
Publisher | Alibaba | OpenAI
Release date | 2026-04-22 | 2026-03-17
Model type | Reasoning model | Reasoning model
Architecture | Dense | Dense
Parameters | 27B | Not available
Context length | 128K | 400K
Max output | 16K | 128K

API pricing

Prices use DataLearner records when available; missing fields are not inferred.

Item | Qwen3.6-27B | GPT-5.4 mini
Text input | Not public | $0.75 / 1M tokens
Text output | Not public | $4.5 / 1M tokens
Cache read | Not public | $4.5 / 1M tokens
Cache write | Not public | $0.075 / 1M tokens

One or both models have incomplete public pricing.
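
As a rough illustration of how the per-million-token prices above translate into a request cost, here is a minimal Python sketch. It uses only the listed GPT-5.4 mini text input and text output prices; the function name `estimate_cost_usd` and the token counts are hypothetical, and cache pricing is deliberately left out. No estimate is possible for Qwen3.6-27B because its prices are not public.

```python
# Listed GPT-5.4 mini prices from the table above, in USD per 1M tokens.
INPUT_PRICE_PER_M = 0.75
OUTPUT_PRICE_PER_M = 4.5


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request, ignoring any cache pricing.

    Hypothetical helper for illustration; not part of any vendor SDK.
    """
    input_cost = input_tokens * INPUT_PRICE_PER_M / 1_000_000
    output_cost = output_tokens * OUTPUT_PRICE_PER_M / 1_000_000
    return input_cost + output_cost


# Example with made-up token counts: 200K input tokens and 20K output tokens.
example_cost = estimate_cost_usd(200_000, 20_000)
print(f"Estimated cost: ${example_cost:.2f}")
```

With these assumed token counts, the input side contributes about $0.15 and the output side about $0.09.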

Summary

  • GPT-5.4 mini leads in: General Knowledge (2/2), AI Agent - Tool Usage (1/1), Claw-style Agent Evaluation (1/1), Coding and Software Engineer (1/1)

On average across the 5 shared benchmarks, GPT-5.4 mini scores 4.44 higher.

Largest single-benchmark gap: HLE — Qwen3.6-27B 24 vs GPT-5.4 mini 41.50 (-17.50).
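
The summary figures can be reproduced from the benchmark scores with a short calculation. The Python sketch below assumes that each Diff value is the Qwen3.6-27B score minus the GPT-5.4 mini score and that the average is the plain mean of the five diffs; the variable names are illustrative and not taken from the page's own tooling.

```python
# Scores copied from the benchmark tables: (Qwen3.6-27B, GPT-5.4 mini).
scores = {
    "HLE": (24.0, 41.5),
    "GPQA Diamond": (87.8, 88.0),
    "Terminal Bench 2.0": (59.3, 60.0),
    "Claw Bench": (72.4, 75.3),
    "SWE-Bench Pro - Public": (53.5, 54.4),
}

# Assumption: Diff = Qwen3.6-27B minus GPT-5.4 mini, rounded to 2 decimals.
diffs = {name: round(qwen - gpt, 2) for name, (qwen, gpt) in scores.items()}

# Count wins per model and ties from the sign of each diff.
qwen_wins = sum(d > 0 for d in diffs.values())
gpt_wins = sum(d < 0 for d in diffs.values())
ties = sum(d == 0 for d in diffs.values())

# Plain mean of the per-benchmark diffs, and the benchmark with the widest gap.
average_diff = round(sum(diffs.values()) / len(diffs), 2)
largest_gap = max(diffs, key=lambda name: abs(diffs[name]))

print(qwen_wins, gpt_wins, ties, average_diff, largest_gap)
```

Under these assumptions the five diffs sum to -22.2, which gives the reported average of -4.44, and HLE is the benchmark with the widest gap.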

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.