Server-rendered comparison results
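Models: Qwen3-235B-A22B-Thinking, Qwen3-30B-A3B, Qwen3-32B, Qwen3-235B-A22B, Qwen3-30B-A3B-2507. Number of benchmarks: 4. Each score is followed by its mode label (thinking or normal); "-" means no score is listed.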
| Model | MMLU Pro | GPQA Diamond | AIME2025 | Creative Writing |
|---|---|---|---|---|
| Qwen3-235B-A22B-Thinking | 84.4 thinking | 81.1 thinking | 92.3 thinking | 86.1 thinking |
| Qwen3-30B-A3B | 69.1 normal | 54.8 normal | 21.6 normal | 68.1 normal |
| Qwen3-32B | - | 53.3 normal | 72.9 normal | - |
| Qwen3-235B-A22B | 72.9 normal | 71.1 normal | 24.7 normal | 80.4 normal |
| Qwen3-30B-A3B-2507 | 78.4 normal | 70.4 normal | 61.3 normal | 86 normal |