Server-rendered comparison results

GPT OSS 20B vs Kimi K2 vs Qwen3-235B-A22B-Thinking vs GPT OSS 120B

Models: GPT OSS 20B, Kimi K2, Qwen3-235B-A22B-Thinking, GPT OSS 120B. Number of benchmarks: 5.

| Model | MMLU | GPQA Diamond | AIME 2024 | AIME 2025 | HLE |
| --- | --- | --- | --- | --- | --- |
| GPT OSS 20B | 85.3 (thinking) | 71.5 (thinking) | - | 79 (thinking) | 10.9 (thinking) |
| Kimi K2 | 89.5 (normal) | 75.1 (normal) | 69.6 (normal) | 54 (normal) | 4.7 (normal) |
| Qwen3-235B-A22B-Thinking | - | 81.1 (thinking) | - | 92.3 (thinking) | 18.2 (thinking) |
| GPT OSS 120B | 90 (thinking) | 80.1 (thinking) | - | 83 (thinking) | 14.9 (thinking) |