Comparison results

Hunyuan-TurboS vs GPT-4o vs Llama3.1-405B Instruct vs Claude 3.5 Sonnet New vs DeepSeek-V3

Models: Hunyuan-TurboS, GPT-4o, Llama3.1-405B Instruct, Claude 3.5 Sonnet New, DeepSeek-V3. Number of benchmarks: 9.

| Model | MMLU | MMLU Pro | HumanEval | BBH | GPQA Diamond | LiveCodeBench | SimpleQA | MATH | MATH-500 |
|---|---|---|---|---|---|---|---|---|---|
| Hunyuan-TurboS | 89.5 | 79 | 91 | 92.2 | 57.5 | 32 | 22.8 | 89.7 | - |
| GPT-4o | 88.7 | 77.9 | 90 | 91.7 | 70.1 | 35.1 | 38.2 | 75.9 | 75.9 |
| Llama3.1-405B Instruct | 88.6 | 73.4 | 89 | 89.2 | 49 | 30.2 | 17.1 | 73.9 | - |
| Claude 3.5 Sonnet New | 88.3 | 78 | 93.7 | 92.6 | 65 | 38.7 | 28.4 | 78.3 | 78 |
| DeepSeek-V3 | 88.5 | 75.9 | 89 | 92.3 | 59.1 | 34.6 | 24.9 | 87.8 | 87.8 |