M2.1 Benchmark Details
M2.1's benchmark results are currently led by MMLU Pro (score 88, rank 7 of 126), SWE-bench Verified (score 74.80, rank 32 of 105), and GPQA Diamond (score 81, rank 68 of 177).
Benchmark Results
Comprehensive Evaluation
3 evaluations
Benchmark / mode | Score | Rank/total
Coding and Software Engineering
2 evaluations
Benchmark / mode | Score | Rank/total
Agent Capability Evaluation
2 evaluations
Benchmark / mode | Score | Rank/total