Grok-3 - Reasoning Beta currently shows benchmark results led by AIME 2024 (6 / 62, score 93.30), LiveCodeBench (24 / 108, score 79.40), and GPQA Diamond (37 / 162, score 84.60).