GPT-4.1's benchmark results are currently led by MMLU (8 / 64, score 90.20), GSM8K (5 / 26, score 95.90), and MMLU Pro (46 / 115, score 80.50).