加载中...
加载中...
一个包含 8 个自然语言理解任务的基准,旨在评估模型在复杂的语言理解和推理任务上的性能。
Source: DataLearnerAI
Data sourced primarily from official releases (GitHub, Hugging Face, papers), then benchmark leaderboards, then third-party evaluators. Learn about our data methodology
暂无评测数据