Llama-3.2-3B's benchmark results are currently led by GSM8K (21 / 24, score 34), BBH (16 / 18, score 46.80), and GPQA Diamond (154 / 161, score 26.60).