Llama-3.2-3B Benchmark Details

Llama-3.2-3B's strongest relative placements are on GSM8K (rank 23 of 26, score 34), BBH (rank 18 of 20, score 46.80), and GPQA Diamond (rank 172 of 179, score 26.60).
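
One way to compare placements across leaderboards of different sizes is to divide rank by the number of ranked models. The sketch below does this for the three benchmarks named above. The `rank_fraction` helper is a hypothetical illustration, not a metric this page defines, and it assumes that rank 1 is the best position.

```python
# (benchmark, rank, total) exactly as quoted in the summary above
placements = [
    ("GSM8K", 23, 26),
    ("BBH", 18, 20),
    ("GPQA Diamond", 172, 179),
]


def rank_fraction(rank: int, total: int) -> float:
    """Rank divided by field size; lower means a better relative placement.

    Hypothetical helper for illustration; assumes rank 1 is best.
    """
    return rank / total


# Sort from best to worst relative placement
ordered = sorted(placements, key=lambda p: rank_fraction(p[1], p[2]))

for name, rank, total in ordered:
    print(f"{name}: {rank} / {total} = {rank_fraction(rank, total):.3f}")
```

Under that assumption, a fraction close to 1.0 means the model sits near the bottom of the field for that benchmark.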

Benchmark Results

General Knowledge

4 evaluations
Benchmark / mode    Score    Rank/total
(unnamed)           54.75    65 / 65
BBH                 46.80    18 / 20
GPQA Diamond        26.60    172 / 179
(unnamed)           25       125 / 126

Math and Reasoning

2 evaluations
Benchmark / mode    Score    Rank/total
GSM8K               34       23 / 26
(unnamed)           8.50     42 / 42

Coding and Software Engineering

2 evaluations
Benchmark / mode    Score    Rank/total
(unnamed)           48.70    27 / 28
(unnamed)           28       38 / 39