Llama3.1-8B Benchmark Details

Llama3.1-8B's strongest current placements relative to the other ranked models are on BBH (rank 16 of 21, score 57.70), GSM8K (rank 21 of 26, score 55.30), and MBPP (rank 25 of 28, score 53.90).

Benchmark Results

General Knowledge (4 evaluations)

Benchmark       Score   Rank / total
MMLU            66.60   62 / 66
BBH             57.70   16 / 21
MMLU Pro        35.40   128 / 132
GPQA Diamond    25.80   182 / 187

Math and Reasoning (2 evaluations)

Benchmark       Score   Rank / total
GSM8K           55.30   21 / 26
MATH            20.50   40 / 42

Coding and Software Engineering (2 evaluations)

Benchmark       Score   Rank / total
MBPP            53.90   25 / 28
HumanEval       33.50   36 / 39

Common Sense Reasoning (1 evaluation)

Benchmark       Score   Rank / total
ARC             59.30   4 / 4
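The summary at the top orders benchmarks by placement rather than by raw score. The minimal Python sketch below reproduces that ordering from the scores and ranks listed above. It assumes that "placement" means rank divided by the number of ranked models, where lower is better. The page does not document its method. The names `RESULTS`, `relative_rank`, and `best_placements` are hypothetical and exist only for this illustration.

```python
# Scores and ranks copied from the tables above: name -> (score, rank, total).
# Assumption: placement is measured as rank / total (lower is better);
# the page itself does not state how it orders benchmarks.
RESULTS = {
    "MMLU": (66.60, 62, 66),
    "BBH": (57.70, 16, 21),
    "MMLU Pro": (35.40, 128, 132),
    "GPQA Diamond": (25.80, 182, 187),
    "GSM8K": (55.30, 21, 26),
    "MATH": (20.50, 40, 42),
    "MBPP": (53.90, 25, 28),
    "HumanEval": (33.50, 36, 39),
    "ARC": (59.30, 4, 4),
}


def relative_rank(rank: int, total: int) -> float:
    """Fraction of the ranked models at or above this position (lower is better)."""
    return rank / total


def best_placements(results: dict, n: int = 3) -> list:
    """Return the n benchmark names with the smallest rank / total ratio."""
    ordered = sorted(results, key=lambda name: relative_rank(*results[name][1:]))
    return ordered[:n]


if __name__ == "__main__":
    for name in best_placements(RESULTS):
        score, rank, total = RESULTS[name]
        print(f"{name}: score {score:.2f}, rank {rank} / {total} ({rank / total:.0%})")
```

When run, the script prints BBH, GSM8K, and MBPP in that order, at 76%, 81%, and 89% of their leaderboards. This matches the summary line. Sorting by raw score instead would give MMLU (66.60), ARC (59.30), and BBH (57.70). That difference suggests the summary is based on rank.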