Llama3.1-8B-Instruct Benchmark Details

Llama3.1-8B-Instruct currently places best on GSM8K (score 82.40, rank 16 of 26), MBPP (score 69.40, rank 18 of 28), and HumanEval (score 66.50, rank 28 of 39).
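The page does not state how the three benchmarks in the summary above were selected. One plausible reading, and it is only an assumption, is that they are the benchmarks where the model's rank is best relative to the number of models evaluated (rank divided by total). The sketch below checks that reading against the scores and ranks listed on this page. The names `results`, `relative_rank`, and `top_by_relative_rank` are hypothetical and used only for illustration.

```python
# Scores and ranks copied from the benchmark tables on this page.
# Each entry is (score, rank, total); MMLU Pro has no score listed, so it is None.
results = {
    "MMLU": (68.10, 60, 66),
    "MMLU Pro": (None, 126, 132),
    "GPQA Diamond": (26.30, 180, 187),
    "GSM8K": (82.40, 16, 26),
    "MATH": (47.60, 35, 42),
    "MBPP": (69.40, 18, 28),
    "HumanEval": (66.50, 28, 39),
}


def relative_rank(rank, total):
    """Return rank as a fraction of the leaderboard size (lower is better)."""
    return rank / total


def top_by_relative_rank(table, n=3):
    """Return the n benchmark names with the best (lowest) rank / total ratio."""
    ordered = sorted(
        table,
        key=lambda name: relative_rank(table[name][1], table[name][2]),
    )
    return ordered[:n]


for name in top_by_relative_rank(results):
    score, rank, total = results[name]
    score_text = "no score listed" if score is None else f"score {score:.2f}"
    print(f"{name}: rank {rank} / {total}, {score_text}")
```

Under this assumption the three selected benchmarks match the ones named in the summary. Sorting by raw score instead would put MMLU (68.10) ahead of HumanEval (66.50), which is why relative rank is the more likely criterion.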

Benchmark Results

General Knowledge (3 evaluations)

Benchmark / mode    Score    Rank / total
MMLU                68.10    60 / 66
MMLU Pro            -        126 / 132
GPQA Diamond        26.30    180 / 187

Math and Reasoning (2 evaluations)

Benchmark / mode    Score    Rank / total
GSM8K               82.40    16 / 26
MATH                47.60    35 / 42

Coding and Software Engineering (2 evaluations)

Benchmark / mode    Score    Rank / total
MBPP                69.40    18 / 28
HumanEval           66.50    28 / 39
