Llama3.3-70B-Instruct Benchmark Details

Llama3.3-70B-Instruct's strongest leaderboard placements are MBPP (score 87.60, rank 3 / 28), MATH (score 77, rank 13 / 42), and HumanEval (score 88.40, rank 14 / 39).

Benchmark Results

General Knowledge

3 evaluations

Benchmark / mode    Score    Rank/total
MMLU                -        33 / 65
MMLU Pro            68.90    94 / 126
GPQA Diamond        50.50    152 / 179

Coding and Software Engineering

3 evaluations

Benchmark / mode    Score    Rank/total
HumanEval           88.40    14 / 39
MBPP                87.60    3 / 28
LiveCodeBench       33.30    109 / 120

Math and Reasoning

1 evaluation

Benchmark / mode    Score    Rank/total
MATH                77       13 / 42
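
The summary at the top names MBPP, MATH, and HumanEval as the leading results. One plausible reading, and it is only an assumption since the page does not state its ordering rule, is that benchmarks are sorted by rank divided by the number of ranked models. The sketch below recomputes that ratio from the rank/total values listed above; the `relative_rank` helper is a hypothetical name introduced for illustration.

```python
# Rank and field size for each benchmark, copied from the Rank/total values above.
ranks = {
    "MMLU": (33, 65),
    "MMLU Pro": (94, 126),
    "GPQA Diamond": (152, 179),
    "HumanEval": (14, 39),
    "MBPP": (3, 28),
    "LiveCodeBench": (109, 120),
    "MATH": (13, 42),
}


def relative_rank(rank: int, total: int) -> float:
    """Rank as a fraction of the field size; lower means a better placement."""
    return rank / total


# Assumed ordering rule: best (smallest) relative rank first.
ordered = sorted(ranks, key=lambda name: relative_rank(*ranks[name]))

for name in ordered:
    rank, total = ranks[name]
    print(f"{name:14s} {rank:>3d} / {total:<3d}  {relative_rank(rank, total):.3f}")
```

Under this assumption the first three entries come out as MBPP, MATH, and HumanEval, matching the order used in the summary. Sorting by raw score instead would put HumanEval ahead of MBPP, so the ratio appears to be the better fit for the page's wording.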
