BBH

Updated Jul 18, 2026

Problem Count: 23
Institution: Google
Category: General Evaluation
Metrics: Varies
Language: English
Difficulty: Expert

Overview

BIG-Bench Hard (BBH) is a subset of 23 especially challenging BIG-bench tasks, selected because earlier language models did not outperform the average human rater on them.

Related resources

Latest BBH model rankings and full benchmark leaderboard

Browse the latest scores, model modes, release dates, and parameter sizes for BBH.

Source: DataLearnerAI

Data is sourced primarily from official releases (GitHub, Hugging Face, papers), then from benchmark leaderboards, then from third-party evaluators.
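The source ordering described above can be read as a simple precedence rule: use the official figure when one exists, otherwise fall back to a leaderboard figure, then to a third-party one. The snippet below is a minimal sketch of that idea only. The tier names, the `pick_score` helper, and the sample numbers are hypothetical and are not taken from DataLearnerAI's actual pipeline.

```python
# Hypothetical source tiers, ordered from most to least preferred,
# mirroring the order stated in the text above.
SOURCE_PRIORITY = ["official", "leaderboard", "third_party"]


def pick_score(reports):
    """Return (source, score) from the highest-priority tier that has a score.

    `reports` maps a tier name to a reported score, or to None when that
    tier has no figure. Returns None when no tier has a score.
    """
    for source in SOURCE_PRIORITY:
        score = reports.get(source)
        if score is not None:
            return source, score
    return None


# Made-up numbers for illustration: no official figure is available,
# so the leaderboard figure is preferred over the third-party one.
example = {"official": None, "leaderboard": 91.0, "third_party": 90.5}
print(pick_score(example))
```

Under this sketch, a lower tier is consulted only when every higher tier is missing, so conflicting figures from different tiers are never averaged.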

Rank	Model	Score	Release Date	Parameters	License
1	ERNIE-4.5-300B-A47B Standard Mode	94.30	2025-06-30	300B	Free Commercial
2	Claude 3.5 Sonnet New Standard Mode	92.60	2024-10-22	Unknown	Closed
3	DeepSeek-V3 Standard Mode	92.30	2024-12-26	681B	Free Commercial
4	Hunyuan-TurboS Standard Mode	92.20	2025-03-10	Unknown	Closed
5	GPT-4o Standard Mode	91.70	2024-05-13	Unknown	Closed
6	Llama3.1-405B Instruct Standard Mode	89.20	2024-07-23	405B	Free Commercial
7	Hunyuan-A13B-Instruct Standard Mode	89.10	2025-06-27	80B	Free Commercial
8	Qwen3-235B-A22B Standard Mode	88.87	2025-04-28	235B	Free Commercial
9	Gemma 3 - 27B (IT) Standard Mode	87.60	2025-03-12	27B	Free Commercial
10	Qwen3-Next Standard Mode	87.13	2025-09-11	80B	Free Commercial
11	Qwen2.5-72B Standard Mode	86.30	2024-09-18	72.7B	Free Commercial
12	Gemma2-27B Standard Mode	74.90	2024-05-14	27B	Free Commercial
13	MiniCPM5-1B Thinking Enabled	71.89	2026-05-01	1.1B	Free Commercial
14	Gemma 2 - 9B Standard Mode	68.20	2024-06-27	9B	Free Commercial
15	Moonlight-16B-A3B-Instruct Standard Mode	65.20	2025-02-23	16B	Free Commercial
16	Llama3.1-8B Standard Mode	57.70	2024-07-23	8B	Free Commercial
17	Qwen2.5-3B Standard Mode	56.30	2024-09-18	3B	Free Commercial
18	Mistral-7B-Instruct-v0.3 Standard Mode	56.10	2024-05-22	7B	Free Commercial
19	Llama-3.2-3B Standard Mode	46.80	2024-09-18	3.2B	Free Commercial
20	Gemini 1.5 Pro Standard Mode	0.00	2024-02-15	Unknown	Closed
21	Amazon Nova Pro Standard Mode	0.00	2024-12-03	Unknown	Closed
