GPQA

Updated Jul 9, 2026

Problem Count: 448
Institution: CohereAI
Category: General Evaluation
Metrics: Accuracy
Language: English
Difficulty: Mixed

Overview

GPQA is a graduate-level, Google-proof question-answering benchmark designed to evaluate expert knowledge and rigorous reasoning.

Related resources

Latest GPQA model rankings and full benchmark leaderboard

Browse the latest scores, model modes, release dates, and parameter sizes for GPQA.

Source: DataLearnerAI

Scores are taken from official releases (GitHub, Hugging Face, papers) first, then from benchmark leaderboards, and finally from third-party evaluators.
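The source ordering above can be read as a simple priority rule: when several sources report a score for the same model, keep the one from the highest-priority source. The following is only a minimal sketch of that idea, not DataLearnerAI's actual pipeline. The names `SOURCE_PRIORITY`, `ScoreReport`, and `pick_preferred`, and the model names and scores in the usage lines, are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass

# Lower number = higher priority, mirroring the stated order:
# official releases, then benchmark leaderboards, then third-party evaluators.
# These category labels are assumptions, not the site's real schema.
SOURCE_PRIORITY = {"official": 0, "leaderboard": 1, "third_party": 2}


@dataclass(frozen=True)
class ScoreReport:
    model: str
    score: float
    source: str  # one of the SOURCE_PRIORITY keys


def pick_preferred(reports):
    """Return a dict of model -> report, keeping the highest-priority source."""
    best = {}
    for report in reports:
        current = best.get(report.model)
        if current is None or (
            SOURCE_PRIORITY[report.source] < SOURCE_PRIORITY[current.source]
        ):
            best[report.model] = report
    return best


# Hypothetical reports, for illustration only.
reports = [
    ScoreReport("model-a", 61.0, "third_party"),
    ScoreReport("model-a", 62.0, "official"),
    ScoreReport("model-b", 40.0, "leaderboard"),
]
chosen = pick_preferred(reports)
```

Here `chosen["model-a"]` is the report from the official release, because it outranks the third-party figure, while `model-b` keeps its only available report.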

Rank	Model (mode)	Accuracy (%)	Release date	Parameters	License
1	Qwen3.6-35B-A3B Thinking Enabled	86.00	2026-04-16	35B	Free Commercial
2	GPT-Live-1 Thinking Level · High	84.20	2026-07-08	Unknown	Closed
3	DeepSeek-V3-0324 Standard Mode	68.40	2025-03-24	671B	Free Commercial
4	Pangu Embedded Standard Mode	68.00	2025-06-30	7B	Free Commercial
5	Qwen3-8B Standard Mode	62.00	2025-04-28	8B	Free Commercial
6	DeepSeek-V3 Standard Mode	59.10	2024-12-26	671B	Free Commercial
7	GLM-4-9B-Chat Standard Mode	58.50	2024-06-05	9B	Free Commercial
8	Hunyuan-A13B-Instruct Standard Mode	49.12	2025-06-27	80B	Free Commercial
9	Mistral-Small-3.1-24B-Instruct-2503 Standard Mode	44.42	2025-03-17	24B	Free Commercial
10	Mistral-Small-3.2 Standard Mode	44.22	2025-06-20	24B	Free Commercial
11	Qwen3-Next Standard Mode	43.43	2025-09-11	80B	Free Commercial
12	GPT-4o mini Standard Mode	40.20	2024-07-18	Unknown	Closed
13	Claude 3.5 Haiku Standard Mode	37.50	2024-10-22	Unknown	Closed
14	Gemma 3 - 27B (IT) Standard Mode	36.83	2025-03-12	27B	Free Commercial
15	C4AI Aya Vision 32B Standard Mode	34.38	2025-03-04	32B	Non-Commercial
