Gemma 3 - 12B (IT)

Name: Gemma 3 - 12B (IT)
Author: Google DeepMind

Foundation model

Release date: 2025-03-12. Updated: 2025-03-12 22:14:18

Parameters

12.0 billion

Context length

128K

Chinese support

Supported

Gemma 3 - 12B (IT) is a foundation model published by Google DeepMind and released on 2025-03-12, with 12.0B parameters and a 128K-token context length. It requires about 2GB of storage and is distributed under the Gemma Terms of Use license.

Data sourced primarily from official releases (GitHub, Hugging Face, papers), then benchmark leaderboards, then third-party evaluators.

Model basics

Reasoning traces

Not supported

Thinking modes

Thinking modes not supported

Context length

128K tokens

Max output length

No data

Model type

Foundation model

Release date

2025-03-12

Model file size

2GB

MoE architecture

Total params / Active params

12.0B / N/A

Knowledge cutoff

No data

Open source & experience

Code license

Gemma Terms of Use

Weights license

Gemma Terms of Use (free for commercial use)

GitHub repo

GitHub link unavailable

Hugging Face

https://huggingface.co/google/gemma-3-1b-it

Live demo

No live demo

Official resources

Paper

Gemma 3 Technical Report

DataLearnerAI blog

No blog post yet

API details

API speed

No data

No public API pricing yet.

Benchmark Results

Gemma 3 - 12B (IT) currently shows benchmark results led by MATH (9 / 42, score 83.80), MMLU Pro (96 / 116, score 60.60), GPQA Diamond (152 / 166, score 40.90). This page also consolidates core specs, context limits, and API pricing so you can evaluate the model from benchmark results and deployment constraints together.
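The rank/total figures on this page are easier to compare once each is expressed as the share of listed models that this model ranks above. The following is a minimal sketch of that arithmetic; the function names are hypothetical and the pairs are simply the rank/total values reported on this page.

```python
# Rank/total pairs as reported on this page (rank 1 is best).
RESULTS = {
    "MATH": (9, 42),
    "MMLU Pro": (96, 116),
    "GPQA Diamond": (152, 166),
    "SimpleQA": (44, 45),
    "LiveCodeBench": (109, 109),
}


def fraction_outranked(rank, total):
    """Share of listed models that this model ranks above (0.0 to 1.0)."""
    return (total - rank) / total


def summarize(results):
    """Return (benchmark, share) pairs sorted from strongest to weakest standing."""
    rows = [(name, fraction_outranked(r, t)) for name, (r, t) in results.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, share in summarize(RESULTS):
        print(f"{name:<14} ranks above {share:.0%} of listed models")
```

Under this measure MATH is the model's strongest listed result and LiveCodeBench its weakest, which matches the ordering suggested by the raw ranks.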

General evaluation (2 evaluations)

Benchmark	Mode	Score	Rank/total
MMLU Pro	Off	60.60	96 / 116
GPQA Diamond	Off	40.90	152 / 166

Mathematical reasoning (1 evaluation)

Benchmark	Mode	Score	Rank/total
MATH	Off	83.80	9 / 42

Common-sense question answering (1 evaluation)

Benchmark	Mode	Score	Rank/total
SimpleQA	Off	6.30	44 / 45

Programming and software engineering (1 evaluation)

Benchmark	Mode	Score	Rank/total
LiveCodeBench	Off	24.60	109 / 109

Publisher

Google DeepMind

Model Overview

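Gemma 3 - 12B (IT) is Google's open, 12-billion-parameter, third-generation multimodal large model. The IT suffix indicates that this is the instruction fine-tuned version.

For a detailed introduction to the Gemma 3 series, see: https://www.datalearner.com/blog/1051741769941194

Gemma3-12B is one of the models in the Gemma 3 series released by Google DeepMind. Compared with the 4B version, it increases parameter count, compute capacity, and task performance while retaining good computational efficiency. The model supports a 128K-token context, integrates a 417M-parameter vision encoder, and is optimized with knowledge distillation. It performs well on text generation, multimodal tasks, and reasoning.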

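Model architecture and design

Decoder structure and attention mechanism: uses a decoder-only Transformer architecture with Grouped-Query Attention (GQA), combined with QK-norm to regularize the attention distribution and improve numerical stability.
Interleaved local and global attention layers: local and global attention layers alternate at a 5:1 ratio, which reduces KV cache usage and makes long-text inference more efficient.
Vision module: a built-in 417M-parameter SigLIP vision encoder accepts image input and can be used for tasks such as OCR and image-text alignment.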
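The KV cache saving from the 5:1 interleaving can be illustrated with a rough count of cached token positions per layer. This is a sketch under stated assumptions, not the model's actual configuration: the layer count (48) and sliding-window size (1024) are illustrative values that do not appear on this page, and the helper names are hypothetical. Only the 5:1 ratio and the 128K context come from the text above.

```python
def layer_pattern(num_layers, local_per_global=5):
    """Repeat `local_per_global` local layers followed by one global layer."""
    period = local_per_global + 1
    return ["global" if (i + 1) % period == 0 else "local" for i in range(num_layers)]


def kv_cache_positions(pattern, context_len, window):
    """Cached token positions summed over layers.

    A global layer caches the whole context; a local (sliding-window)
    layer caches at most `window` positions.
    """
    return sum(
        context_len if kind == "global" else min(window, context_len)
        for kind in pattern
    )


NUM_LAYERS = 48        # assumption, for illustration only
WINDOW = 1024          # assumption, for illustration only
CONTEXT = 128 * 1024   # 128K tokens, as stated on this page

pattern = layer_pattern(NUM_LAYERS)
interleaved = kv_cache_positions(pattern, CONTEXT, WINDOW)
all_global = NUM_LAYERS * CONTEXT
ratio = interleaved / all_global

if __name__ == "__main__":
    print(f"global layers: {pattern.count('global')}, local layers: {pattern.count('local')}")
    print(f"KV cache relative to all-global attention: {ratio:.1%}")
```

With these assumed values the interleaved design caches well under a fifth of the positions that an all-global stack would at full context. The exact figure depends on the real layer count and window size.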

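Training details

Knowledge distillation: the model learns from a larger model (such as the 27B version) to improve text understanding and generation.
Training data: trained on 10T tokens, including large-scale multilingual text and image data.
Training hardware: trained on the TPUv4 platform using 6144 TPUs, with 16-way data sharding, 16-way sequence sharding, and 24 replicas.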
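The hardware figures can be sanity-checked with simple arithmetic, assuming the three sharding factors multiply to give the total chip count. That interpretation is an assumption; the page only lists the numbers. The names below are hypothetical.

```python
from math import prod

# Sharding factors and chip count as stated in the training details above.
SHARDING = {"data": 16, "sequence": 16, "replica": 24}
REPORTED_CHIPS = 6144


def total_chips(sharding):
    """Chips implied by the sharding layout, assuming the factors multiply."""
    return prod(sharding.values())


if __name__ == "__main__":
    implied = total_chips(SHARDING)
    print(f"implied chips: {implied}, reported chips: {REPORTED_CHIPS}")
```

Under that assumption the layout is consistent with the reported count, since 16 x 16 x 24 = 6144.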

Parameter configuration

Model version	Vision encoder parameters	Embedding parameters	Non-embedding parameters	Context length
Gemma3‑4B	417M	675M	3209M	128K tokens
Gemma3‑12B	417M	1012M	10759M	128K tokens
Gemma3‑27B	417M	1416M	25600M	128K tokens
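Summing the three columns of the table shows how the nominal model sizes arise. The sketch below uses only the values from the table; the dictionary layout and function name are hypothetical.

```python
# Parameter counts in millions, copied from the table above.
PARAMS_M = {
    "Gemma3-4B": {"vision": 417, "embedding": 675, "non_embedding": 3209},
    "Gemma3-12B": {"vision": 417, "embedding": 1012, "non_embedding": 10759},
    "Gemma3-27B": {"vision": 417, "embedding": 1416, "non_embedding": 25600},
}


def total_billions(parts):
    """Total parameters in billions, including the vision encoder."""
    return sum(parts.values()) / 1000


if __name__ == "__main__":
    for name, parts in PARAMS_M.items():
        print(f"{name}: {total_billions(parts):.3f}B parameters")
```

For the 12B model the three components add up to about 12.2 billion parameters, of which the vision encoder is a small fraction.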

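Model features and evaluation performance

Multimodal capability: the built-in vision encoder makes the model suitable for image-text tasks.
Long-context processing: supports 128K tokens, suited to code generation and complex reasoning.
Balanced compute: stronger than the 4B version and less demanding than the 27B version, suited to scenarios that need high performance under resource constraints.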

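Summary

Gemma3-12B offers more compute capacity and better task performance than the 4B version, supports multimodal input, and suits tasks that need efficient inference and long-text processing, while being easier to deploy than the 27B version. It is applicable to NLP, code generation, OCR, and multilingual tasks, and is an important option in the current open-source LLM ecosystem.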
