加载中...

GLM-4.7-Flash

Name: GLM-4.7-Flash
Availability: InStock
Author: 智谱AI

Release date: 2026-01-19更新于: 2026-03-08 21:06:201,398

Live demo GitHub Hugging Face Compare

Parameters

310.0亿

Context length

200K

Chinese support

Supported

Reasoning ability

Data sourced primarily from official releases (GitHub, Hugging Face, papers), then benchmark leaderboards, then third-party evaluators. Learn about our data methodology

GLM-4.7-Flash

Model basics

Reasoning traces

Supported

Thinking modes

Thinking Level · Off

Context length

200K tokens

Max output length

131072 tokens

Model type

推理大模型

Release date

2026-01-19

Model file size

62.5GB

MoE architecture

Yes

Total params / Active params

310.0B / 30B

Knowledge cutoff

No data

GLM-4.7-Flash

Open source & experience

Code license

MIT License

Weights license

MIT License- 免费商用授权

GitHub repo

https://github.com/zai-org/GLM-4

Hugging Face

https://huggingface.co/zai-org/GLM-4.7-Flash

Live demo

https://chat.z.ai

GLM-4.7-Flash

Official resources

Paper

GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models (related technical report referenced on model card)

DataLearnerAI blog

No blog post yet

GLM-4.7-Flash

API details

API speed

3/5

💡Default unit: $/1M tokens. If vendors use other units, follow their published pricing.

Standard pricingStandard

Modality	Input	Output
Text	0 人民币	0 人民币

GLM-4.7-Flash

Benchmark Results

GLM-4.7-Flash currently shows benchmark results led by τ²-Bench - Telecom (11 / 33, score 96), AIME2025 (37 / 107, score 91.60), τ²-Bench (17 / 39, score 79.50). This page also consolidates core specs, context limits, and API pricing so you can evaluate the model from benchmark results and deployment constraints together.

Agent能力评测

3 evaluations

Benchmark / mode

Score

Rank/total

τ²-Bench - Telecom

OnWith tools

11 / 33

τ²-Bench

OnWith tools

79.50

17 / 39

Terminal Bench Hard

OnWith tools

8 / 12

指令跟随

1 evaluations

Benchmark / mode

Score

Rank/total

IF Bench

OnWith tools

14 / 26

AI Agent - 信息收集

1 evaluations

Benchmark / mode

Score

Rank/total

BrowseComp

OnWith tools

42.80

27 / 32

View benchmark analysis Compare with other models

GLM-4.7-Flash

Publisher

智谱AI

View publisher details

GLM-4.7-Flash

Model Overview

GLM-4.7-Flash 是由 Z.ai（zai-org）发布并开源的大型语言模型，属于 GLM-4.7 系列。模型已在 Hugging Face 平台公开，提供完整权重与模型卡信息，面向文本生成类任务。

该模型主要支持中文与英文输入，适用于通用对话、代码生成、推理问答等典型大模型使用场景。

模型规模与架构信息

GLM-4.7-Flash 的参数规模约为 31B，采用 BF16 / FP32 精度格式发布。

从模型标注信息来看，其配置被归类为轻量级 MoE（Mixture-of-Experts）相关实现，定位于在算力与模型能力之间取得平衡。

在 GLM-4.7 系列中，Flash 版本与完整体量模型（数百 B 参数规模）形成区分，主要面向单机或中等规模集群的部署场景。

基准测试结果

根据模型卡中公开的评测数据，GLM-4.7-Flash 在多项常见基准测试中给出了量化结果，包括：

数学与逻辑推理类评测（如 AIME）
通用知识与问答类评测（如 GPQA）
代码生成与修复评测（如 LiveCodeBench、SWE-bench）
工具使用与多步推理相关评测（如 τ²-Bench、BrowseComp）

在这些测试中，GLM-4.7-Flash 与同参数量级模型（如 20B–30B 区间）存在数值差异，具体表现以官方公布的单项得分为准。

需要注意的是，上述结果均来自统一评测设置下的自动基准，未包含真实业务场景下的定制化测试。

推理与部署方式

GLM-4.7-Flash 支持多种主流推理框架和部署方式，包括：

使用 vLLM 进行高吞吐推理服务部署
使用 SGLang 启动模型服务
通过 Hugging Face Transformers 接口直接加载模型进行本地推理

模型卡中给出了对应的加载与推理示例，涵盖 tokenizer 初始化、模型加载与文本生成流程。

此外，该模型也可通过 Z.ai 提供的 API 平台进行调用，用于云端推理服务。

许可与使用限制

GLM-4.7-Flash 以 MIT License 形式发布。

该许可允许用户在遵循协议条款的前提下进行修改、分发和商业使用。

模型卡未额外标注使用领域限制或访问限制。

模型系列背景

GLM-4.7-Flash 隶属于 GLM 系列模型，该系列由 Z.ai 持续维护和发布，涵盖多种参数规模与使用场景的模型版本。

从公开信息来看，Flash 版本是 GLM-4.7 系列中针对部署效率和资源使用进行调整的一个分支。

社区讨论中对该模型的关注点主要集中在其参数规模、推理成本、基准分数以及与其他同级模型的对比。

总结

GLM-4.7-Flash 是一个约 31B 参数规模的开源语言模型，面向通用文本生成与推理任务。

模型提供了标准化的权重发布、基准评测数据与多种部署方式支持，并采用 MIT 许可。

对于需要在有限算力条件下使用 GLM-4.7 系列模型的场景，GLM-4.7-Flash 是当前公开版本之一，其具体适用性仍需结合实际任务与部署环境进行评估。

DataLearner 官方微信

欢迎关注 DataLearner 官方微信，获得最新 AI 技术推送

加载中...

GLM-4.7-Flash

Release date: 2026-01-19更新于: 2026-03-08 21:06:201,398

Live demo GitHub Hugging Face Compare

Parameters

310.0亿

Context length

200K

Chinese support

Supported

Reasoning ability

Data sourced primarily from official releases (GitHub, Hugging Face, papers), then benchmark leaderboards, then third-party evaluators. Learn about our data methodology

GLM-4.7-Flash

Model basics

Reasoning traces

Supported

Thinking modes

Thinking Level · Off

Context length

200K tokens

Max output length

131072 tokens

Model type

推理大模型

Release date

2026-01-19

Model file size

62.5GB

MoE architecture

Yes

Total params / Active params

310.0B / 30B

Knowledge cutoff

No data

GLM-4.7-Flash

Open source & experience

Code license

MIT License

Weights license

MIT License- 免费商用授权

GitHub repo

https://github.com/zai-org/GLM-4

Hugging Face

https://huggingface.co/zai-org/GLM-4.7-Flash

Live demo

https://chat.z.ai

GLM-4.7-Flash

Official resources

Paper

GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models (related technical report referenced on model card)

DataLearnerAI blog

No blog post yet

GLM-4.7-Flash

API details

API speed

3/5

💡Default unit: $/1M tokens. If vendors use other units, follow their published pricing.

Standard pricingStandard

Modality	Input	Output
Text	0 人民币	0 人民币

GLM-4.7-Flash

Benchmark Results

Agent能力评测

3 evaluations

Benchmark / mode

Score

Rank/total

τ²-Bench - Telecom

OnWith tools

11 / 33

τ²-Bench

OnWith tools

79.50

17 / 39

Terminal Bench Hard

OnWith tools

8 / 12

指令跟随

1 evaluations

Benchmark / mode

Score

Rank/total

IF Bench

OnWith tools

14 / 26

AI Agent - 信息收集

1 evaluations

Benchmark / mode

Score

Rank/total

BrowseComp

OnWith tools

42.80

27 / 32

View benchmark analysis Compare with other models

GLM-4.7-Flash

Publisher

智谱AI

View publisher details

GLM-4.7-Flash

Model Overview

该模型主要支持中文与英文输入，适用于通用对话、代码生成、推理问答等典型大模型使用场景。

模型规模与架构信息

在 GLM-4.7 系列中，Flash 版本与完整体量模型（数百 B 参数规模）形成区分，主要面向单机或中等规模集群的部署场景。

基准测试结果

根据模型卡中公开的评测数据，GLM-4.7-Flash 在多项常见基准测试中给出了量化结果，包括：

数学与逻辑推理类评测（如 AIME）
通用知识与问答类评测（如 GPQA）
代码生成与修复评测（如 LiveCodeBench、SWE-bench）
工具使用与多步推理相关评测（如 τ²-Bench、BrowseComp）

在这些测试中，GLM-4.7-Flash 与同参数量级模型（如 20B–30B 区间）存在数值差异，具体表现以官方公布的单项得分为准。

需要注意的是，上述结果均来自统一评测设置下的自动基准，未包含真实业务场景下的定制化测试。

推理与部署方式

GLM-4.7-Flash 支持多种主流推理框架和部署方式，包括：

使用 vLLM 进行高吞吐推理服务部署
使用 SGLang 启动模型服务
通过 Hugging Face Transformers 接口直接加载模型进行本地推理

模型卡中给出了对应的加载与推理示例，涵盖 tokenizer 初始化、模型加载与文本生成流程。

此外，该模型也可通过 Z.ai 提供的 API 平台进行调用，用于云端推理服务。

许可与使用限制

GLM-4.7-Flash 以 MIT License 形式发布。

该许可允许用户在遵循协议条款的前提下进行修改、分发和商业使用。

模型卡未额外标注使用领域限制或访问限制。

模型系列背景

社区讨论中对该模型的关注点主要集中在其参数规模、推理成本、基准分数以及与其他同级模型的对比。

总结

对于需要在有限算力条件下使用 GLM-4.7 系列模型的场景，GLM-4.7-Flash 是当前公开版本之一，其具体适用性仍需结合实际任务与部署环境进行评估。

DataLearner 官方微信

欢迎关注 DataLearner 官方微信，获得最新 AI 技术推送