GME-Qwen2-VL-7B

Name: gme-Qwen2-VL-7B
Author: Alibaba

Embedding model

Release date: 2024-12-24; updated 2026-01-09 14:08:45

Parameters

7.0 billion

Context length

32K

Chinese support

Not supported

gme-Qwen2-VL-7B is an embedding model published by Alibaba and released on 2024-12-24. It has 7.0B parameters and a 32K-token context length, requires about 33.2 GB of storage, and is licensed under Apache 2.0.

Data sourced primarily from official releases (GitHub, Hugging Face, papers), then benchmark leaderboards, then third-party evaluators.

Model basics

Reasoning traces

Not supported

Thinking modes

Thinking modes not supported

Context length

32K tokens

Max output length

No data

Model type

Embedding model

Release date

2024-12-24

Model file size

33.2 GB

MoE architecture

Total params / Active params

7.0B / N/A

Knowledge cutoff

No data

Open source & experience

Code license

Apache 2.0

Weights license

Apache 2.0 (free for commercial use)

GitHub repo

GitHub link unavailable

Hugging Face

https://huggingface.co/Alibaba-NLP/gme-Qwen2-VL-7B-Instruct

Live demo

No live demo

Official resources

Paper

GME: Improving Universal Multimodal Retrieval by Multimodal LLMs

DataLearnerAI blog

No blog post yet

API details

API speed

3/5

No public API pricing yet.

Benchmark Results

GME-Qwen2-VL-7B currently has one recorded benchmark result: MMEB-v2-Image, with a score of 55.95 (rank 5 of 6).

Image embedding

1 evaluation

Benchmark	Mode	Score	Rank/total
MMEB-v2-Image	Off	55.95	5 / 6

Publisher

Alibaba

Model Overview

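GME (General Multimodal Embedding) is a family of unified multimodal embedding models released by Alibaba's Tongyi Lab and built on the Qwen2-VL multimodal LLM backbone. The family targets Universal Multimodal Retrieval (UMR): it maps text, images, and image-text pairs into a single vector space so that similarity search works between any combination of modalities (Any-to-Any). This entry covers the 7B-scale GME-Qwen2-VL-7B described in the paper; the open weights are published in the Hugging Face repository Alibaba-NLP/gme-Qwen2-VL-7B-Instruct (note the Instruct suffix in the repository name).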
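Because every modality lands in the same space, retrieval reduces to nearest-neighbor search over vectors, whatever each vector encodes. The sketch below assumes cosine similarity as the scoring function and uses toy 4-dimensional vectors in place of real 3584-dimensional GME outputs; the vectors, labels, and function names are made up for illustration and are not part of the model's API.

```python
import math
from typing import List, Sequence, Tuple


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def rank_by_cosine(
    query: Sequence[float], corpus: Sequence[Sequence[float]], k: int = 3
) -> List[Tuple[int, float]]:
    """Return (corpus index, cosine similarity) pairs for the k best matches."""
    q = l2_normalize(query)
    scored = [
        (i, sum(a * b for a, b in zip(q, l2_normalize(doc))))
        for i, doc in enumerate(corpus)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for embeddings of different modalities (made-up values).
text_query = [0.9, 0.1, 0.0, 0.2]  # e.g. the text "a cat on a sofa"
corpus_labels = ["image: cat photo", "image: street at night", "text+image: product page"]
corpus_vectors = [
    [0.8, 0.2, 0.1, 0.1],
    [0.0, 0.1, 0.9, 0.3],
    [0.1, 0.9, 0.2, 0.0],
]

for idx, score in rank_by_cosine(text_query, corpus_vectors):
    print(f"{corpus_labels[idx]}: {score:.3f}")
```

The scoring code never inspects which modality produced a vector; that is exactly what a shared embedding space buys.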

Key specifications

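Item	Details
Publisher	Alibaba (Tongyi Lab)
Model size	7B (the model card also lists a "Model Size" of about 8.29B)
Max sequence length	32768 (about 32K)
Embedding dimension	3584
Input modalities	text / image / text+image
Output	embedding vector
License	Apache-2.0
Model file size	about 33.2 GB (total size of files on the main branch)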
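As a rough sanity check on the file size: the card's ~8.29B parameter figure times 4 bytes per parameter gives about 33.2 GB, which is consistent with weights stored as 32-bit floats. This is an inference from the numbers, not something the card states. The same arithmetic estimates memory for a flat index of 3584-dimensional float32 embeddings. The function names are hypothetical, and decimal gigabytes (1e9 bytes) are assumed.

```python
def weights_gb(num_params: float, bytes_per_param: int) -> float:
    """Raw weight size in decimal GB (1 GB = 1e9 bytes); ignores file metadata."""
    return num_params * bytes_per_param / 1e9


def flat_index_gb(num_vectors: int, dim: int = 3584, bytes_per_value: int = 4) -> float:
    """Memory for an uncompressed index of float embeddings (no ANN overhead)."""
    return num_vectors * dim * bytes_per_value / 1e9


params = 8.29e9  # "Model Size" figure from the model card
print(f"32-bit weights: {weights_gb(params, 4):.1f} GB")  # 33.2 GB
print(f"16-bit weights: {weights_gb(params, 2):.1f} GB")  # 16.6 GB
print(f"1M embeddings, float32: {flat_index_gb(1_000_000):.1f} GB")  # 14.3 GB
```

Each stored 3584-dimensional float32 vector takes 14,336 bytes, so index size, not model size, tends to dominate once a corpus reaches millions of items.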

Architecture and training highlights (public information)

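According to the paper, the model uses Qwen2-VL as its backbone and is adapted with LoRA (reported settings include LoRA rank = 8, temperature = 0.03, and learning rate 1e-4). To keep training efficient and bound the number of visual tokens, each image is capped at 1024 visual tokens. For training data, the paper reports building a large-scale multimodal retrieval dataset, including synthesized fused-modal data, totaling roughly 8M samples.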
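The reported temperature of 0.03 points to a contrastive objective. Below is a minimal sketch of a standard InfoNCE loss for a single query, assuming cosine similarity, one positive, and a list of negatives. The paper's exact loss (for example, how in-batch and hard negatives are combined) may differ, and `info_nce` is a hypothetical name, not code from the GME repository.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(
    query: Sequence[float],
    positive: Sequence[float],
    negatives: Sequence[Sequence[float]],
    temperature: float = 0.03,
) -> float:
    """InfoNCE loss for one query: -log softmax(similarities / temperature)[positive]."""
    q = _normalize(query)
    candidates = [_normalize(positive)] + [_normalize(v) for v in negatives]
    # Cosine similarity scaled by the temperature; index 0 is the positive.
    logits = [sum(a * b for a, b in zip(q, c)) / temperature for c in candidates]
    # Stable log-sum-exp: subtract the max logit before exponentiating.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[0]


# A query close to its positive gets a near-zero loss; a swapped pair gets a large one.
print(info_nce([1.0, 0.0], [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]]))
print(info_nce([1.0, 0.0], [0.0, 1.0], [[0.9, 0.1]]))
```

Dividing by a small temperature such as 0.03 sharpens the softmax, so even small cosine gaps between the positive and the negatives translate into large differences in loss.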

Capabilities and usage

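The model card shows the typical calls get_text_embeddings, get_image_embeddings, and get_fused_embeddings. Query-side embeddings can take an instruction (prompt) that injects the retrieval intent, for example a Text-to-Image retrieval prompt, so that queries and corpus items are encoded differently.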
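The query/corpus asymmetry can be illustrated without loading the model. In this sketch the `Encoder` signature is hypothetical and only loosely modeled on the card's methods: the corpus is encoded once with no instruction, and each query is encoded with an instruction stating the retrieval intent. The toy encoder, file names, and instruction string are illustrative stand-ins. Vectors are assumed to be L2-normalized, so dot product equals cosine similarity.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical encoder signature: a batch of inputs plus an optional instruction,
# returning one L2-normalized vector per input.
Encoder = Callable[[Sequence[str], Optional[str]], List[List[float]]]


def build_index(encode_corpus: Encoder, items: Sequence[str]) -> List[List[float]]:
    """Corpus side: encode every item once, with no task instruction."""
    return encode_corpus(items, None)


def search(
    encode_query: Encoder,
    instruction: str,
    query: str,
    index: List[List[float]],
    k: int = 1,
) -> List[int]:
    """Query side: pass the retrieval intent, then rank the index by dot product."""
    q = encode_query([query], instruction)[0]
    scores = [sum(a * b for a, b in zip(q, doc)) for doc in index]
    return sorted(range(len(index)), key=lambda i: scores[i], reverse=True)[:k]


# Toy stand-in encoder: strings play the role of images and texts; vectors are made up.
def toy_encoder(items: Sequence[str], instruction: Optional[str]) -> List[List[float]]:
    table = {"car.jpg": [0.0, 1.0], "cat.jpg": [1.0, 0.0], "a photo of a cat": [1.0, 0.0]}
    return [table[x] for x in items]


index = build_index(toy_encoder, ["car.jpg", "cat.jpg"])
print(search(toy_encoder, "Find an image that matches the given text.", "a photo of a cat", index))
```

With the real model, the two callables would wrap the card's text and image embedding methods. The design point is that the instruction travels only with the query, so the corpus index can be built once and reused across tasks.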

Evaluation and benchmarks (public scores)

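The model card reports scores on benchmarks including UMRB and MTEB. On UMRB (aggregated over 47 subtasks), GME-Qwen2-VL-7B averages 67.44. The card's Model List also gives MTEB-en and MTEB-zh scores alongside each model's embedding dimension and maximum length.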

Known limitations (official statements)

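The official limitations note that, because of visual-token cost and limited data coverage, training data and evaluation mostly keep to single-image inputs. Training and testing also used mainly English data; although the backbone is multilingual, the quality of multilingual multimodal embeddings is not guaranteed.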
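To see why multi-image inputs are costly, here is an approximation of how many visual tokens one image consumes. It assumes Qwen2-VL's usual scheme: 14-pixel ViT patches merged 2x2, so each LLM token covers a 28x28 pixel block. It also assumes a simplified version of the backbone's resize logic. The 1024 cap is the per-image limit reported for GME training; the function name and rounding details are illustrative, not the model's actual preprocessing code.

```python
import math

PATCH = 28  # assumed: 14-px ViT patches merged 2x2 -> one LLM token per 28x28 pixels
MAX_TOKENS = 1024  # per-image visual-token cap reported for GME training


def visual_tokens(height: int, width: int, max_tokens: int = MAX_TOKENS) -> int:
    """Approximate visual-token count for one image (simplified resize logic)."""
    # Snap each side to the nearest multiple of the patch grid (at least one patch).
    h = max(PATCH, round(height / PATCH) * PATCH)
    w = max(PATCH, round(width / PATCH) * PATCH)
    max_pixels = max_tokens * PATCH * PATCH
    if h * w > max_pixels:
        # Downscale uniformly so the area fits the budget, flooring to the grid.
        beta = math.sqrt((height * width) / max_pixels)
        h = max(PATCH, math.floor(height / beta / PATCH) * PATCH)
        w = max(PATCH, math.floor(width / beta / PATCH) * PATCH)
    return (h // PATCH) * (w // PATCH)


for size in [(224, 224), (1920, 1080), (4000, 3000)]:
    print(size, visual_tokens(*size))
```

Under these assumptions, any large image is squeezed into the 1024-token budget, so each extra image in a query adds up to another ~1024 tokens of context.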

Access

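Open weights: Hugging Face (see the repository link above). The model card also warns that the remote code has compatibility issues with some transformers versions (e.g. >=4.52.0) and recommends pinning a compatible version or using sentence-transformers instead. The card further notes that the family is available as a commercial Alibaba Cloud API (multimodal-embedding-v1), but the backend model behind that API is not identical to the open weights.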
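One way to act on that warning before loading the remote code is a version guard. This minimal sketch uses only the standard library's `importlib.metadata`. The 4.52.0 threshold comes from the card's example, the function names are hypothetical, and the guard is a first check only; it does not replace the version the card recommends.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import List, Optional, Tuple

# Threshold taken from the model card's warning ("e.g. >=4.52.0"); other
# versions may also be affected.
FIRST_PROBLEM_VERSION = (4, 52, 0)


def parse_version(v: str) -> Tuple[int, ...]:
    """Keep the leading numeric components: "4.52.0.dev0" -> (4, 52, 0)."""
    parts: List[int] = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) < len(piece):
            break  # stop at pre-release suffixes such as "0rc1"
    return tuple(parts)


def remote_code_ok(v: str) -> bool:
    """True if the version predates the release flagged in the model card."""
    return parse_version(v) < FIRST_PROBLEM_VERSION


def check_installed_transformers() -> Optional[bool]:
    """Check the installed transformers package; None if it is not installed."""
    try:
        return remote_code_ok(version("transformers"))
    except PackageNotFoundError:
        return None


print(check_installed_transformers())
```

Running the check before loading the model turns a confusing runtime failure inside remote code into an explicit, early signal to pin the version or switch to the sentence-transformers route.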
