GME-Qwen2-VL-2B

Name: gme-Qwen2-VL-2B
Availability: InStock
Author: Alibaba

Embedding model

gme-Qwen2-VL-2B

Release date: 2024-12-24
Updated: 2026-01-09 14:14:48

Parameters

2.0 billion

Context length

32K

Chinese support

Not supported

Reasoning ability

gme-Qwen2-VL-2B is an embedding model published by Alibaba and released on 2024-12-24. It has about 2.0B parameters and a 32K-token context length, requires about 8.85 GB of storage, and is distributed under the Apache 2.0 license.

Data sourced primarily from official releases (GitHub, Hugging Face, papers), then benchmark leaderboards, then third-party evaluators.

Model basics

Reasoning traces

Not supported

Thinking modes

Thinking modes not supported

Context length

32K tokens

Max output length

No data

Model type

Embedding model

Release date

2024-12-24

Model file size

8.85 GB

MoE architecture

Total params / Active params

2.0B / N/A

Knowledge cutoff

No data

Open source & experience

Code license

Apache 2.0

Weights license

Apache 2.0 (free for commercial use)

GitHub repo

GitHub link unavailable

Hugging Face

https://huggingface.co/Alibaba-NLP/gme-Qwen2-VL-2B-Instruct

Live demo

No live demo

Official resources

Paper

GME: Improving Universal Multimodal Retrieval by Multimodal LLMs

DataLearnerAI blog

No blog post yet

API details

API speed

3/5

No public API pricing yet.

Benchmark Results

GME-Qwen2-VL-2B currently has one benchmark result on this page: MMEB-v2-Image, where it scores 51.89 and ranks 6 of 6. This page also lists core specs, context limits, and API pricing status, so the model can be judged on deployment constraints as well as benchmark results.

Image embedding

1 evaluation

Benchmark / mode	Score	Rank/total
MMEB-v2-Image (Off)	51.89	6 / 6

Publisher

Alibaba

Model Overview

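GME (General Multimodal Embedding) is a family of unified multimodal embedding models released by Alibaba's Tongyi Lab. Built on the Qwen2-VL backbone, it targets Universal Multimodal Retrieval (UMR): text, images, and image-text pairs are encoded as embeddings in a single shared vector space for cross-modal and same-modal retrieval and ranking. This entry corresponds to the 2B-scale model from the paper, GME-Qwen2-VL-2B. The open weights are published as the Hugging Face repository Alibaba-NLP/gme-Qwen2-VL-2B-Instruct (note the Instruct suffix in the repository name).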

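Key specifications

Item	Value
Publisher	Alibaba (Tongyi Lab)
Model size	2B (the model card also lists a "Model Size" of about 2.21B)
Max sequence length	32768 (about 32K)
Embedding dimension	1536
Input modalities	text / image / text+image
Output	Vector (embedding)
License	Apache-2.0
Model file size	About 8.85 GB (total file size on the main branch)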
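
The parameter count, embedding dimension, and file size above can be related with some simple arithmetic. The sketch below is a rough estimate only. The byte widths (4 bytes for float32, 2 for bfloat16) and the flat, uncompressed index layout are assumptions, not figures from the model card. The listed 8.85 GB is close to 2.21B parameters at 4 bytes each, which is consistent with float32 storage but does not confirm it.

```python
# Back-of-the-envelope sizing. Byte widths and index layout are assumptions,
# not values stated on the model card.

PARAMS = 2.21e9   # "Model Size" figure quoted in the specification table
EMBED_DIM = 1536  # embedding dimension from the specification table


def weights_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate checkpoint size in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def index_gb(num_vectors: int, dim: int = EMBED_DIM, bytes_per_value: int = 4) -> float:
    """Approximate size of a flat, uncompressed vector index in GB."""
    return num_vectors * dim * bytes_per_value / 1e9


if __name__ == "__main__":
    print(f"float32 weights:    {weights_gb(PARAMS, 4):.2f} GB")
    print(f"bfloat16 weights:   {weights_gb(PARAMS, 2):.2f} GB")
    print(f"1M float32 vectors: {index_gb(1_000_000):.2f} GB")
```

Under these assumptions, one million stored embeddings take roughly 6 GB before any compression, which is worth budgeting for alongside the model weights.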

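Architecture and training (public information)

According to the paper, the model uses Qwen2-VL as its backbone and is adapted for retrieval embeddings with LoRA. During training, the number of visual tokens per image is capped at 1024, balancing training efficiency against the variation in token count caused by input resolution. The paper also reports building data that covers multiple retrieval settings (including synthetic fused-modal data), with a total training set in the millions of examples (about 8M as reported in the paper).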
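
To make the 1024-token cap concrete, the following sketch estimates how many visual tokens an image produces and how far it would need to be downscaled to fit the cap. It assumes each visual token covers a 28x28 pixel cell (14-pixel patches merged 2x2, as in Qwen2-VL's default image processing). That cell size, and the helper names `visual_tokens` and `fit_to_token_cap`, are illustrative assumptions and not the paper's code.

```python
import math

CELL = 28                 # assumption: pixels per token side (14-px patch, 2x2 merge)
MAX_VISUAL_TOKENS = 1024  # per-image cap reported in the paper


def visual_tokens(height: int, width: int, cell: int = CELL) -> int:
    """Estimate visual tokens, rounding each side up to whole cells."""
    return math.ceil(height / cell) * math.ceil(width / cell)


def fit_to_token_cap(height: int, width: int,
                     max_tokens: int = MAX_VISUAL_TOKENS, cell: int = CELL):
    """Return (height, width) that stays within max_tokens.

    Images already under the cap are returned unchanged. Larger images are
    scaled down, roughly preserving aspect ratio, and snapped to whole cells.
    """
    if visual_tokens(height, width, cell) <= max_tokens:
        return height, width
    scale = math.sqrt(max_tokens * cell * cell / (height * width))
    rows = max(1, math.floor(height * scale / cell))
    cols = max(1, math.floor(width * scale / cell))
    # Very elongated images can still exceed the cap after the max(1, ...)
    # clamp, so trim the longer side (aspect ratio is not preserved here).
    if rows * cols > max_tokens:
        if rows >= cols:
            rows = max_tokens // cols
        else:
            cols = max_tokens // rows
    return rows * cell, cols * cell


if __name__ == "__main__":
    print(visual_tokens(224, 224))       # small image, well under the cap
    print(fit_to_token_cap(1024, 1024))  # scaled down to fit the cap
```

The point of the sketch is that token count grows with image area, so a fixed cap bounds sequence length regardless of the input resolution.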

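Capabilities and usage

The model card describes three kinds of embeddings: text embeddings, image embeddings, and fused image-text embeddings. Query-side encoding accepts a retrieval instruction (instruction/prompt) that aligns the notion of "relevance" with the retrieval task at hand. The model card provides examples for both transformers and sentence-transformers.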
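
Because all three embedding types live in one vector space, retrieval reduces to comparing a query vector against candidate vectors of any modality. The sketch below shows that ranking step with cosine similarity. It does not call the model: the 3-dimensional vectors are made-up stand-ins for real 1536-dimensional embeddings, and the corpus keys and the helpers `cosine` and `rank` are hypothetical names chosen for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def rank(query: Sequence[float],
         corpus: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Return (item id, score) pairs sorted from most to least similar."""
    scored = [(item_id, cosine(query, vec)) for item_id, vec in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for embeddings of different modalities.
corpus = {
    "text: article about cats": [0.9, 0.1, 0.0],
    "image: cat.jpg": [0.8, 0.2, 0.1],
    "text+image: car advert": [0.0, 0.2, 0.9],
}
query = [1.0, 0.0, 0.0]  # stand-in for an instruction-conditioned query embedding

if __name__ == "__main__":
    for item_id, score in rank(query, corpus):
        print(f"{score:.3f}  {item_id}")
```

In a real pipeline the query vector would come from encoding the query together with its retrieval instruction, while candidates are encoded once and stored in an index.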

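Evaluation and benchmarks (published scores)

The model card reports that GME-Qwen2-VL-2B averages 64.45 on UMRB (aggregated over 47 subtasks). Its Model List also gives MTEB-en and MTEB-zh scores alongside the embedding dimension and maximum length.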

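Known limitations (as stated publicly)

The official limitations note that evaluation and data mainly cover single-image inputs (multi-image and interleaved inputs were not systematically evaluated), and that training and testing mainly used English data, so multilingual multimodal embedding quality is not guaranteed.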

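Access

Open weights: Hugging Face (see the repository link above). The model card warns that the remote code has compatibility issues with some transformers versions and recommends using a pinned version or sentence-transformers. It also notes that the series is available as a commercial Alibaba Cloud API (multimodal-embedding-v1), whose backend model is not fully identical to the open weights.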
