Qwen3-VL-Embedding-2B

Name: Qwen3 Vision-Language Embedding 2B
Author: Alibaba

Embedding model

Qwen3 Vision-Language Embedding 2B

Release date: 2026-01-08
Updated: 2026-01-08 23:27:44

Parameters

2.0B (2.0 billion)

Context length

32K

Chinese support

Supported

Qwen3 Vision-Language Embedding 2B is an embedding model published by Alibaba and released on 2026-01-08. It has 2.0B parameters and a 32K-token context length, requires about 4.26GB of storage, and is distributed under the Apache 2.0 license.

Data sourced primarily from official releases (GitHub, Hugging Face, papers), then benchmark leaderboards, then third-party evaluators.

Model basics

Reasoning traces

Not supported

Thinking modes

Thinking modes not supported

Context length

32K tokens

Max output length

2048 tokens

Model type

Qwen3-VL-Embedding-2B

Open source & experience

Code license

Apache 2.0

Weights license

Apache 2.0 (free for commercial use)

GitHub repo

https://github.com/QwenLM/Qwen3-VL-Embedding/

Hugging Face

https://huggingface.co/Qwen/Qwen3-VL-Embedding-2B

Official resources

Paper

Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for State-of-the-Art Multimodal Retrieval and Ranking

DataLearnerAI blog

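Big news! Alibaba open-sources two multimodal models, an embedding model and a reranking model: Qwen3-VL-Embedding and Qwen3-VL-Reranker. Images and videos can now be used for RAG too!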

API details

API speed

3/5

No public API pricing yet.

Benchmark Results

Qwen3-VL-Embedding-2B currently has one listed benchmark result: MMEB-v2-Image, with a score of 74.96 (rank 4 of 6). This page also consolidates core specs, context limits, and API details so you can weigh benchmark results and deployment constraints together.

Image embedding

1 evaluation

Benchmark        Mode   Score   Rank/total
MMEB-v2-Image    Off    74.96   4 / 6

Publisher

Alibaba

Qwen3 Vision-Language Embedding 2B

Model Overview

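Qwen3-VL-Embedding-2B is a multimodal embedding model from the Qwen team, intended for first-stage recall in retrieval and RAG systems. Built on the Qwen3-VL vision-language architecture, it encodes text, images, screenshots (visual documents), video, and other modalities into dense vectors in a single shared space, for use in similarity computation and large-scale retrieval.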

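The model offers a good balance between parameter count, retrieval quality, and inference cost, which makes it suitable for large vector stores, online retrieval services, and resource-constrained environments.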
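To make "dense vectors used for similarity computation" concrete, here is a minimal sketch of cosine-similarity retrieval over a handful of vectors. The 4-dimensional vectors, the document ids, and the helper names (`l2_normalize`, `cosine_top_k`) are all hypothetical stand-ins for illustration; real vectors would come from the model, and this is not the model's own API.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def cosine_top_k(query, corpus, k=2):
    """Return the k (doc_id, score) pairs most similar to the query."""
    q = l2_normalize(query)
    scored = []
    for doc_id, vec in corpus.items():
        d = l2_normalize(vec)
        scored.append((doc_id, sum(a * b for a, b in zip(q, d))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy vectors standing in for embeddings of items from different modalities
# (hypothetical values, not real model outputs).
corpus = {
    "image:cat.jpg": [0.9, 0.1, 0.0, 0.1],
    "video:traffic.mp4": [0.0, 0.8, 0.6, 0.0],
    "screenshot:invoice.png": [0.1, 0.0, 0.2, 0.9],
}
text_query = [0.8, 0.2, 0.1, 0.0]  # pretend embedding of a text query

top_hits = cosine_top_k(text_query, corpus, k=2)
for doc_id, score in top_hits:
    print(f"{doc_id}: {score:.3f}")
```

Because every modality lands in the same vector space, the retrieval code does not need to know whether a candidate was originally an image, a video, or a screenshot.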

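Core positioning

Vector recall stage for multimodal retrieval and multimodal RAG
Cross-modal retrieval (text-to-image, text-to-video, text-to-screenshot, and so on)
High-throughput, low-latency production deployments that need to scale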

In a typical system, Qwen3-VL-Embedding-2B often serves as the default embedding model, paired with a multimodal reranker to form a two-stage retrieval pipeline.
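The two-stage pipeline can be sketched as follows: a cheap vector recall step narrows the corpus to a few candidates, and a more expensive scorer reorders only those. This is a rough illustration under stated assumptions: the index vectors, the `recall` and `rerank` helpers, and the lookup-table scorer `toy_reranker_score` are hypothetical, and the scorer merely stands in for a real multimodal reranker.

```python
def recall(query_vec, index, top_n):
    """Stage 1: rank every document by dot product with the query vector.

    Vectors are assumed to be unit-normalized already.
    """
    scored = [
        (doc_id, sum(q * d for q, d in zip(query_vec, vec)))
        for doc_id, vec in index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in scored[:top_n]]

def rerank(query, candidates, score_fn, top_k):
    """Stage 2: re-score only the recalled candidates with a costlier scorer."""
    ordered = sorted(candidates, key=lambda doc_id: score_fn(query, doc_id), reverse=True)
    return ordered[:top_k]

# Hypothetical unit-length vectors standing in for stored embeddings.
index = {
    "doc_a": [1.0, 0.0],
    "doc_b": [0.8, 0.6],
    "doc_c": [0.0, 1.0],
    "doc_d": [0.6, 0.8],
}

# Hypothetical relevance scores; a real reranker would compute these
# from the query and the document content.
TOY_SCORES = {"doc_a": 0.2, "doc_b": 0.9, "doc_c": 0.99, "doc_d": 0.5}

def toy_reranker_score(query, doc_id):
    return TOY_SCORES[doc_id]

query_vec = [1.0, 0.0]
candidates = recall(query_vec, index, top_n=3)
final = rerank("example query", candidates, toy_reranker_score, top_k=2)
print(candidates, final)
```

Note that `doc_c` has the highest toy reranker score but never reaches stage 2, because stage 1 did not recall it. That is the usual trade-off of this design: the quality of the recall stage bounds the quality of the final ranking.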

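Model specifications (compiled from official public information)

Item                  Description
Model type            Multimodal embedding
Parameters            2B
Layers                28
Max context length    32K tokens
Embedding dimension   2048 (supports MRL dynamic truncation)
Input modalities      Text / image / screenshot / video / mixed-modality
Instruction support   Instruction-aware (custom task instructions)
Languages             30+ languages
Quantization          Low-precision quantization supported (e.g., int8)
License               Apache 2.0 (commercial use permitted)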
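As a back-of-the-envelope illustration of what the 2048-dimension figure means for a vector store, the sketch below computes raw storage for a flat index. The corpus size of one million vectors and the truncated dimension of 256 are hypothetical choices, and index overhead is ignored. The table does not say whether int8 support refers to model weights or to stored vectors, so the int8 row only shows what one byte per value would imply for vector storage.

```python
def index_size_bytes(num_vectors, dim, bytes_per_value):
    """Raw storage for a flat index: one fixed-width value per dimension per vector."""
    return num_vectors * dim * bytes_per_value

NUM_VECTORS = 1_000_000  # hypothetical corpus size
FULL_DIM = 2048          # embedding dimension from the specification table
SHORT_DIM = 256          # hypothetical truncated dimension

sizes = {
    "float32 @ 2048": index_size_bytes(NUM_VECTORS, FULL_DIM, 4),
    "float32 @ 256": index_size_bytes(NUM_VECTORS, SHORT_DIM, 4),
    "int8 @ 2048": index_size_bytes(NUM_VECTORS, FULL_DIM, 1),
}

for label, size in sizes.items():
    print(f"{label}: {size / 1e9:.3f} GB")
```

Under these assumptions the full-precision, full-dimension index is about 8.2 GB, while truncating to 256 dimensions or storing one byte per value brings it down to roughly 1.0 GB and 2.0 GB respectively.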

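Model features

Unified multimodal representation space: data from different modalities is mapped into the same vector space, so cross-modal similarity can be computed directly.
Long-context input: the 32K context length suits long documents, long screenshot sequences, and video clips.
MRL (Matryoshka Representation Learning): vectors can be truncated to different dimensions without re-encoding, trading off storage cost, retrieval speed, and retrieval quality.
Instruction-aware embedding: an instruction can state the retrieval task explicitly, bringing the vectors closer to the application's own definition of "relevance".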
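The MRL point can be illustrated with a short sketch: keep the leading components of an already-computed vector and rescale the result to unit length. The helper name `truncate_embedding` and the toy 4-dimensional vector are hypothetical, and re-normalizing after truncation is a common convention assumed here rather than something this page specifies.

```python
import math

def truncate_embedding(vec, dim):
    """Keep the first `dim` components and re-normalize to unit length."""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("truncated vector has zero norm")
    return [x / norm for x in head]

# Toy vector standing in for a stored full-size embedding (hypothetical values).
full = [3.0, 4.0, 12.0, 0.0]
short = truncate_embedding(full, 2)

print(len(full), len(short))
print(short)
```

Since truncation works on stored vectors, the same encoded corpus can serve a small, fast index and a full-size index without running the model again.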

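Use cases

First-stage recall for multimodal RAG
Image / video / document-screenshot retrieval
Enterprise knowledge-base vectorization
Large-scale online search systems
Production environments sensitive to compute and latency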
