Qwen3-VL-4B-Instruct
Qwen3-VL-4B-Instruct is an AI model published by 阿里巴巴, released on 2025-10-15, for 多模态大模型, with 40.0B parameters, and 256K tokens context length, requiring about 8.89 GB storage, under the Apache 2.0 license.
Data sourced primarily from official releases (GitHub, Hugging Face, papers), then benchmark leaderboards, then third-party evaluators. Learn about our data methodology
Qwen3-VL-4B-Instruct currently shows benchmark results led by DocVQA (3 / 5, score 95.30), MMMU (25 / 28, score 67.40). This page also consolidates core specs, context limits, and API pricing so you can evaluate the model from benchmark results and deployment constraints together.
Qwen3-VL 是阿里巴巴 Qwen 团队在 Qwen3 代系下推出的新一代视觉-语言模型,面向文本、图像与视频的联合理解与生成。该代系在长上下文、多模态融合与时空理解等方面进行了系统升级:模型原生支持 256K token 上下文,并可扩展至 1M;在视频理解中强调时间戳对齐,能够对长时序视频进行秒级片段定位;在跨模态对齐方面引入多层次视觉特征融合。
官方模型卡提供多模态与纯文本基准图表与使用样例;权重与推理代码可通过 Transformers/ModelScope 直接调用。
欢迎关注 DataLearner 官方微信,获得最新 AI 技术推送
