GTE-Large

Embedding model

General Text Embeddings - Large

Release date: 2023-08-07 · Updated: 2023-08-08 16:20:53
Parameters
330M
Context length
512
Chinese support
Not supported
Reasoning ability

General Text Embeddings - Large is an AI model published by Alibaba, released on 2023-08-07. It is an embedding model with 330 million parameters and a 512-token context length, requires about 670MB of storage, and is distributed under the MIT License.

Data sourced primarily from official releases (GitHub, Hugging Face, papers), then benchmark leaderboards, then third-party evaluators. Learn about our data methodology


Model basics

Reasoning traces
Not supported
Thinking modes
Thinking modes not supported
Context length
512 tokens
Max output length
No data
Model type
Embedding model
Release date
2023-08-07
Model file size
670MB
MoE architecture
No
Total params / Active params
330M / N/A
Knowledge cutoff
No data

Open source & experience

Code license
MIT License
Weights license
MIT License (free for commercial use)
GitHub repo
GitHub link unavailable
Hugging Face
https://huggingface.co/thenlper/gte-large
Live demo
No live demo

Official resources

Paper
Towards General Text Embeddings with Multi-stage Contrastive Learning
DataLearnerAI blog
No blog post yet

API details

API speed
No data
No public API pricing yet.

Benchmark Results

No benchmark data to show.

Publisher

Alibaba
View publisher details
General Text Embeddings - Large

Model Overview

GTE, short for General Text Embeddings, is a family of text embedding models proposed by Alibaba. The open-source release includes three versions: a Large version with 330 million parameters, a Base version with 110 million parameters, and a Small version with 30 million parameters.


The GTE models accept an input sequence length of 512 tokens and produce 1024-dimensional embeddings; inputs longer than the sequence limit are truncated. GTE is fully open source under the MIT license and may be used commercially, but it only supports English.
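For reference, a minimal usage sketch with the Hugging Face `transformers` and `torch` packages is shown below. The example texts are hypothetical, and the mean-pooling step follows the common pattern for GTE-style embedding models; check the official model card at https://huggingface.co/thenlper/gte-large for the exact recommended usage.

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

# Hypothetical example texts; the model only supports English.
texts = ["what is the capital of China?", "Beijing is the capital of China."]

tokenizer = AutoTokenizer.from_pretrained("thenlper/gte-large")
model = AutoModel.from_pretrained("thenlper/gte-large")

# Inputs longer than 512 tokens are truncated, matching the model's context limit.
batch = tokenizer(texts, max_length=512, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**batch)

# Mean-pool token states over the attention mask to get one 1024-dim vector per text.
mask = batch["attention_mask"].unsqueeze(-1).float()
embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
embeddings = F.normalize(embeddings, p=2, dim=1)

# Cosine similarity between the two texts (embeddings are already L2-normalized).
print((embeddings[0] @ embeddings[1]).item())
```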


| Model | Parameters | Pretraining data | Pretraining method | Fine-tuning data | Fine-tuning method |
| --- | --- | --- | --- | --- | --- |
| GTE-small | ~30M | ~8 billion text pairs | Unsupervised contrastive learning | ~3 million text triplets | Multi-task supervised contrastive fine-tuning |
| GTE-base | ~110M | ~8 billion text pairs | Unsupervised contrastive learning | ~3 million text triplets | Multi-task supervised contrastive fine-tuning |
| GTE-large | ~330M | ~8 billion text pairs | Unsupervised contrastive learning | ~3 million text triplets | Multi-task supervised contrastive fine-tuning |
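Both training stages in the table rely on contrastive objectives over paired texts. The sketch below is an illustrative in-batch contrastive (InfoNCE-style) loss, not the exact objective from the GTE paper; the function name and temperature value are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(query_emb: torch.Tensor,
                              doc_emb: torch.Tensor,
                              temperature: float = 0.05) -> torch.Tensor:
    """Illustrative InfoNCE-style loss over a batch of (query, document) pairs.

    query_emb, doc_emb: (batch, dim) embeddings; row i of each tensor forms a
    positive pair, and every other row in the batch acts as an in-batch negative.
    """
    q = F.normalize(query_emb, dim=-1)
    d = F.normalize(doc_emb, dim=-1)
    logits = q @ d.T / temperature                      # (batch, batch) similarity matrix
    labels = torch.arange(q.size(0), device=q.device)   # positives lie on the diagonal
    return F.cross_entropy(logits, labels)
```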

