GTE-Small

Name: General Text Embeddings - Small
Author: Alibaba

Embedding model

General Text Embeddings - Small

Release date: 2023-08-07
Updated: 2023-08-08 16:22:51

Parameters

30 million (0.03B)

Context length

512

Chinese support

Not supported

General Text Embeddings - Small is an embedding model published by Alibaba and released on 2023-08-07. It has about 0.03B (30 million) parameters and a context length of 512 tokens, requires about 66.8MB of storage, and is distributed under the MIT License.
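As a rough consistency check, the storage size can be related to the parameter count. The sketch below assumes the weights are stored as 16-bit floats (2 bytes per parameter) and that 1 MB means 10^6 bytes; neither is stated on this page, and `estimate_params_millions` is a hypothetical helper name.

```python
def estimate_params_millions(file_size_mb: float, bytes_per_param: int = 2) -> float:
    """Estimate the parameter count (in millions) from a weight file size.

    Assumes the file holds only weights and that 1 MB = 10**6 bytes,
    so megabytes divided by bytes-per-parameter gives millions of parameters.
    """
    return file_size_mb / bytes_per_param


# 66.8 MB at 2 bytes per parameter (assumed fp16 storage)
print(round(estimate_params_millions(66.8), 1))  # 33.4
```

Under these assumptions the 66.8MB file corresponds to about 33 million parameters, in line with the roughly 30 million (0.3亿) listed for the Small version.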

Data sourced primarily from official releases (GitHub, Hugging Face, papers), then benchmark leaderboards, then third-party evaluators.

Model basics

Reasoning traces

Not supported

Thinking modes

Thinking modes not supported

Context length

512 tokens

Max output length

No data

Model type

Embedding model

Release date

2023-08-07

Model file size

66.8MB

MoE architecture

Total params / Active params

0.03B / N/A

Knowledge cutoff

No data

Open source & experience

Code license

MIT License

Weights license

MIT License - free for commercial use

GitHub repo

GitHub link unavailable

Hugging Face

https://huggingface.co/thenlper/gte-small

Live demo

No live demo

Official resources

Paper

Towards General Text Embeddings with Multi-stage Contrastive Learning

DataLearnerAI blog

No blog post yet

API details

API speed

No data

No public API pricing yet.

Benchmark Results

No benchmark data to show.

Publisher

Alibaba

General Text Embeddings - Small

Model Overview

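GTE stands for General Text Embeddings, a family of text embedding models proposed by Alibaba. Three versions are open source: Large with 330 million parameters, Base with 110 million parameters, and Small with 30 million parameters.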

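GTE models accept input sequences of up to 512 tokens; longer inputs are truncated. The output embedding dimension depends on the version: 1024 for Large, 768 for Base, and 384 for Small. The models are fully open source under the MIT license, which permits commercial use. They support English only.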
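To make the truncation and embedding behaviour concrete, here is a minimal pure-Python sketch. It does not load the actual model: the token IDs and per-token vectors are toy values, the helper names are hypothetical, and mean pooling followed by cosine similarity is assumed as the way a sentence vector is formed and compared.

```python
import math

MAX_TOKENS = 512  # context length listed for GTE models


def truncate(token_ids, max_tokens=MAX_TOKENS):
    """Keep only the first max_tokens tokens; anything beyond is dropped."""
    return token_ids[:max_tokens]


def mean_pool(token_vectors):
    """Average per-token vectors into one fixed-size sentence embedding."""
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in token_vectors) / n for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# A 600-token input is cut down to the 512-token limit.
print(len(truncate(list(range(600)))))  # 512

# Toy 2-dimensional token vectors standing in for real model outputs.
sentence_a = mean_pool([[1.0, 0.0], [0.8, 0.2]])
sentence_b = mean_pool([[0.9, 0.1], [0.7, 0.3]])
print(cosine_similarity(sentence_a, sentence_b))
```

In practice the tokenizer and model published at the Hugging Face link above would produce the token vectors; only the truncate, pool, and compare steps are illustrated here.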

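Model	Parameters	Pre-training data	Pre-training method	Fine-tuning data	Fine-tuning method
GTE-Small	~30 million	~8 billion text pairs	Unsupervised contrastive learning	~3 million text triples	Multi-task supervised contrastive fine-tuning
GTE-Base	~110 million	~8 billion text pairs	Unsupervised contrastive learning	~3 million text triples	Multi-task supervised contrastive fine-tuning
GTE-Large	~330 million	~8 billion text pairs	Unsupervised contrastive learning	~3 million text triples	Multi-task supervised contrastive fine-tuning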
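The table lists contrastive learning for both training stages. As a generic illustration, and not the exact objective used in the GTE paper, the sketch below computes an InfoNCE-style loss for one query: the similarity to the positive text is pushed up relative to the similarities to the negatives. The similarity scores and the temperature of 0.05 are made-up values, and `info_nce_loss` is a hypothetical helper name.

```python
import math


def info_nce_loss(similarities, positive_index, temperature=0.05):
    """InfoNCE-style loss for one query.

    similarities: similarity scores between the query and each candidate text
    positive_index: position of the matching (positive) candidate
    temperature: scaling factor (illustrative value, an assumption)
    """
    logits = [s / temperature for s in similarities]
    # log-sum-exp with the maximum subtracted for numerical stability
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    # negative log-probability of the positive under a softmax over candidates
    return log_sum_exp - logits[positive_index]


# The positive (index 0) is clearly more similar than the negatives: low loss.
print(info_nce_loss([0.9, 0.2, 0.1], positive_index=0))

# The positive is no more similar than a negative: higher loss.
print(info_nce_loss([0.2, 0.2, 0.1], positive_index=0))
```

With text triples of the kind listed in the fine-tuning column, each query would have one positive and one or more negatives as its candidates.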
