GTE-Base

Name: General Text Embeddings - Base
Author: Alibaba

Foundation model

Release date: 2023-08-07
Updated: 2023-08-08 16:23:08

Parameters

110 million

Context length

512

Chinese support

Not supported

Reasoning ability

General Text Embeddings - Base is a text embedding model published by Alibaba and released on 2023-08-07. It is categorized as a foundation model, has about 110 million parameters (0.11B) and a 512-token context length, requires about 219MB of storage, and is distributed under the MIT License.
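
As a rough sanity check on the parameter count and storage figure, the sketch below estimates the weight-file size from the number of parameters. It assumes a round 110 million parameters and compares 16-bit and 32-bit storage; neither the exact parameter count nor the storage precision is stated on this page, and the helper name `estimated_size_mb` is purely illustrative.

```python
def estimated_size_mb(num_params: int, bytes_per_param: int) -> float:
    """Approximate weight-file size in megabytes (1 MB = 10**6 bytes)."""
    return num_params * bytes_per_param / 1_000_000


# Assumption: "about 110 million" parameters, taken as a round figure.
APPROX_PARAMS = 110_000_000

fp16_mb = estimated_size_mb(APPROX_PARAMS, 2)  # 16-bit floats, 2 bytes each
fp32_mb = estimated_size_mb(APPROX_PARAMS, 4)  # 32-bit floats, 4 bytes each

print(f"fp16: ~{fp16_mb:.0f} MB, fp32: ~{fp32_mb:.0f} MB")
# prints "fp16: ~220 MB, fp32: ~440 MB"
```

The 16-bit estimate of about 220 MB is close to the listed 219MB, which is consistent with (though not proof of) weights stored in half precision.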

Data sourced primarily from official releases (GitHub, Hugging Face, papers), then benchmark leaderboards, then third-party evaluators.

Model basics

Reasoning traces

Not supported

Thinking modes

Not supported

Context length

512 tokens

Max output length

No data

Model type

Foundation model

Release date

2023-08-07

Model file size

219MB

MoE architecture

Total params / Active params

0.11B (about 110 million) / N/A

Knowledge cutoff

No data

Open source & experience

Code license

MIT License

Weights license

MIT License (free for commercial use)

GitHub repo

GitHub link unavailable

Hugging Face

https://huggingface.co/thenlper/gte-base

Live demo

No live demo

Official resources

Paper

Towards General Text Embeddings with Multi-stage Contrastive Learning

DataLearnerAI blog

No blog post yet

API details

API speed

No data

No public API pricing yet.

Benchmark Results

No benchmark data to show.

Publisher

Alibaba

General Text Embeddings - Base

Model Overview

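GTE stands for General Text Embeddings, a family of text embedding models proposed by Alibaba. The open-source release includes three versions: Large with about 330 million parameters, Base with about 110 million parameters, and Small with about 30 million parameters.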

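GTE models accept input sequences of up to 512 tokens; longer inputs are truncated. The output embedding dimension depends on the model size: 1024 for Large, 768 for Base, and 384 for Small. The GTE models are fully open source under the MIT License and can be used commercially. However, they support English only.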
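
To make the truncation behavior concrete, here is a minimal sketch under stated assumptions. Whitespace splitting stands in for the model's real tokenizer, which splits text differently and also reserves positions for special tokens, and the short toy vectors stand in for real embeddings. Cosine similarity is shown as one common way to compare embedding vectors, not as something this page prescribes. The names `truncate` and `cosine_similarity` are illustrative.

```python
import math

MAX_TOKENS = 512  # input limit stated for GTE models


def truncate(tokens, max_tokens=MAX_TOKENS):
    """Keep only the first max_tokens tokens and drop the rest."""
    return tokens[:max_tokens]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# A 600-"token" input: everything past position 512 is discarded,
# so it cannot influence the resulting embedding.
long_input = ("word " * 600).split()
kept = truncate(long_input)
print(len(long_input), "->", len(kept))  # prints "600 -> 512"

# Toy 4-dimensional vectors standing in for real embeddings.
a = [1.0, 0.0, 1.0, 0.0]
b = [1.0, 0.0, 0.0, 0.0]
print(round(cosine_similarity(a, b), 4))  # prints 0.7071
```

In practice this means that documents longer than the limit are usually split into chunks before embedding, so that later passages are not silently ignored.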

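Model	Parameters	Pre-training data	Pre-training method	Fine-tuning data	Fine-tuning method
GTE-Small	~30 million	~8 billion text pairs	Unsupervised contrastive learning	~3 million text triples	Multi-task supervised contrastive fine-tuning
GTE-Base	~110 million	~8 billion text pairs	Unsupervised contrastive learning	~3 million text triples	Multi-task supervised contrastive fine-tuning
GTE-Large	~330 million	~8 billion text pairs	Unsupervised contrastive learning	~3 million text triples	Multi-task supervised contrastive fine-tuning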
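
For readers unfamiliar with contrastive learning on text pairs, the sketch below computes a standard InfoNCE loss with in-batch negatives on toy vectors. This is a generic textbook formulation offered as an assumption about what such an objective looks like; it is not claimed to be the exact loss used to train GTE, and the function name and temperature value are illustrative.

```python
import math


def info_nce_loss(queries, documents, temperature=0.05):
    """Mean InfoNCE loss where documents[i] is the positive for queries[i].

    Every other document in the batch acts as a negative for that query.
    """
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return dot / (norm_u * norm_v)

    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, d) / temperature for d in documents]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability assigned to the matching document.
        total += log_denominator - logits[i]
    return total / len(queries)


# Toy 2-dimensional "embeddings" for a batch of two text pairs.
queries = [[1.0, 0.0], [0.0, 1.0]]
matched = [[1.0, 0.0], [0.0, 1.0]]     # each query aligned with its pair
mismatched = [[0.0, 1.0], [1.0, 0.0]]  # pairs swapped

print(info_nce_loss(queries, matched) < info_nce_loss(queries, mismatched))
# prints True
```

The loss is small when each query is most similar to its own paired text and large when it is more similar to another text in the batch, which is the signal that pulls paired texts together in embedding space.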
