GTE-Large

Embedding model

General Text Embeddings - Large

Release date: 2023-08-07 · Updated: 2023-08-08 16:20:53
Parameters
330M
Context length
512
Chinese support
Not supported
Reasoning ability

General Text Embeddings - Large is an AI model published by Alibaba, released on 2023-08-07. It is an embedding model with 330 million parameters and a 512-token context length, requires about 670MB of storage, and is distributed under the MIT License.

Data sourced primarily from official releases (GitHub, Hugging Face, papers), then benchmark leaderboards, then third-party evaluators. Learn about our data methodology


Model basics

Reasoning traces
Not supported
Thinking modes
Thinking modes not supported
Context length
512 tokens
Max output length
No data
Model type
Embedding model
Release date
2023-08-07
Model file size
670MB
MoE architecture
No
Total params / Active params
330M / N/A
Knowledge cutoff
No data

Open source & experience

Code license
MIT License
Weights license
MIT License (free for commercial use)
GitHub repo
GitHub link unavailable
Hugging Face
https://huggingface.co/thenlper/gte-large
Live demo
No live demo

Official resources

Paper
Towards General Text Embeddings with Multi-stage Contrastive Learning
DataLearnerAI blog
No blog post yet

API details

API speed
No data
No public API pricing yet.

Benchmark Results

No benchmark data to show.

Publisher

Alibaba
View publisher details
General Text Embeddings - Large

Model Overview

GTE, short for General Text Embeddings, is a family of text embedding models proposed by Alibaba. The open-source release includes three versions: a Large version with 330 million parameters, a Base version with 110 million parameters, and a Small version with 30 million parameters.


The GTE models accept an input sequence length of 512 tokens and produce 1024-dimensional embeddings; inputs longer than the sequence limit are truncated. GTE is fully open source under the MIT license and may be used commercially, but it only supports English.
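For reference, a minimal usage sketch with the Hugging Face `transformers` and `torch` packages is shown below. The example texts are hypothetical, and the mean-pooling step follows the common pattern for GTE-style embedding models; check the official model card at https://huggingface.co/thenlper/gte-large for the exact recommended usage.

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

# Hypothetical example texts; the model only supports English.
texts = ["what is the capital of China?", "Beijing is the capital of China."]

tokenizer = AutoTokenizer.from_pretrained("thenlper/gte-large")
model = AutoModel.from_pretrained("thenlper/gte-large")

# Inputs longer than 512 tokens are truncated, matching the model's context limit.
batch = tokenizer(texts, max_length=512, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**batch)

# Mean-pool token states over the attention mask to get one 1024-dim vector per text.
mask = batch["attention_mask"].unsqueeze(-1).float()
embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
embeddings = F.normalize(embeddings, p=2, dim=1)

# Cosine similarity between the two texts (embeddings are already L2-normalized).
print((embeddings[0] @ embeddings[1]).item())
```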


| Model | Parameters | Pretraining data | Pretraining method | Fine-tuning data | Fine-tuning method |
| --- | --- | --- | --- | --- | --- |
| GTE-small | ~30M | ~8 billion text pairs | Unsupervised contrastive learning | ~3 million text triplets | Multi-task supervised contrastive fine-tuning |
| GTE-base | ~110M | ~8 billion text pairs | Unsupervised contrastive learning | ~3 million text triplets | Multi-task supervised contrastive fine-tuning |
| GTE-large | ~330M | ~8 billion text pairs | Unsupervised contrastive learning | ~3 million text triplets | Multi-task supervised contrastive fine-tuning |
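Both training stages in the table rely on contrastive objectives over paired texts. The sketch below is an illustrative in-batch contrastive (InfoNCE-style) loss, not the exact objective from the GTE paper; the function name and temperature value are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(query_emb: torch.Tensor,
                              doc_emb: torch.Tensor,
                              temperature: float = 0.05) -> torch.Tensor:
    """Illustrative InfoNCE-style loss over a batch of (query, document) pairs.

    query_emb, doc_emb: (batch, dim) embeddings; row i of each tensor forms a
    positive pair, and every other row in the batch acts as an in-batch negative.
    """
    q = F.normalize(query_emb, dim=-1)
    d = F.normalize(doc_emb, dim=-1)
    logits = q @ d.T / temperature                      # (batch, batch) similarity matrix
    labels = torch.arange(q.size(0), device=q.device)   # positives lie on the diagonal
    return F.cross_entropy(logits, labels)
```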

