Text-to-Image Arena Leaderboard
The latest leaderboard of AI text-to-image models, based on anonymous user voting in the Text-to-Image Arena. Covers Elo scores, confidence intervals, and vote counts for GPT-Image, FLUX, Midjourney, DALL-E, and more.
Top Model
Seedream 4 2K
Top Score
1,140
Model Count
63
Data version
2026-05-22
Data source: LMArena
About This Leaderboard
This leaderboard ranks AI text-to-image models by generation quality. Data comes from LMArena's Text-to-Image Arena track, evaluated through anonymous blind testing by real users.
Methodology Overview
Blind testing: Users submit text prompts, two anonymous models generate images, and users vote for the better result.
Elo scoring: Scores are derived from the pairwise votes using the Bradley-Terry model, so each score expresses a model's strength relative to the other models in text-to-image generation rather than an absolute quality measure.
Diverse generation scenarios: Covers photorealistic scenes, artistic illustration, product design, character creation, and more.
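The scoring step above can be sketched in a few lines. This is a minimal illustration, not LMArena's actual pipeline, which is not described here: the vote counts are hypothetical, the 400-point base-10 scale and the 1000-point anchor are conventional Elo assumptions, and the function names are made up for the example.

```python
import math


def elo_win_probability(score_a: float, score_b: float, scale: float = 400.0) -> float:
    """Expected probability that A is preferred over B on an Elo-style scale (assumed base 10)."""
    return 1.0 / (1.0 + 10.0 ** ((score_b - score_a) / scale))


def fit_bradley_terry(wins: dict, iterations: int = 200) -> dict:
    """Fit Bradley-Terry strengths with the classic minorize-maximize update.

    wins[(a, b)] is the number of votes in which a beat b.
    Assumes every model has at least one win and one comparison.
    """
    models = sorted({m for pair in wins for m in pair})
    strength = {m: 1.0 / len(models) for m in models}
    for _ in range(iterations):
        updated = {}
        for i in models:
            total_wins = sum(w for (winner, loser), w in wins.items() if winner == i)
            denom = 0.0
            for j in models:
                if j == i:
                    continue
                n_ij = wins.get((i, j), 0) + wins.get((j, i), 0)
                denom += n_ij / (strength[i] + strength[j])
            updated[i] = total_wins / denom
        norm = sum(updated.values())
        strength = {m: s / norm for m, s in updated.items()}
    return strength


def to_elo_scale(strength: dict, scale: float = 400.0, anchor: float = 1000.0) -> dict:
    """Map strengths to Elo-style scores centred on the anchor (assumed convention)."""
    mean_log = sum(math.log10(s) for s in strength.values()) / len(strength)
    return {m: anchor + scale * (math.log10(s) - mean_log) for m, s in strength.items()}


# Hypothetical votes: model_a was preferred 60 times, model_b 40 times.
votes = {("model_a", "model_b"): 60, ("model_b", "model_a"): 40}
strengths = fit_bradley_terry(votes)
scores = to_elo_scale(strengths)
print(scores)
print(elo_win_probability(1140, 898))
```

Under these assumptions, a 242-point gap such as 1,140 versus 898 corresponds to roughly an 80% expected preference rate for the higher-scored model.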
DataLearner provides in-depth analysis on top of the raw data, linking leaderboard models to the DataLearner model database so you can quickly access model details, API pricing, benchmark scores, and more.
Ranking Table
| Rank | Model | Score | 95% CI | Votes | Organization | License |
|---|---|---|---|---|---|---|
| 23 | Seedream 4 2K | 1,140 | +/-7 | 12,626 | Bytedance | Proprietary |
| 28 | Seedream 5.0 Lite | 1,125 | +/-4 | 66,899 | Bytedance | Proprietary |
| 30 | Seedream 4 (FAL) | 1,117 | +/-7 | 11,861 | Bytedance | Proprietary |
| 34 | Seedream 4 High Res (FAL) | 1,113 | +/-3 | 167,994 | Bytedance | Proprietary |
| 37 | Wan2.7 Image Pro | 1,104 | +/-5 | 28,420 | Alibaba | Proprietary |
| 38 | Wan2.7 Image | 1,099 | +/-5 | 28,699 | Alibaba | Proprietary |
| 41 | Seedream 3 | 1,082 | +/-5 | 36,922 | Bytedance | Proprietary |
| 63 | BAGEL | 898 | +/-6 | 12,443 | Bytedance | Apache 2.0 |
Data is for reference only. Official sources are authoritative.
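One way to read the 95% CI column is to check whether two models' intervals overlap. The sketch below uses scores and half-widths copied from the table; the `Entry` class and `intervals_overlap` helper are hypothetical names, and interval overlap is only a rough heuristic, not a formal significance test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    model: str
    score: int
    ci: int  # half-width of the 95% confidence interval, as listed in the table

    @property
    def low(self) -> int:
        return self.score - self.ci

    @property
    def high(self) -> int:
        return self.score + self.ci


def intervals_overlap(a: Entry, b: Entry) -> bool:
    """True when the two confidence intervals share at least one point."""
    return a.low <= b.high and b.low <= a.high


seedream_4_2k = Entry("Seedream 4 2K", 1140, 7)
seedream_5_lite = Entry("Seedream 5.0 Lite", 1125, 4)
seedream_4_fal = Entry("Seedream 4 (FAL)", 1117, 7)
seedream_4_high_res = Entry("Seedream 4 High Res (FAL)", 1113, 3)

# Intervals [1133, 1147] and [1121, 1129] do not overlap.
print(intervals_overlap(seedream_4_2k, seedream_5_lite))
# Intervals [1110, 1124] and [1110, 1116] overlap.
print(intervals_overlap(seedream_4_fal, seedream_4_high_res))
```

When intervals overlap, as with the two FAL-hosted Seedream 4 entries, the difference in rank is probably too small to treat as a meaningful ordering.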
2026-05 Market Signals
Current Best (SOTA)
GPT-Image-1.5 High-Fidelity (OpenAI)
Gemini 3 Pro Image Preview 2K (Google)
Gemini 3 Pro Image Preview (Google)
Best Chinese Models
HunyuanImage-3.0 (Tencent)
Seedream-4.5 (Bytedance)
Qwen-Image-2512 (Alibaba)
Best Open Models
Qwen-Image-2512 (Alibaba)
Z-Image-Turbo (Alibaba)
GLM-Image (Zhipu AI)
FAQ
What is the difference between text-to-image and image editing?
Text-to-image creates a new image from a prompt. Image editing modifies an existing image, which is better for local changes, style transfer, and production refinements.
Which models are suitable for commercial poster design?
For commercial posters, prioritize models with strong text rendering, controllable composition, high-resolution output, and license terms that fit your use case. The top-ranked model may not be the best option if typography or brand control matters most.
What is prompt engineering?
Prompt engineering means structuring the text input to guide the generated image. Clear descriptions of subject, style, lighting, composition, and constraints usually improve quality and alignment.
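As a rough illustration of that structure, the sketch below assembles a prompt from the fields named above. `build_prompt` is a hypothetical helper and is not tied to any model's API; how much each field helps will vary by model.

```python
def build_prompt(subject, style="", lighting="", composition="", constraints=()):
    """Join the non-empty prompt fields into a single comma-separated prompt string."""
    parts = [subject.strip()]
    for label, value in (("style", style), ("lighting", lighting), ("composition", composition)):
        if value.strip():
            parts.append(f"{label}: {value.strip()}")
    if constraints:
        parts.append("constraints: " + "; ".join(c.strip() for c in constraints))
    return ", ".join(parts)


prompt = build_prompt(
    "a ceramic teapot on a wooden table",
    style="product photography",
    lighting="soft window light",
    composition="centered, shallow depth of field",
    constraints=["no text", "plain background"],
)
print(prompt)
```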
How large is the gap between open and closed image models?
The gap has narrowed, especially for customizable open models. Closed models can still lead on instruction following, typography, and detail consistency, while open models are attractive for local deployment, tuning, and cost control.

