Text-to-Image Arena Leaderboard
The latest AI text-to-image model leaderboard based on Text-to-Image Arena anonymous user voting. Covers Elo scores, confidence intervals, and vote counts for GPT-Image, FLUX, Midjourney, DALL-E, and more.
Top Model
GPT-image-2 (medium)
Top Score
1,389
Model Count
63
Data version
2026-05-22
Data source: LMArena
About This Leaderboard
This leaderboard ranks AI text-to-image models by generation quality. The data comes from LMArena's Text-to-Image Arena track, where real users evaluate models through anonymous blind testing.
Methodology Overview
Blind testing: Users submit text prompts, two anonymous models generate images, and users vote for the better result.
Elo scoring: Scores are reported on an Elo-style scale and are based on the Bradley-Terry model, which estimates each model's relative strength in text-to-image generation from pairwise vote outcomes.
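The two steps above can be illustrated with a minimal sketch. It fits Bradley-Terry strengths from a list of (winner, loser) votes using the standard minorization-maximization update, then maps them to an Elo-style scale. The function names, the toy votes, the model labels, and the anchor of 1000 are all assumptions made for illustration; this is not LMArena's actual pipeline, which may handle ties, weighting, and confidence intervals differently.

```python
import math
from collections import defaultdict


def fit_bradley_terry(votes, iterations=200):
    """Estimate Bradley-Terry strengths from (winner, loser) vote pairs.

    Uses the classic minorization-maximization update. Assumes every model
    has at least one win and one loss; otherwise the estimate is unbounded.
    """
    wins = defaultdict(float)    # total wins per model
    games = defaultdict(float)   # number of comparisons per unordered pair
    for winner, loser in votes:
        wins[winner] += 1
        games[frozenset((winner, loser))] += 1
    models = sorted({m for pair in votes for m in pair})

    strength = {m: 1.0 for m in models}
    for _ in range(iterations):
        updated = {}
        for m in models:
            denom = sum(
                games.get(frozenset((m, o)), 0.0) / (strength[m] + strength[o])
                for o in models
                if o != m
            )
            updated[m] = wins[m] / denom
        # Rescale so the strengths stay in a stable numeric range.
        total = sum(updated.values())
        strength = {m: s * len(models) / total for m, s in updated.items()}
    return strength


def to_elo_scale(strength, scale=400.0, anchor=1000.0):
    """Map strengths to an Elo-style scale whose mean equals `anchor`.

    `scale` and `anchor` are illustrative defaults, not official values.
    """
    logs = {m: math.log10(s) for m, s in strength.items()}
    mean_log = sum(logs.values()) / len(logs)
    return {m: anchor + scale * (v - mean_log) for m, v in logs.items()}


# Hypothetical blind-test votes: each tuple is (winner, loser).
votes = (
    [("model_a", "model_b")] * 3 + [("model_b", "model_a")] * 1
    + [("model_b", "model_c")] * 3 + [("model_c", "model_b")] * 1
    + [("model_a", "model_c")] * 3 + [("model_c", "model_a")] * 1
)
ratings = to_elo_scale(fit_bradley_terry(votes))
for name, score in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.0f}")
```

Only score differences are meaningful on this scale: under the Bradley-Terry model, a gap of 400 points corresponds to 10:1 odds that the higher-rated model wins a vote.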
Ranking Table
| Rank | Model | Score | 95% CI | Votes | Organization | License |
|---|---|---|---|---|---|---|
| 1 | GPT-image-2 (medium) | 1,389 | | | OpenAI | |
Data is for reference only. Official sources are authoritative.
2026-05 Market Signals
Current Best (SOTA)
GPT-Image-1.5 High-Fidelity (OpenAI)
Gemini 3 Pro Image Preview 2K (Google)
Gemini 3 Pro Image Preview (Google)
Best China Model
HunyuanImage-3.0 (Tencent)
Seedream-4.5 (ByteDance)
Qwen-Image-2512 (Alibaba)
Best Open Model
Qwen-Image-2512 (Alibaba)
Z-Image-Turbo (Alibaba)
GLM-Image (Zhipu AI)
FAQ
What is the difference between text-to-image and image editing?
Text-to-image creates a new image from a prompt. Image editing modifies an existing image, which is better for local changes, style transfer, and production refinements.
Which models are suitable for commercial poster design?
For commercial posters, prioritize models with strong text rendering, controllable composition, high-resolution output, and license terms that fit your use case. The top-ranked model may not be the best option if typography or brand control matters most.
What is prompt engineering?
Prompt engineering means structuring the text input to guide the generated image. Clear descriptions of subject, style, lighting, composition, and constraints usually improve quality and alignment.
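One way to apply this is to keep each element of the prompt in its own field and join them only at the end. The sketch below assumes a hypothetical `ImagePrompt` helper; the class, its field names, and the output format are illustrative and do not correspond to any particular model's API.

```python
from dataclasses import dataclass, field


@dataclass
class ImagePrompt:
    """Hypothetical container for the parts of a text-to-image prompt."""
    subject: str
    style: str = ""
    lighting: str = ""
    composition: str = ""
    constraints: list[str] = field(default_factory=list)

    def render(self) -> str:
        # Join the descriptive parts that were provided, skipping empty ones.
        parts = [self.subject, self.style, self.lighting, self.composition]
        text = ", ".join(p for p in parts if p)
        # Append explicit constraints as a separate sentence.
        if self.constraints:
            text += ". Constraints: " + "; ".join(self.constraints)
        return text


prompt = ImagePrompt(
    subject="a ceramic teapot on a wooden table",
    style="flat vector illustration",
    lighting="soft morning light",
    composition="centered, wide margins for a headline",
    constraints=["no text", "no people"],
)
print(prompt.render())
```

Keeping the fields separate makes it easier to vary one element at a time, for example the lighting, and compare the results.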




