Image-to-Video Arena Leaderboard
The latest AI image-to-video leaderboard based on anonymous Arena voting. Covers Elo scores, confidence intervals, and vote counts for leading video animation models.
Top Model
Seedance 2.0
Top Score
1,462
Model Count
39
Data version
2026-05-12
Data source: LMArena
About This Leaderboard
This leaderboard ranks AI image-to-video models by animation quality. Data comes from LMArena's Image-to-Video Arena track, evaluated through anonymous blind testing by real users.
Methodology Overview
Blind testing: Users upload an image, two anonymous models generate animated videos, and users vote for the more natural result.
Elo scoring: Scores are fitted with the Bradley-Terry model from the pairwise votes, so each score reflects a model's relative strength against the other models in image-to-video tasks.
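To make the scoring step concrete, here is a minimal sketch of how pairwise votes can be turned into Bradley-Terry strengths and then into Elo-style scores. It is an illustration only, not LMArena's actual pipeline: the function names, the iterative fitting routine, the anchor of 1000, and the 400-point scale are all assumptions chosen for the example.

```python
import math
from collections import defaultdict


def fit_bradley_terry(votes, iterations=200):
    """Estimate Bradley-Terry strengths from (winner, loser) vote pairs.

    Uses the classic iterative minorization-maximization update.
    Hypothetical helper for illustration; not LMArena's implementation.
    """
    wins = defaultdict(float)         # total wins per model
    pair_counts = defaultdict(float)  # number of votes per unordered pair
    models = set()
    for winner, loser in votes:
        if winner == loser:
            continue  # a model cannot be compared against itself
        wins[winner] += 1.0
        pair_counts[frozenset((winner, loser))] += 1.0
        models.update((winner, loser))
    if not models:
        return {}

    strength = {m: 1.0 for m in models}
    for _ in range(iterations):
        updated = {}
        for m in models:
            denom = 0.0
            for pair, n in pair_counts.items():
                if m in pair:
                    (other,) = pair - {m}
                    denom += n / (strength[m] + strength[other])
            # Clamp so a model with zero wins keeps a positive strength.
            updated[m] = max(wins[m] / denom, 1e-9)
        # Normalize so the geometric mean of strengths is 1.
        log_mean = sum(math.log(s) for s in updated.values()) / len(updated)
        norm = math.exp(log_mean)
        strength = {m: s / norm for m, s in updated.items()}
    return strength


def to_elo_scale(strength, anchor=1000.0, scale=400.0):
    """Map strengths to Elo-style scores (anchor and scale are assumed)."""
    return {m: anchor + scale * math.log10(s) for m, s in strength.items()}


def win_probability(score_a, score_b, scale=400.0):
    """Probability that A beats B implied by an Elo-style score gap."""
    return 1.0 / (1.0 + 10.0 ** ((score_b - score_a) / scale))


# Toy data: model "A" wins 3 of 4 blind comparisons against model "B".
votes = [("A", "B")] * 3 + [("B", "A")]
scores = to_elo_scale(fit_bradley_terry(votes))
print(round(win_probability(scores["A"], scores["B"]), 3))
```

Under these assumptions, a score gap translates directly into an expected head-to-head win rate, which is why small gaps between neighboring models should be read together with the confidence interval rather than as a strict ordering.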
Ranking Table
| Rank | Model | Score | 95% CI | Votes | Organization | License |
|---|---|---|---|---|---|---|
| 1 | Seedance 2.0 | 1,462 | | | ByteDance Seed Team | |
Data is for reference only. Official sources are authoritative.
2026-05 Market Signals
Current Best (SOTA)
- Grok Imagine Video 720p
- Veo 3.1 Audio 1080p
- Veo 3.1 Audio
Best China Model
- Vidu-Q3-Pro
- Wan2.5-I2V-Preview
- Kling-2.6-Pro
Best Open Model
- Wan-V2.2-A14B
- LTX-2-19B
- Pika-V2.2
FAQ
What is the difference between image-to-video and text-to-video?
Text-to-video generates a clip from a prompt alone. Image-to-video starts from a reference image, which gives stronger control over subject identity, composition, and visual style.
Which model should I use to animate old photos?
For portrait animation, compare models on facial expression stability, motion naturalness, and identity preservation. Specialized lip-sync tools may be better when speech alignment is the main requirement.
How can I keep characters consistent?
Use a strong reference image as the first frame, keep the prompt specific, and avoid large changes in clothing, camera angle, or style unless the model supports identity conditioning.

