What is first-frame fidelity?

First-frame fidelity measures how closely the generated video preserves the uploaded reference image at the beginning of the clip.
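The leaderboard does not say how first-frame fidelity is computed, so the following is only a minimal sketch under one assumption: that the reference image and the first video frame are compared with peak signal-to-noise ratio (PSNR). The function name `first_frame_psnr` and the flat-pixel input format are hypothetical and chosen for illustration.

```python
import math
from typing import Sequence


def first_frame_psnr(
    reference: Sequence[float],
    first_frame: Sequence[float],
    max_value: float = 255.0,
) -> float:
    """Return PSNR in dB between two equally sized, flattened pixel sequences.

    Higher values mean the first frame is closer to the reference image.
    PSNR is an assumed stand-in metric, not one named by the leaderboard.
    """
    if len(reference) != len(first_frame) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and have the same length")
    # Mean squared error over all pixel values.
    mse = sum((r - f) ** 2 for r, f in zip(reference, first_frame)) / len(reference)
    if mse == 0:
        return math.inf  # the first frame reproduces the reference exactly
    return 10.0 * math.log10(max_value ** 2 / mse)


reference_pixels = [0, 128, 255, 64]
drifted_pixels = [0, 120, 255, 70]
print(first_frame_psnr(reference_pixels, drifted_pixels))
```

In practice the frame would first be decoded and resized to the reference resolution, and a perceptual metric may track human judgment better than PSNR.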

Image-to-Video Arena Leaderboard

Name: Image-to-Video Arena Leaderboard
Creator: DataLearner
License: https://creativecommons.org/licenses/by/4.0/

The latest AI image-to-video leaderboard, based on anonymous Arena voting. It covers Elo scores, confidence intervals, and vote counts for leading video animation models.

Top Model

Seedance 2.0

Top Score

1,462

Model Count

Data version

2026-05-12

Data source: LM Arena

About This Leaderboard

This leaderboard ranks AI image-to-video models by animation quality. Data comes from LMArena's Image-to-Video Arena track, evaluated through anonymous blind testing by real users.

Methodology Overview

Blind testing: Users upload an image, two anonymous models generate animated videos, and users vote for the more natural result.
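The blind-testing flow can be pictured roughly as follows. This is a sketch, not LMArena's implementation: the names `Battle`, `sample_battle`, and `record_vote` are hypothetical, and uniform random pairing is an assumption.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Battle:
    model_a: str  # shown to the voter only as "A"
    model_b: str  # shown to the voter only as "B"


def sample_battle(models: list[str], rng: random.Random) -> Battle:
    """Pick two distinct models at random (assumed uniform pairing)."""
    first, second = rng.sample(models, 2)
    return Battle(first, second)


def record_vote(battle: Battle, choice: str) -> tuple[str, str]:
    """Translate the anonymous choice ("A" or "B") into (winner, loser)."""
    if choice == "A":
        return battle.model_a, battle.model_b
    if choice == "B":
        return battle.model_b, battle.model_a
    raise ValueError("choice must be 'A' or 'B'")


rng = random.Random(0)
battle = sample_battle(["model-x", "model-y", "model-z"], rng)
print(record_vote(battle, "A"))
```

Model names are revealed only after the vote is recorded, which is what keeps the comparison blind.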

Elo scoring: Scores are fit with the Bradley-Terry model, which estimates each model's relative strength in image-to-video tasks from the pairwise votes.
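Under the Bradley-Terry model, each model i has a strength s_i and the probability that i beats j is s_i / (s_i + s_j). The sketch below fits those strengths from (winner, loser) votes with a simple iterative update and maps them onto an Elo-like scale. The smoothing `prior`, the anchor of 1000, and the 400-point scale are assumptions of this example; the page does not state the exact constants or fitting procedure used for the leaderboard.

```python
import math
from collections import defaultdict


def fit_bradley_terry(votes, prior=0.5, iterations=200):
    """Estimate Bradley-Terry strengths from (winner, loser) pairs.

    `prior` adds virtual wins in both directions for every pair that met,
    so a model with no real wins keeps a positive strength. With prior=0,
    every model must have at least one win.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    wins = defaultdict(float)   # wins[i]: total wins of model i
    games = defaultdict(float)  # games[(i, j)]: votes between i and j
    for winner, loser in votes:
        wins[winner] += 1.0
        games[(winner, loser)] += 1.0
        games[(loser, winner)] += 1.0
    for i, j in list(games):
        wins[i] += prior
        games[(i, j)] += 2.0 * prior
    models = sorted(wins)
    strength = {m: 1.0 for m in models}
    for _ in range(iterations):
        updated = {}
        for i in models:
            # Minorization-maximization update: s_i = W_i / sum_j n_ij / (s_i + s_j)
            denom = sum(
                n / (strength[i] + strength[j])
                for (a, j), n in games.items()
                if a == i
            )
            updated[i] = wins[i] / denom
        # Fix the scale by normalizing the geometric mean to 1.
        log_mean = sum(math.log(s) for s in updated.values()) / len(updated)
        strength = {m: s / math.exp(log_mean) for m, s in updated.items()}
    return strength


def to_elo_scale(strength, anchor=1000.0, scale=400.0):
    """Map strengths to Elo-like scores (anchor and scale are assumed)."""
    return {m: anchor + scale * math.log10(s) for m, s in strength.items()}


votes = [("A", "B")] * 3 + [("B", "A")]
print(to_elo_scale(fit_bradley_terry(votes)))
```

Only score differences are meaningful here: a 400-point gap corresponds to 10-to-1 odds of winning a head-to-head vote under this scaling.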

Ranking Table

Rank	Model	Score	95% CI	Votes	Organization	License
	Seedance 2.0 (ByteDance Seed team)	1,462

Data is for reference only. Official sources are authoritative.
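The page does not describe how the 95% CI column is derived. One common way to obtain such an interval is a percentile bootstrap over the votes, sketched below. For brevity the example statistic is a plain win rate rather than a fitted score; `bootstrap_interval` and `win_rate` are hypothetical names, and the resampling scheme is an assumption.

```python
import random


def bootstrap_interval(votes, statistic, rounds=1000, level=0.95, seed=0):
    """Percentile bootstrap: resample votes with replacement, recompute the
    statistic each time, and return the central `level` range."""
    rng = random.Random(seed)
    estimates = sorted(
        statistic(rng.choices(votes, k=len(votes))) for _ in range(rounds)
    )
    tail = (1.0 - level) / 2.0
    lower = estimates[int(tail * (rounds - 1))]
    upper = estimates[int((1.0 - tail) * (rounds - 1))]
    return lower, upper


def win_rate(votes, model="A"):
    """Share of votes won by `model` (stand-in for a fitted score)."""
    return sum(1 for winner, _ in votes if winner == model) / len(votes)


votes = [("A", "B")] * 70 + [("B", "A")] * 30
print(bootstrap_interval(votes, win_rate))
```

Models with fewer votes produce wider intervals, which is why the vote count is worth reading alongside the score.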

2026-05 Market Signals

Current Best (SOTA)

Grok Imagine Video 720p

Veo 3.1 Audio 1080p

Veo 3.1 Audio

Best China Model

Vidu-Q3-Pro

Wan2.5-I2V-Preview

Kling-2.6-Pro

Best Open Model

Wan-V2.2-A14B

LTX-2-19B

Pika-V2.2

FAQ

What is the difference between image-to-video and text-to-video?

Text-to-video generates a clip from a prompt alone. Image-to-video starts from a reference image, which gives stronger control over subject identity, composition, and visual style.

Which model should I use to animate old photos?

For portrait animation, compare models on facial expression stability, motion naturalness, and identity preservation. Specialized lip-sync tools may be better when speech alignment is the main requirement.

How can I keep characters consistent?

Use a strong reference image as the first frame, keep the prompt specific, and avoid large changes in clothing, camera angle, or style unless the model supports identity conditioning.