A knowledge platform focused on LLM benchmarking, datasets, and practical instruction with continuously updated capability maps.

© 2026 DataLearner AI. DataLearner curates industry data and case studies so researchers, enterprises, and developers can rely on trustworthy intelligence.


Image-to-Video Arena Leaderboard

An up-to-date AI image-to-video leaderboard based on anonymous Arena voting. It reports Elo scores, 95% confidence intervals, and vote counts for leading image-animation models.

Top model: happyhorse-1.0
Top score: 1,445
Model count: 39
Data version: 2026-05-12
Data source: LM Arena

About This Leaderboard

This leaderboard ranks AI image-to-video models by animation quality. Data comes from LMArena's Image-to-Video Arena track, evaluated through anonymous blind testing by real users.

Methodology Overview

Blind testing: Users upload an image, two anonymous models generate animated videos, and users vote for the more natural result.

Elo scoring: scores are fitted with the Bradley-Terry model, which estimates each model's relative strength in image-to-video tasks from the pairwise votes.
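A minimal sketch of the Elo scoring step, assuming each vote is recorded as a (winner, loser) pair and ties are dropped. It fits Bradley-Terry log-strengths by plain gradient ascent with a small L2 penalty, then rescales them so that a 400-point gap corresponds to 10:1 odds around a baseline of 1000. The function name `fit_bradley_terry`, the `lr`/`l2` settings, and the 400/1000 constants are illustrative assumptions, not LMArena's published pipeline.

```python
import math

def fit_bradley_terry(battles, iters=2000, lr=1.0, l2=1e-3, scale=400.0, base=1000.0):
    """Fit Bradley-Terry log-strengths from (winner, loser) pairs and map them
    to an Elo-like scale. Model: P(i beats j) = 1 / (1 + exp(-(r_i - r_j)))."""
    models = sorted({m for pair in battles for m in pair})
    idx = {m: k for k, m in enumerate(models)}
    r = [0.0] * len(models)                  # every model starts equal
    n = len(battles)
    for _ in range(iters):
        grad = [-l2 * x for x in r]          # L2 penalty keeps unbeaten models finite
        for winner, loser in battles:
            i, j = idx[winner], idx[loser]
            p_win = 1.0 / (1.0 + math.exp(-(r[i] - r[j])))
            g = (1.0 - p_win) / n            # gradient of the mean log-likelihood
            grad[i] += g
            grad[j] -= g
        r = [x + lr * gx for x, gx in zip(r, grad)]
    # Natural-log odds -> Elo-like points: `scale` points per factor of 10 in odds
    return {m: base + scale * r[idx[m]] / math.log(10) for m in models}

# Hypothetical votes among three anonymous models
votes = (
    [("A", "B")] * 7 + [("B", "A")] * 3
    + [("B", "C")] * 7 + [("C", "B")] * 3
    + [("A", "C")] * 8 + [("C", "A")] * 2
)
ratings = fit_bradley_terry(votes)
print({m: round(s) for m, s in ratings.items()})
```

The L2 term stops a model that never lost from drifting toward an infinite rating. Because each vote raises the winner and lowers the loser by the same amount, the fitted ratings stay centered on the baseline.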


Ranking Table

Rank | Model | Score | 95% CI | Votes | Organization | License
--- | --- | --- | --- | --- | --- | ---
1 | happyhorse-1.0 | 1,445 | +/-15 | 14,889 | Alibaba-ATH | Proprietary

Data is for reference only; official sources are authoritative.
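To read the Score and 95% CI columns, the sketch below assumes the conventional Elo scale (a 400-point gap means 10:1 odds), which the page does not state explicitly. `expected_win_rate` turns a score gap into an expected preference rate, and `bootstrap_ci` shows one common way to obtain an interval: resample the votes with replacement and take percentiles. It is simplified to a single head-to-head pairing with hypothetical votes; the leaderboard's intervals are presumably computed over its full rating fit, which this does not reproduce. All function names here are hypothetical.

```python
import math
import random

def expected_win_rate(score_a, score_b, scale=400.0):
    """Chance that model A's video is preferred over model B's, assuming an
    Elo-style scale where a `scale`-point gap means 10:1 odds."""
    return 1.0 / (1.0 + 10 ** ((score_b - score_a) / scale))

def elo_gap(outcomes, scale=400.0):
    """Elo-style gap implied by head-to-head votes (1 = A preferred, 0 = B preferred)."""
    p = sum(outcomes) / len(outcomes)
    p = min(max(p, 1e-6), 1 - 1e-6)          # avoid log(0) on one-sided resamples
    return scale * math.log10(p / (1 - p))

def bootstrap_ci(outcomes, rounds=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for elo_gap: resample votes with replacement."""
    rng = random.Random(seed)
    stats = sorted(
        elo_gap(rng.choices(outcomes, k=len(outcomes))) for _ in range(rounds)
    )
    lo = stats[int((1 - level) / 2 * rounds)]
    hi = stats[int((1 + level) / 2 * rounds) - 1]
    return elo_gap(outcomes), lo, hi

# happyhorse-1.0 (1,445) vs Kling v3 Pro (1,360): an 85-point gap
print(round(expected_win_rate(1445, 1360), 2))   # 0.62

# 100 hypothetical votes, 70 preferring model A
point, lo, hi = bootstrap_ci([1] * 70 + [0] * 30)
print(round(point, 1), round(lo, 1), round(hi, 1))
```

Under these assumptions, an 85-point lead corresponds to roughly a 62% expected preference rate in a direct matchup, not a guaranteed win.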

2026-05 Market Signals

Current Best (SOTA)

  1. Grok Imagine Video 720p
  2. Veo 3.1 Audio 1080p
  3. Veo 3.1 Audio

Best China Model

  • Vidu-Q3-Pro
  • Wan2.5-I2V-Preview
  • Kling-2.6-Pro

Best Open Model

  • Wan-V2.2-A14B
  • LTX-2-19B
  • Pika-V2.2

FAQ

01

What is the difference between image-to-video and text-to-video?

Text-to-video generates a clip from a prompt alone. Image-to-video starts from a reference image, which gives stronger control over subject identity, composition, and visual style.

02

Which model should I use to animate old photos?

For portrait animation, compare models on facial expression stability, motion naturalness, and identity preservation. Specialized lip-sync tools may be better when speech alignment is the main requirement.
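Identity preservation can be quantified in several ways. One common approach, assumed here, is to embed the face in the reference image and in each generated frame with a face-recognition model and compare the vectors by cosine similarity. The sketch takes those embeddings as given (the embedding model is not included), and `identity_preservation` is a hypothetical helper that reports both the mean and the worst frame, since a brief identity slip in one frame can be hidden by a good average.

```python
import math

def _unit(vec):
    """Scale a vector to length 1 so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def identity_preservation(ref_embedding, frame_embeddings):
    """Return (mean, worst) cosine similarity between the reference face and each frame.

    The embeddings are assumed to come from some face-recognition model
    (not included here); any fixed-length vectors work for the arithmetic.
    """
    ref = _unit(ref_embedding)
    sims = [sum(a * b for a, b in zip(ref, _unit(f))) for f in frame_embeddings]
    return sum(sims) / len(sims), min(sims)

# Toy 2-D "embeddings": the second frame has drifted away from the reference
mean_sim, worst_sim = identity_preservation([1.0, 0.0], [[1.0, 0.0], [0.6, 0.8]])
print(round(mean_sim, 2), round(worst_sim, 2))   # 0.8 0.6
```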

03

How can I keep characters consistent?

Use a strong reference image as the first frame, keep the prompt specific, and avoid large changes in clothing, camera angle, or style unless the model supports identity conditioning.

Diverse animation scenarios: covers portrait animation, landscape motion, object transformation, artistic creation, and more.

DataLearner adds analysis on top of the raw data, linking leaderboard models to the DataLearner model database so you can quickly look up model details, API pricing, benchmark scores, and more.

Rank | Model | Score | 95% CI | Votes | Organization | License
--- | --- | --- | --- | --- | --- | ---
10 | Kling v3 Pro | 1,360 | +/-12 | 86,305 | KlingAI | Proprietary
14 | Wan2.6 I2V | 1,316 | +/-12 | 49,232 | Alibaba | Proprietary
20 | Hailuo 2.3 | 1,257 | +/-7 | 184,943 | MiniMaxAI | Proprietary
26 | Hailuo 2.3 | 1,227 | +/-10 | 21,751 | MiniMaxAI | Proprietary
29 | Hailuo 2.3 | 1,222 | +/-9 | 21,782 | MiniMaxAI | Proprietary
32 | MiniMax Hailuo 2.3 Fast | 1,192 | +/-10 | 22,549 | MiniMaxAI | Proprietary

04

What is first-frame fidelity?

First-frame fidelity measures how closely the generated video preserves the uploaded reference image at the beginning of the clip. Higher fidelity means the video feels like motion extending from the source image rather than a loose reinterpretation.
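The page does not say how first-frame fidelity is measured. As one simple pixel-level proxy, assumed here, PSNR compares the uploaded image with the first generated frame; structural or perceptual metrics such as SSIM may track perceived fidelity better. The sketch assumes both images are already decoded to the same resolution and flattened to 8-bit grayscale values; `psnr` is a hypothetical helper, not a library call.

```python
import math

def psnr(reference, frame, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(frame):
        raise ValueError("reference and frame must have the same number of pixels")
    mse = sum((r - f) ** 2 for r, f in zip(reference, frame)) / len(reference)
    if mse == 0:
        return math.inf                      # identical images
    return 10 * math.log10(max_value ** 2 / mse)

# Flattened 8-bit grayscale pixels; one pixel differs by 10 levels
reference = [0, 0, 0, 0]
first_frame = [0, 0, 0, 10]
print(round(psnr(reference, first_frame), 2))   # 34.15
```

Higher PSNR means the opening frame stays closer to the source image at the pixel level.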