© 2026 DataLearner AI. DataLearner curates industry data and case studies so researchers, enterprises, and developers can rely on trustworthy intelligence.



Image-to-Video Arena Leaderboard

The latest AI image-to-video leaderboard, built from anonymous Arena voting. It reports Elo scores, 95% confidence intervals, and vote counts for leading video animation models.

  • Top model: Seedance 2.0
  • Top score: 1,462
  • Model count: 39
  • Data version: May 12, 2026
  • Data source: LMArena

About This Leaderboard

This leaderboard ranks AI image-to-video models by animation quality. Data comes from LMArena's Image-to-Video Arena track, evaluated through anonymous blind testing by real users.

Methodology Overview

Blind testing: Users upload an image, two anonymous models generate animated videos, and users vote for the more natural result.

Elo scoring: Scores are Elo-style ratings fitted with the Bradley-Terry model, which estimates each model's relative strength in image-to-video tasks from the outcomes of all pairwise votes.
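The scoring step can be illustrated with a small, self-contained sketch. It assumes votes arrive as (winner, loser) pairs, ignores ties, fits Bradley-Terry strengths with the minorize-maximize (MM) update, and maps them to an Elo-like scale. The scale of 400, base 10, and anchor of 1000 are assumptions rather than LMArena's published settings, and the 95% interval comes from a simple percentile bootstrap over resampled votes. All function names and the toy votes are hypothetical; this is not LMArena's actual pipeline.

```python
import math
import random
from collections import defaultdict


def fit_bradley_terry(battles, iters=500, tol=1e-10):
    """Fit Bradley-Terry strengths from (winner, loser) pairs with the MM update.

    Ties are ignored in this sketch. Strengths are normalised so their geometric
    mean is 1, because only relative strength is identifiable.
    """
    wins, losses = defaultdict(int), defaultdict(int)
    pair_counts = defaultdict(int)  # number of battles per unordered model pair
    for w, l in battles:
        wins[w] += 1
        losses[l] += 1
        pair_counts[tuple(sorted((w, l)))] += 1
    models = sorted(set(wins) | set(losses))
    for m in models:
        # Necessary (not sufficient) condition for a finite maximum-likelihood fit.
        if wins[m] == 0 or losses[m] == 0:
            raise ValueError(f"{m!r} needs at least one win and one loss")

    p = {m: 1.0 for m in models}
    for _ in range(iters):
        denom = defaultdict(float)
        for (a, b), n in pair_counts.items():
            share = n / (p[a] + p[b])
            denom[a] += share
            denom[b] += share
        new_p = {m: wins[m] / denom[m] for m in models}
        geo_mean = math.exp(sum(math.log(v) for v in new_p.values()) / len(new_p))
        new_p = {m: v / geo_mean for m, v in new_p.items()}
        converged = max(abs(new_p[m] - p[m]) for m in models) < tol
        p = new_p
        if converged:
            break
    return p


def to_elo(strengths, scale=400.0, base=10.0, anchor=1000.0):
    """Map strengths to an Elo-like scale where
    P(i beats j) = 1 / (1 + base ** ((r_j - r_i) / scale))."""
    return {m: anchor + scale * math.log(s, base) for m, s in strengths.items()}


def win_probability(r_i, r_j, scale=400.0, base=10.0):
    """Expected probability that a model rated r_i beats a model rated r_j."""
    return 1.0 / (1.0 + base ** ((r_j - r_i) / scale))


def bootstrap_ci(battles, rounds=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval per model: resample votes, refit, repeat."""
    rng = random.Random(seed)
    samples = defaultdict(list)
    for _ in range(rounds):
        resampled = [rng.choice(battles) for _ in battles]
        for m, r in to_elo(fit_bradley_terry(resampled)).items():
            samples[m].append(r)
    ci = {}
    for m, rs in samples.items():
        rs.sort()
        ci[m] = (rs[int(alpha / 2 * (len(rs) - 1))],
                 rs[int((1 - alpha / 2) * (len(rs) - 1))])
    return ci


if __name__ == "__main__":
    # Toy votes between three hypothetical models (not leaderboard data).
    battles = ([("A", "B")] * 40 + [("B", "A")] * 20
               + [("B", "C")] * 40 + [("C", "B")] * 20
               + [("A", "C")] * 45 + [("C", "A")] * 15)
    ratings = to_elo(fit_bradley_terry(battles))
    ci = bootstrap_ci(battles)
    for m in sorted(ratings, key=ratings.get, reverse=True):
        lo, hi = ci[m]
        print(f"{m}: {ratings[m]:.0f} (95% CI {lo:.0f} to {hi:.0f})")
    # Expected win rate implied by a 17-point gap, such as 1,462 vs 1,445.
    print(f"{win_probability(1462, 1445):.3f}")
```

Under these assumed settings, a 17-point gap such as the one between the top two entries implies only about a 52% expected win rate. When two models' confidence intervals overlap as much as ±13 and ±15 do, their order is not firmly established.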


Ranking Table

Arena battles cover diverse animation scenarios, including portrait animation, landscape motion, object transformation, and artistic creation.

Rank | Model | Score | 95% CI | Votes | Organization | License
---- | ----- | ----- | ------ | ----- | ------------ | -------
1 | Seedance 2.0 | 1,462 | ±13 | 42,574 | ByteDance Seed | Proprietary
2 | happyhorse-1.0 | 1,445 | ±15 | 14,889 | Alibaba-ATH | Proprietary
3 | Grok Imagine 0.9 | 1,423 | ±6 | 333,111 | xAI | Proprietary
4 | Veo 3.1 Generate (Preview) | 1,397 | ±11 | 25,117 | Google DeepMind | Proprietary
5 | Veo 3.1 Generate (Preview) | 1,394 | ±10 | 15,561 | Google DeepMind | Proprietary
6 | Veo 3.1 Fast (Preview) | 1,384 | ±9 | 99,882 | Google DeepMind | Proprietary
7 | Grok Imagine 0.9 | 1,383 | ±9 | 19,412 | xAI | Proprietary
8 | Veo 3.1 Fast (Preview) | 1,376 | ±11 | 16,006 | Google DeepMind | Proprietary
9 | Vidu Q3 Pro | 1,361 | ±8 | 36,677 | Shengshu | Proprietary
10 | Kling v3 Pro | 1,360 | ±12 | 86,305 | KlingAI | Proprietary
11 | Veo 3.1 Generate (Preview) | 1,330 | ±11 | 32,383 | Google DeepMind | Proprietary
12 | Wan2.1-T2V-14B | 1,325 | ±13 | 12,633 | Alibaba | Proprietary
13 | Veo 3.1 Fast (Preview) | 1,324 | ±9 | 41,215 | Google DeepMind | Proprietary
14 | Wan2.6 I2V | 1,316 | ±12 | 49,232 | Alibaba | Proprietary
15 | Seedance 2.0 | 1,306 | ±8 | 184,337 | ByteDance Seed | Proprietary
16 | Pixverse v5.6 | 1,302 | ±12 | 81,425 | Pixverse | Proprietary
17 | Kling 2.5 Turbo | 1,294 | ±8 | 152,288 | KlingAI | Proprietary
18 | Kling 2.5 Turbo | 1,274 | ±12 | 3,791 | KlingAI | Proprietary
19 | Seedance 2.0 | 1,272 | ±8 | 34,028 | ByteDance Seed | Proprietary
20 | Hailuo 2.3 | 1,257 | ±7 | 184,943 | MiniMaxAI | Proprietary
21 | Veo 3.1 Fast (Preview) | 1,256 | ±10 | 26,297 | Google DeepMind | Proprietary
22 | Veo 3.1 Generate (Preview) | 1,255 | ±10 | 26,105 | Google DeepMind | Proprietary
23 | p-video | 1,243 | ±17 | 23,382 | Pruna | Proprietary
24 | Vidu Q2 Turbo | 1,242 | ±17 | 2,506 | Shengshu | Proprietary
25 | Kling 2.5 Turbo | 1,233 | ±8 | 29,849 | KlingAI | Proprietary
26 | Hailuo 2.3 | 1,227 | ±10 | 21,751 | MiniMaxAI | Proprietary
27 | Kling 2.5 Turbo | 1,227 | ±8 | 29,952 | KlingAI | Proprietary
28 | Ray 3 | 1,225 | ±19 | 1,588 | Luma AI | Proprietary
29 | Hailuo 2.3 | 1,222 | ±9 | 21,782 | MiniMaxAI | Proprietary
30 | Vidu Q2 Pro | 1,222 | ±17 | 2,608 | Shengshu | Proprietary
31 | Hunyuan-A13B-Instruct | 1,195 | ±15 | 5,472 | Tencent AI Lab | tencent-hunyuan-community
32 | MiniMax Hailuo 2.3 Fast | 1,192 | ±10 | 22,549 | MiniMaxAI | Proprietary
33 | Seedance 2.0 | 1,184 | ±8 | 33,753 | ByteDance Seed | Proprietary
34 | Wan2.1-T2V-14B | 1,169 | ±10 | 27,067 | Alibaba | Apache 2.0
35 | Veo 3.1 Generate (Preview) | 1,164 | ±16 | 10,319 | Google DeepMind | Proprietary
36 | LTX 2 19B | 1,140 | ±7 | 135,497 | Lightricks | ltx-2-community-license-agreement
37 | Ray 2 | 1,106 | ±16 | 9,527 | Luma AI | Proprietary
38 | Runway Gen-4 Turbo | 1,051 | ±13 | 6,811 | Runway | Proprietary
39 | Pika v2.2 | 995 | ±13 | 8,655 | Pika | Proprietary

Data is for reference only; official sources are authoritative. Model names refer to DataLearner model profiles, so a name that appears more than once most likely denotes separate Arena entries (for example, different resolution or audio variants) linked to the same profile.

DataLearner adds analysis on top of the raw data by linking leaderboard models to the DataLearner model database, where you can quickly look up model details, API pricing, benchmark scores, and more.

2026-05 Market Signals

Current Best (SOTA)

  1. Grok Imagine Video 720p
  2. Veo 3.1 Audio 1080p
  3. Veo 3.1 Audio

Best Chinese Models

  • Vidu-Q3-Pro
  • Wan2.5-I2V-Preview
  • Kling-2.6-Pro

Best Open Models

  • Wan-V2.2-A14B
  • LTX-2-19B
  • Pika-V2.2

FAQ

01

What is the difference between image-to-video and text-to-video?

Text-to-video generates a clip from a prompt alone. Image-to-video starts from a reference image, which gives stronger control over subject identity, composition, and visual style.

02

Which model should I use to animate old photos?

For portrait animation, compare models on facial expression stability, motion naturalness, and identity preservation. Specialized lip-sync tools may be better when speech alignment is the main requirement.

03

How can I keep characters consistent?

Use a strong reference image as the first frame, keep the prompt specific, and avoid large changes in clothing, camera angle, or style unless the model supports identity conditioning.

04

What is first-frame fidelity?

First-frame fidelity measures how closely the generated video preserves the uploaded reference image at the beginning of the clip. Higher fidelity means the video feels like motion extending from the source image rather than a loose reinterpretation.
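The FAQ does not say how first-frame fidelity is scored. One simple proxy, assumed here purely for illustration, is the peak signal-to-noise ratio (PSNR) between the uploaded reference image and the first generated frame: higher values mean the opening frame stays closer to the source. The sketch below assumes both are same-sized grayscale images stored as nested lists of pixel values. The function names are hypothetical, and real evaluations may use structural or learned perceptual metrics instead.

```python
import math
from typing import List, Sequence

# Assumed layout: a grayscale image as rows of pixel intensities in [0, max_value].
Image = Sequence[Sequence[float]]


def psnr(reference: Image, frame: Image, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two same-sized grayscale images."""
    if len(reference) != len(frame) or any(
        len(r) != len(f) for r, f in zip(reference, frame)
    ):
        raise ValueError("reference and frame must have the same shape")
    squared_error, count = 0.0, 0
    for ref_row, frame_row in zip(reference, frame):
        for a, b in zip(ref_row, frame_row):
            squared_error += (a - b) ** 2
            count += 1
    if count == 0:
        raise ValueError("images are empty")
    mse = squared_error / count
    if mse == 0:
        return math.inf  # pixel-identical: perfect fidelity
    return 10.0 * math.log10(max_value ** 2 / mse)


def first_frame_fidelity(reference: Image, video: List[Image]) -> float:
    """Hypothetical score: PSNR between the uploaded image and the first generated frame."""
    if not video:
        raise ValueError("video has no frames")
    return psnr(reference, video[0])


if __name__ == "__main__":
    reference = [[100, 100], [100, 100]]
    # Two toy 2-frame "videos": only the first frame matters for this score.
    faithful = [[[100, 100], [100, 100]], [[120, 90], [100, 80]]]
    drifted = [[[110, 100], [100, 100]], [[120, 90], [100, 80]]]
    print(first_frame_fidelity(reference, faithful))  # inf
    print(round(first_frame_fidelity(reference, drifted), 2))  # 34.15
```

PSNR compares pixels directly, so it reacts to small shifts or color changes that a viewer might barely notice; it is best read as a rough check rather than a measure of perceived identity preservation.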