One GPU to Rule Them All

Wan2.1 Performance Testing Across GPUs

Following our previous deep dive into HunyuanVideo, we’re continuing our "One GPU to Rule Them All" series with performance testing on Wan2.1, the latest text-to-video and image-to-video model from Alibaba Cloud. This open-source video diffusion model pushes the boundaries of video generation, balancing high performance and consumer GPU accessibility.

Highlights of the Wan2.1 Model Family

  • Supports Consumer-Grade GPUs – The T2V-1.3B model requires only 8.19 GB of VRAM, which puts it within reach of most consumer GPUs. On an RTX 4090, it generates a 5-second 480P video in about 4 minutes (without optimizations).
  • Multiple Tasks – Wan2.1 is designed for Text-to-Video, Image-to-Video, Video Editing, Text-to-Image, and even Video-to-Audio tasks.
  • Visual Text Generation – It’s the first open-source video model capable of generating Chinese and English text within video frames.
  • Powerful Video VAE – Wan-VAE efficiently encodes and decodes 1080P videos while preserving temporal information, making it a strong foundation for AI video generation.

According to the Wan2.1 team's published benchmarks, the model outperforms existing open-source alternatives and competes with commercial solutions while remaining fully open-source under the Apache 2.0 license.

Test Setup

To evaluate performance across different GPUs, we ran Wan2.1’s Text-to-Video 14B model under consistent conditions:

  • Frames: 33
  • Frame Rate: 16 FPS
  • Total Duration: about 2 seconds (33 frames at 16 FPS)
  • Steps: 30
  • Resolutions Tested: 480P and 720P
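
The clip length follows directly from the frame count and frame rate. A minimal sketch of that arithmetic is shown here; `clip_duration_seconds` is a hypothetical helper name used for illustration, not part of Wan2.1 or ComfyUI:

```python
def clip_duration_seconds(frames: int, fps: float) -> float:
    """Return the playback length of a clip in seconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return frames / fps


# Settings used in this test: 33 frames at 16 FPS.
duration = clip_duration_seconds(33, 16)
print(f"{duration} seconds")  # prints "2.0625 seconds"
```

This is why the setup lists the duration as roughly 2 seconds rather than an exact figure.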

Important Notes:

  • 24GB GPUs (A5000, RTX 4090) could not generate 720P videos with the 14B model. For these cards, we recommend the Text-to-Video 1.3B model instead, at the cost of lower output quality.
  • Only the GPUs with more than 24GB of VRAM (A40, L40, A100, H100) completed 720P generation.
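
The notes above amount to a simple rule of thumb. The sketch below encodes it, assuming the 24GB cutoff observed in this test carries over to other cards. `pick_t2v_model` and the labels it returns are hypothetical names for illustration, not Wan2.1 or ComfyUI identifiers:

```python
def pick_t2v_model(vram_gb: float, resolution: str) -> str:
    """Suggest a Wan2.1 Text-to-Video variant based on this test's observations.

    Assumption: cards with 24GB of VRAM or less cannot run the 14B model
    at 720P, as seen with the A5000 and RTX 4090. Real limits depend on
    the workflow, precision, and offloading settings.
    """
    if resolution not in ("480P", "720P"):
        raise ValueError("resolution must be '480P' or '720P'")
    if resolution == "720P" and vram_gb <= 24:
        return "T2V-1.3B"  # fallback recommended above, at lower quality
    return "T2V-14B"


print(pick_t2v_model(24, "720P"))  # T2V-1.3B
print(pick_t2v_model(80, "720P"))  # T2V-14B
print(pick_t2v_model(24, "480P"))  # T2V-14B
```

Treat the threshold as a starting point only: quantization or offloading may shift what fits on a given card.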

Performance Results

[Figure: Wan2.1 Performance across GPUs. Bar chart comparing the time taken to generate 480P and 720P videos on each GPU.]

We tested six different GPUs to measure the time taken to generate 480P and 720P videos using Wan2.1:

GPU         480P Runtime (Seconds)    720P Runtime (Seconds)
A5000       462                       Not Supported
A40         350                       1083
A100        170                       523
RTX 4090    281                       Not Supported
L40         290                       859
H100        85                        284

Analysis & Key Findings

1️⃣ H100 dominates performance. At 85 seconds for 480P and 284 seconds for 720P, it’s the fastest GPU tested: twice as fast as the A100 at 480P and roughly 1.8x faster at 720P.

2️⃣ A100 is the best balance of speed and accessibility. It handled 480P in 170s and 720P in 523s, making it a great mid-tier option for video generation.

3️⃣ L40 and A40 handle 720P, but are significantly slower: 859s and 1083s respectively, about 3x and 3.8x the H100’s time. If generating at higher resolutions, A100 or H100 is the better choice.

4️⃣ RTX 4090 and A5000 cannot generate 720P with the 14B model. These 24GB GPUs lack the necessary VRAM for higher-resolution video generation, though they perform reasonably well at 480P (281s and 462s respectively).
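
To make the gaps concrete, the runtimes from the results table can be expressed relative to the H100. The sketch below is illustrative only; `RUNTIMES` and `slowdown_vs` are hypothetical names, and the numbers are copied from the table in this post:

```python
# Runtimes in seconds from the results table; None means "Not Supported".
RUNTIMES = {
    "A5000":    {"480P": 462, "720P": None},
    "A40":      {"480P": 350, "720P": 1083},
    "A100":     {"480P": 170, "720P": 523},
    "RTX 4090": {"480P": 281, "720P": None},
    "L40":      {"480P": 290, "720P": 859},
    "H100":     {"480P": 85,  "720P": 284},
}


def slowdown_vs(baseline: str, resolution: str) -> dict[str, float]:
    """Return each GPU's runtime divided by the baseline GPU's runtime."""
    base = RUNTIMES[baseline][resolution]
    if base is None:
        raise ValueError(f"{baseline} has no {resolution} result")
    return {
        gpu: round(times[resolution] / base, 2)
        for gpu, times in RUNTIMES.items()
        if times[resolution] is not None  # skip unsupported runs
    }


for resolution in ("480P", "720P"):
    print(resolution, slowdown_vs("H100", resolution))
```

A value of 2.0 means the card took twice as long as the H100 at that resolution. Cards that could not complete a run are left out of that resolution's results.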

Choosing the Best GPU for Wan2.1 Video Generation

Best Overall Performance 🏆

H100 – Fastest video generation by far, handling both 480P and 720P with ease.

Best Price-to-Performance Ratio 💰

A100 – Delivers good speed for both resolutions, making it an excellent mid-range choice.

Best for Consumer GPUs 🖥️

RTX 4090 – Performs well at 480P, but lacks the VRAM for 720P.

Avoid for High-Res Video Generation

A5000 & RTX 4090 – These GPUs cannot generate 720P with Wan2.1’s 14B model.
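
If you only have access to some of these cards, the same table can drive the choice. The sketch below picks the fastest available GPU that completed a given resolution in this test. `fastest_gpu` and `RUNTIMES` are hypothetical names, and the function ignores cost, which this test did not measure:

```python
from typing import Optional

# Runtimes in seconds from the results table; None means "Not Supported".
RUNTIMES = {
    "A5000":    {"480P": 462, "720P": None},
    "A40":      {"480P": 350, "720P": 1083},
    "A100":     {"480P": 170, "720P": 523},
    "RTX 4090": {"480P": 281, "720P": None},
    "L40":      {"480P": 290, "720P": 859},
    "H100":     {"480P": 85,  "720P": 284},
}


def fastest_gpu(resolution: str, available: list[str]) -> Optional[str]:
    """Return the fastest tested GPU among `available`, or None if none qualify."""
    candidates = [
        gpu for gpu in available
        if RUNTIMES.get(gpu, {}).get(resolution) is not None
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda gpu: RUNTIMES[gpu][resolution])


print(fastest_gpu("720P", ["RTX 4090", "L40", "A100"]))  # A100
print(fastest_gpu("480P", ["RTX 4090", "L40"]))          # RTX 4090
print(fastest_gpu("720P", ["RTX 4090", "A5000"]))        # None
```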

What’s Next?

This test focused on Wan2.1’s Text-to-Video capabilities, but upcoming articles will explore:

  • Optimizing Video Generation Speed with FP8 and Quantization
  • Comparing Text-to-Video with Image-to-Video Performance
  • Fine-tuning Wan2.1 for Custom Use Cases

For a deeper dive into GPU performance comparisons, check out our previous article:

👉 One GPU to Rule Them All: Comparing GPU Performance for ComfyUI Workflows

Try Wan2.1 Yourself

If you’d like to test Wan2.1 on different GPUs and compare speeds, you can run ComfyUI online with this fully optimized workflow on InstaSD. Choose from RTX 4090, L40, A100, or H100 and generate high-quality videos with the model.
