Mastering Prompt Writing for Wan 2.1 in ComfyUI: Camera Movements
This guide will teach you how to write effective prompts by incorporating cinematographic elements such as camera movements, lighting, atmosphere, and visual style—concepts used by professional videographers and filmmakers. It will also cover prompt structure, tips for refining prompts, and community insights to help you create visually compelling AI-generated videos.
Creating high-quality AI-generated videos requires more than just a basic prompt. The level of detail in your descriptions significantly impacts the final output. Whether you're crafting cinematic sequences, stylized animations, or realistic video clips, understanding how to structure your prompts is key to getting the most out of the model.
Video generation prompts are typically most effective when they describe camera movement, lighting, atmosphere, composition, and visual style. However, every model has its own quirks in how it responds to prompt structure and wording, so getting good results from Wan 2.1 takes trial and error.
This guide focuses on camera movements in Wan 2.1—how well they work, what needs tweaking, and where the model struggles. We will update this post over time as we continue testing and refining prompts. Be sure to check back often!
General Guidelines for Effective Video Prompts
A well-structured prompt should be:
Detailed: The best prompts are around 80-100 words long, providing a vivid description of the desired video.
Specific: Mention what the camera sees, how it moves, the lighting conditions, the mood, and any relevant cinematic techniques.
Contextualized: If applicable, add details such as the time of day, weather conditions, and setting.
Iterative: Experiment with different versions of a prompt to refine the output.
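The word-count guideline is easy to check before queuing a generation. The helper below is a hypothetical sketch, not part of ComfyUI or Wan 2.1. It assumes a plain whitespace split is a close enough approximation of word count, and it uses the 80 to 100 word range suggested above as its default bounds.

```python
def check_prompt_length(prompt: str, low: int = 80, high: int = 100) -> tuple[int, bool]:
    """Return the prompt's word count and whether it falls within [low, high].

    Hypothetical helper: words are counted with a whitespace split, so
    punctuation attached to a word does not change the count.
    """
    count = len(prompt.split())
    return count, low <= count <= high


if __name__ == "__main__":
    sample = ("Close up shot of the determined face of a battle-worn samurai. "
              "Camera pulls back to reveal him standing alone on a foggy battlefield.")
    words, in_range = check_prompt_length(sample)
    print(f"{words} words, within suggested range: {in_range}")
```

The guideline says "around" 80 to 100 words, so treat the result as a hint rather than a hard limit.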
Key elements that typically help in creating effective video prompts include:
1. Camera Movements
Describing how the camera moves is crucial in creating dynamic, engaging videos. Examples include:
Pan Left/Right: The camera pivots horizontally from a fixed position.
Tilt Up/Down: The camera pivots vertically from a fixed position.
Dolly In/Out: The camera physically moves toward or away from the subject, producing a smooth push-in or pull-out that is often used for dramatic emphasis.
Tracking Shot: The camera follows a subject.
Crash Zoom: A rapid zoom into or out of a subject.
Camera Roll: The camera rotates around its own lens axis, turning the horizon.
2. Lighting
Lighting can drastically change the feel of a scene:
Soft Light: Gentle and diffused, creates a warm atmosphere.
Hard Light: Harsh and direct, adds intensity.
Backlight: Creates silhouettes and dramatic contrast.
Volumetric Lighting: Adds visible beams of light through fog or dust.
3. Atmosphere and Mood
Setting the right atmosphere ensures that the generated video conveys the desired tone:
Somber: Gloomy, reflective, often with overcast lighting.
4. Composition
Describing how the subject is framed in the shot helps create engaging visuals:
Close-Up: Focusing on a subject’s facial expressions or details.
Wide Shot: Establishing a setting with a broad view.
Low Angle: Making the subject appear imposing or powerful.
High Angle: Looking down to make the subject appear small or vulnerable.
5. Visual Style and Effects
Defining the look and feel of the video is essential for guiding Wan 2.1:
Cinematic: Rich, high-contrast visuals with a filmic quality.
Vintage Film Look: Grainy textures and muted colors.
Shallow Depth of Field: Blurred backgrounds with sharp focus on the subject.
Motion Blur: Streaking that simulates real-world camera or subject motion, making movement look more natural.
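One way to keep all five elements in view while iterating is to store them separately and join them only when you queue a generation. The dataclass below is a hypothetical Python sketch. The field names and the order in which they are joined are assumptions, not something Wan 2.1 or ComfyUI requires.

```python
from dataclasses import dataclass


@dataclass
class VideoPrompt:
    """Hypothetical container for the prompt elements described above."""
    composition: str = ""  # e.g. "Wide shot of a lighthouse on a rocky coast"
    camera: str = ""       # e.g. "Camera slowly pans right"
    lighting: str = ""     # e.g. "Volumetric lighting cuts through the fog"
    mood: str = ""         # e.g. "The mood is somber and reflective"
    style: str = ""        # e.g. "Vintage film look with grainy textures"

    def to_text(self) -> str:
        # Join the non-empty elements in field order, one sentence each.
        parts = [self.composition, self.camera, self.lighting, self.mood, self.style]
        sentences = [p.strip().rstrip(".") + "." for p in parts if p.strip()]
        return " ".join(sentences)


if __name__ == "__main__":
    prompt = VideoPrompt(
        composition="Wide shot of a lighthouse on a rocky coast at dusk",
        camera="Camera slowly pans right",
        lighting="Volumetric lighting cuts through the sea fog",
        mood="The mood is somber and reflective",
        style="Cinematic, with a shallow depth of field",
    )
    print(prompt.to_text())
```

Keeping the elements separate makes it easy to change one element, such as the camera movement, while holding the others constant between attempts.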
Camera Movements in Wan 2.1: What Works and What Doesn’t
After extensive experimentation, we found that Wan 2.1 does not always respect every camera movement. Some work well, while others are ignored or cause the scene to become static. Below, we detail our findings. For this experiment, we used the 14B text-to-video model and everything was produced in 480p to allow faster iterations.
Pan Left/Right and Whip Pan
Wan 2.1 successfully generates panning motions but does not always respect the direction.
Achieving a left or right pan requires multiple attempts and prompt refinement.
Whip pans (fast panning transitions) did not work; Wan 2.1 did not produce the rapid motion they require.
Example prompt:
"A low angle shot of a jazz pianist in a dimly lit 1920s jazz bar, playing the piano with concentration. He wears a white shirt with suspenders and black trousers, his hands move rapidly on the keys. Camera pans left to low angle shot of a cute girl with pigtails and glasses playing the trumpet."
Pull Back
This works well when structured correctly.
The most reliable formula is: [Opening shot details] + [Camera movement] + [Details revealed after camera movement]
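The formula maps directly onto a small template function. The sketch below is hypothetical: the function name and the optional `ambience` parameter are illustrative additions, not part of any Wan 2.1 or ComfyUI API. When given the pieces of the samurai example prompt below, it rebuilds that prompt word for word.

```python
def reveal_prompt(opening_shot: str, movement: str, reveals: list[str],
                  ambience: str = "") -> str:
    """Build a prompt as [opening shot] + [camera movement] + [revealed details].

    Hypothetical helper: each revealed detail gets its own
    "<movement> to reveal <detail>." sentence, mirroring the samurai example.
    """
    # Opening shot: what the camera sees before it moves.
    sentences = [opening_shot.rstrip(".") + "."]
    # Camera movement followed by what it reveals, once per detail.
    for detail in reveals:
        sentences.append(f"{movement} to reveal {detail.rstrip('.')}.")
    # Optional closing detail, such as weather or motion in the scene.
    if ambience:
        sentences.append(ambience.rstrip(".") + ".")
    return " ".join(sentences)


if __name__ == "__main__":
    print(reveal_prompt(
        "Close up shot of the determined face of a battle-worn samurai",
        "Camera pulls back",
        [
            "him standing alone on a foggy battlefield, gripping his katana",
            "fallen warriors behind him",
        ],
        ambience="Wind whips through the trees, sending red autumn leaves swirling",
    ))
```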
Example prompt:
"Close up shot of the determined face of a battle-worn samurai. Camera pulls back to reveal him standing alone on a foggy battlefield, gripping his katana. Camera pulls back to reveal fallen warriors behind him. Wind whips through the trees, sending red autumn leaves swirling."
Dolly In/Out and Hitchcock Zoom (Vertigo Effect)
Dolly in works well, but dolly out fails consistently.
Prompt structure matters—describing background elements before the movement can prevent the effect from working.
Here, we started from an example prompt provided by the Wan 2.1 team and then attempted to add camera movement.
Starting prompt:
"In the style of an American drama promotional poster, Walter White sits in a metal folding chair wearing a yellow protective suit, with the words "Breaking Bad" written in sans-serif English above him, surrounded by piles of dollar bills and blue plastic storage boxes. He wears glasses, staring forward, dressed in a yellow jumpsuit, with his hands resting on his knees, exuding a calm and confident demeanor. The background shows an abandoned, dim factory with light filtering through the windows. There’s a noticeable grainy texture. A medium shot with a straight-on close-up of the character."
Prompt with a successful dolly in:
"In the style of an American drama promotional poster, Walter White sits in a metal folding chair wearing a yellow protective suit, with the words "Breaking Bad" written in sans-serif English above him, surrounded by piles of dollar bills and blue plastic storage boxes. He wears glasses, staring forward, dressed in a yellow jumpsuit, with his hands resting on his knees, exuding a calm and confident demeanor. Camera hitchcock zooms in. The background shows an abandoned, dim factory with light filtering through the windows. There's a noticeable grainy texture. A medium shot with a straight-on close-up of the character."
Our attempts at a dolly out produced convincing dolly ins instead:
"In the style of an American drama promotional poster, Walter White sits in a metal folding chair wearing a yellow protective suit, with the words "Breaking Bad" written in sans-serif English above him, surrounded by piles of dollar bills and blue plastic storage boxes. He wears glasses, staring forward, dressed in a yellow jumpsuit, with his hands resting on his knees, exuding a calm and confident demeanor. Camera dollies out. The background shows an abandoned, dim factory with light filtering through the windows. There's a noticeable grainy texture. A medium shot with a straight-on close-up of the character."
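The placement rule from the dolly in experiment can be applied mechanically when you adapt an existing prompt: put the camera movement before the background description. The function below is a hypothetical sketch. It assumes the background description begins with a known phrase, "The background" by default, as in the Walter White prompt. Other prompts will need a different marker or manual editing.

```python
def insert_camera_move(prompt: str, move: str, marker: str = "The background") -> str:
    """Place a camera-movement sentence just before the background description.

    Hypothetical helper: if the marker phrase is not found, the movement
    sentence is appended to the end of the prompt instead.
    """
    sentence = move.rstrip(".") + "."
    idx = prompt.find(marker)
    if idx == -1:
        return f"{prompt.rstrip()} {sentence}"
    # Text before the marker already ends with a space, so insert directly.
    return f"{prompt[:idx]}{sentence} {prompt[idx:]}"


if __name__ == "__main__":
    base = ("He wears glasses, staring forward, with his hands resting on his knees. "
            "The background shows an abandoned, dim factory with light filtering through the windows.")
    print(insert_camera_move(base, "Camera hitchcock zooms in"))
```

Placement alone was not enough for a dolly out. The prompt above uses the same structure and still produced a dolly in.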
Tilt
Getting a tilt was similarly difficult. We envisioned a mountaineer looking at a formidable mountain ahead of him. The shot would start on the man, and the camera would tilt up to reveal the mountain. This may work with more experimentation, but in our tests we had to start the shot on his feet and move up from there. The result is closer to a boom shot than a tilt up, but it achieves a similar look and feel.
Example prompt:
"A close up shot of the feet of a man wearing mountaineering gear, standing in a grassy field. He is facing away from the camera. Camera slowly tilts up, revealing the full body of a mountaineer wearing gear. He faces away from the camera. In the distance in front of him majestic rocky mountains tower above."
A close-up shot of the feet of a man wearing mountaineering gear, standing in a grassy field. Camera slowly tilts up, revealing the full body of a mountaineer wearing gear. In the distance, majestic rocky mountains tower above.
Tracking Shot
Wan 2.1 handles tracking shots well when explicitly described.
Example prompt:
"A sprawling cyberpunk metropolis, neon lights reflecting off rain-soaked streets. Pedestrians in futuristic outfits rush by as holographic advertisements flicker in the air. The camera follows a hooded figure in a long tracking shot, weaving through the crowded market. Overhead lights cast a moody glow, while fog drifts through the alleyways. The scene is dark and mysterious, with blue and purple lighting creating a high-tech, dystopian feel."
Crash Zoom
Like the whip pan, a crash zoom depends on fast motion, and we could not generate good results with Wan 2.1.
Attempting crash zooms leads to static or poorly transitioned results.
Example prompt:
"In a large dimly lit midcentury modern room, a man sits with an authoritative and pensive pose on a leather chair. He is wearing a dark suit jacket and grey trousers. He has silver hair. The chair is in the center of the screen. Behind the chair, there is an oak console with a lamp. The wall is made of oak panels. The man looks directly at the camera. Camera rapidly zooms in on the man's face. Then he lets out a slight smirk."
Camera Roll
Somewhat achievable after multiple prompt refinements, although in our tests the rotation was never exactly around the camera's own axis.
Example prompt:
"Overhead shot of a man asleep on his desk in front of his computer. The room is dark except for the monitor’s glow. The camera rolls in a full 360 motion."
Overhead shot of a man fallen asleep on his desk in front of his computer. The room is dark except for the light from the monitor. The man's head is on his arms by the keyboard. Around the desk, there is a mess of papers and floppy disks. The camera rolls in full 360 motion.
Conclusion
While Wan 2.1 can generate some camera movements effectively, others are inconsistent or non-functional. We recommend structuring prompts carefully, avoiding overly fast camera actions, and iterating based on results. We will continue updating this guide as we refine our findings. Stay tuned!
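The advice to avoid overly fast camera actions can be turned into a quick check before you queue a generation. The sketch below is hypothetical and is not a ComfyUI node or Wan 2.1 feature. It flags phrasings that failed in the tests above. The regular expressions are illustrative assumptions and will miss other wordings for the same movements.

```python
import re

# Camera phrasings that did not work in the tests described in this guide
# (Wan 2.1 14B text-to-video at 480p). Patterns are illustrative assumptions.
RISKY_CAMERA_TERMS = {
    r"\bwhip[- ]pans?\b": "whip pan: fast panning was not produced",
    r"\bcrash zoom(s|ing)?\b": "crash zoom: gave static or poorly transitioned results",
    r"\brapidly zooms?\b": "rapid zoom: gave static or poorly transitioned results",
    r"\bdoll(y|ies) out\b": "dolly out: produced dolly ins instead",
}


def review_camera_terms(prompt: str) -> list[str]:
    """Return a note for each risky camera phrasing found in the prompt."""
    notes = []
    for pattern, note in RISKY_CAMERA_TERMS.items():
        if re.search(pattern, prompt, flags=re.IGNORECASE):
            notes.append(note)
    return notes


if __name__ == "__main__":
    print(review_camera_terms("Camera rapidly zooms in on the man's face."))
```

An empty list only means that none of these phrasings appear. It does not guarantee that the movement will render, so you still need to iterate on results.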
One more thing...
Generating videos with Wan 2.1 is fun and highly productive, but also very resource-intensive. Without the right hardware, you can wait a long time for a clip that is only a few seconds long. InstaSD lets you run ComfyUI workflows on powerful GPUs at a very affordable price. All of the following Wan 2.1 workflows are ready to launch on GPUs as powerful as H100s with just one click: