Experimental fork of the Deforum extension for Stable Diffusion WebUI Forge, fixed up to work with Flux.1, integrate Parseq keyframe redistribution logic, and support Wan 2.1 AI video generation. Integrates dynamic camera shake effects with data sourced from EatTheFuture's 'Camera Shakify' Blender plugin.
This fork of the extension is basically working.
Install, update and run the 'one-click installation package' of Stable Diffusion WebUI Forge as described. Includes:
- Python 3.10.6
- CUDA 12.1
- Pytorch 2.3.1
Other versions may work with this extension, but have not been properly tested.
Get flux1-dev-bnb-nf4-v2.safetensors from Hugging Face and put it into <forge_install_dir>/models/Stable-diffusion/Flux:
https://huggingface.co/lllyasviel/flux1-dev-bnb-nf4/blob/main/flux1-dev-bnb-nf4-v2.safetensors
Get the following 3 files from Hugging Face and put them into <forge_install_dir>/models/VAE:
- ae.safetensors: https://huggingface.co/black-forest-labs/FLUX.1-schnell/tree/main
- clip_l.safetensors: https://huggingface.co/comfyanonymous/flux_text_encoders/tree/main
- t5xxl_fp16.safetensors: https://huggingface.co/comfyanonymous/flux_text_encoders/tree/main
Restart Forge, set mode to "flux", select the flux checkpoint and all the 3 VAEs in "VAE / Text Encoder" and test with Txt2Img.
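Before restarting Forge, it can help to confirm that the four files ended up where the steps above expect them. The following is a minimal sketch, not part of the extension: the helper name `missing_flux_files` is hypothetical, and the relative paths are copied from the instructions above.

```python
from pathlib import Path

# Relative paths as given in the installation steps above.
EXPECTED_FILES = [
    "models/Stable-diffusion/Flux/flux1-dev-bnb-nf4-v2.safetensors",
    "models/VAE/ae.safetensors",
    "models/VAE/clip_l.safetensors",
    "models/VAE/t5xxl_fp16.safetensors",
]


def missing_flux_files(forge_install_dir):
    """Return the expected model files that are not present yet (hypothetical helper)."""
    root = Path(forge_install_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

Calling `missing_flux_files("<forge_install_dir>")` with your actual install path returns an empty list once all downloads are in place.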
Go to tab "Extensions" - "Install from URL" and use this: https://github.com/Tok/sd-forge-deforum.git
Open a command line and run <forge_install_dir>/venv/Scripts/activate.bat to activate the Python virtual environment (venv) used by Forge.
With the venv from Forge activated, do:
cd <forge_install_dir>/extensions
git clone https://github.com/Tok/sd-forge-deforum
cd sd-forge-deforum
pip install -r requirements.txt
Get the latest default_settings.txt:
https://raw.githubusercontent.com/Tok/sd-forge-deforum/main/scripts/default_settings.txt
Rename it to deforum_settings.txt (or whatever matches the name of your settings file in the UI), put it directly into your 'webui' directory, then click "Load All Settings".
Some settings are not properly loaded from default_settings.txt and may need to be set manually the first time:
- Tab "Prompts" - "Prompts negative" not resetting
- Consider removing the defaults because they're not used with Flux.
Recommendation: Use Forge UI's "Settings" - "Defaults" to save your settings.
The extension includes Wan 2.1 (Alibaba's state-of-the-art video generation model) fully integrated with Deforum's scheduling system for frame-perfect video creation.
- Prompt Scheduling: Uses Deforum's prompt system for precise clip timing
- FPS Integration: Single FPS setting controls both Deforum and Wan
- Seed Scheduling: Optional seed control from Keyframes → Seed & SubSeed tab
- Strength Scheduling: I2V chaining with continuity control from Keyframes → Strength tab
- Auto-Discovery: Automatically finds Wan models without manual configuration
- QwenPromptExpander: Automatically enhance and expand prompts for better video quality
- Movement Analysis: Translate Deforum movement schedules to English descriptions
- Auto-Model Selection: Intelligent model choice based on available VRAM
- Smart Memory Management: Lazy loading and automatic cleanup for optimal VRAM usage
- Manual Override: All AI enhancements are fully editable before generation
- Download Wan Models (choose one):

      # Recommended: VACE 1.3B model (8GB+ VRAM) - All-in-one T2V+I2V
      huggingface-cli download Wan-AI/Wan2.1-VACE-1.3B --local-dir models/wan

      # High Quality: VACE 14B model (16GB+ VRAM) - All-in-one T2V+I2V
      huggingface-cli download Wan-AI/Wan2.1-VACE-14B --local-dir models/wan

      # Alternative: Separate T2V models (no I2V chaining)
      huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B --local-dir models/wan

      # Legacy: Separate I2V models (for compatibility with older setups)
      huggingface-cli download Wan-AI/Wan2.1-I2V-1.3B --local-dir models/wan
      huggingface-cli download Wan-AI/Wan2.1-I2V-14B --local-dir models/wan
- Optional: Download Qwen Models for AI Enhancement. Models are auto-downloaded to models/qwen/ when first used:

      # Models are automatically downloaded when "Enhance Prompts" is clicked
      # Storage location: webui-forge/webui/models/qwen/
      # Auto-selected based on your VRAM: 3B (4GB), 7B (8GB), 14B (16GB+)
- Configure in Deforum:
  - Set prompts in Prompts tab with frame numbers
  - Set FPS in Output tab
  - Go to Wan Video tab for AI enhancement and generation options
- Configure Base Prompts:

      {
        "0": "mountain landscape",
        "30": "misty valley",
        "60": "golden sunlight",
        "90": "illuminated peaks"
      }
- Enable AI Enhancement in Wan Video tab:
  - Enable Prompt Enhancement
  - Select Qwen Model (Auto-Select recommended)
  - Enable Movement Analysis
  - Click "Enhance Prompts"
- AI Enhanced Result:

      {
        "0": "A breathtaking mountain landscape at dawn, with towering snow-capped peaks rising majestically against a pristine azure sky, with camera movement with slow right pan, forward dolly",
        "30": "Morning mist gracefully rising from the valleys below, creating ethereal wisps that dance between ancient pine trees, with camera movement with medium left pan, upward tilt",
        "60": "Golden sunlight breaking through dramatic cloud formations, casting warm amber rays across the rugged terrain and illuminating every crevice, with camera movement with fast zoom in, clockwise roll",
        "90": "Full daylight illuminating the magnificent peaks in all their glory, revealing intricate details of rock formations and alpine meadows, with camera movement with slow backward dolly, downward pitch"
      }
- Edit and Generate: Enhanced prompts are fully editable before clicking "Generate Wan Video"
Model | VRAM | Type | Description | Best For |
---|---|---|---|---|
QwenVL2.5_3B | 8GB | Vision+Text | Fast, supports images | Quick enhancement |
QwenVL2.5_7B | 16GB | Vision+Text | Balanced quality | Most users ⭐ |
Qwen2.5_3B | 6GB | Text-only | Memory efficient | Low-VRAM systems |
Qwen2.5_7B | 14GB | Text-only | High quality | Text enhancement |
Qwen2.5_14B | 28GB | Text-only | Maximum quality | High-end systems |
Auto-Selection Logic: The system automatically chooses the best model for your VRAM:
- 4-6GB → Qwen2.5_3B
- 8-12GB → Qwen2.5_7B
- 16GB+ → QwenVL2.5_7B or Qwen2.5_14B
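The tiers above can be read as a simple threshold lookup. This is a sketch of that reading only, not the extension's actual selection code: the function name `pick_qwen_model` is hypothetical, and two behaviours are assumptions because the list does not specify them: VRAM values between tiers (for example 7GB or 14GB) fall back to the lower tier, and the 16GB+ tier returns the first of the two models it names.

```python
from typing import Optional

# (minimum VRAM in GB, model name), highest tier first; values from the list above.
TIERS = [
    (16, "QwenVL2.5_7B"),  # the list also names Qwen2.5_14B for this tier
    (8, "Qwen2.5_7B"),
    (4, "Qwen2.5_3B"),
]


def pick_qwen_model(vram_gb: float) -> Optional[str]:
    """Return the model for the highest tier that fits, or None below 4GB (assumed)."""
    for min_gb, name in TIERS:
        if vram_gb >= min_gb:
            return name
    return None
```

For example, `pick_qwen_model(12)` falls into the 8-12GB tier and returns `"Qwen2.5_7B"`.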
The system translates complex Deforum schedules into human-readable descriptions with frame-specific analysis:
Deforum Schedule | AI Translation |
---|---|
translation_x: "0:(0), 30:(100)" | "camera movement with moderate panning right (extended)" |
translation_z: "0:(0), 60:(-50)" | "camera movement with gentle dolly backward (sustained)" |
rotation_3d_y: "0:(0), 45:(20)" | "camera movement with subtle rotating right (extended)" |
zoom: "0:(1.0), 30:(1.5)" | "camera movement with moderate zooming in (brief)" |
Frame-Specific Analysis: Each prompt gets unique movement descriptions based on its position in the video timeline:
- Frame 0: "camera movement with subtle panning left (sustained) and gentle tilting down (extended)"
- Frame 43: "camera movement with moderate panning right (brief) and subtle rotating left (sustained)"
- Frame 106: "camera movement with gentle dolly forward (extended) and subtle rolling clockwise (brief)"
Camera Shakify Integration: When enabled, the system analyzes the actual Camera Shakify pattern at each frame position to provide varied, specific directional descriptions instead of generic "investigative handheld camera movement" text.
Combined Example:
Input: translation_x: "0:(0), 30:(100)", rotation_3d_x: "0:(0), 60:(15)", zoom: "0:(1.0), 40:(0.7)"
Camera Shakify: INVESTIGATION pattern enabled
Output: "camera movement with moderate panning right (extended), subtle tilting up (sustained), and gentle zooming out (brief)"
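The schedule strings in the examples above follow a `frame:(value)` pattern. The sketch below shows one way to read such a string into keyframes and interpolate linearly between them. It is an illustration under assumptions, not the extension's analyzer: the names `parse_schedule`, `value_at` and `net_change` are hypothetical, only plain numeric values are handled (no math expressions inside the parentheses), and how a net change maps to words like "moderate" or "subtle" is not modeled.

```python
import re

# Matches "frame:(value)" pairs such as "30:(100)" or "60:(-50)".
KEYFRAME = re.compile(r"(\d+)\s*:\s*\(\s*(-?\d+(?:\.\d+)?)\s*\)")


def parse_schedule(schedule):
    """Turn '0:(0), 30:(100)' into [(0, 0.0), (30, 100.0)], sorted by frame."""
    pairs = sorted((int(f), float(v)) for f, v in KEYFRAME.findall(schedule))
    if not pairs:
        raise ValueError("no frame:(value) pairs found in schedule")
    return pairs


def value_at(keyframes, frame):
    """Linearly interpolate the value at a frame; clamp outside the keyframes."""
    if frame <= keyframes[0][0]:
        return keyframes[0][1]
    for (f0, v0), (f1, v1) in zip(keyframes, keyframes[1:]):
        if f0 <= frame <= f1:
            return v0 + (v1 - v0) * (frame - f0) / (f1 - f0)
    return keyframes[-1][1]


def net_change(keyframes):
    """Difference between the last and first keyframe value."""
    return keyframes[-1][1] - keyframes[0][1]
```

In the table above, the positive net change of `translation_x` corresponds to "panning right" and the negative net change of `translation_z` to "dolly backward".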
- Lazy Loading: Qwen models are only loaded when "Enhance Prompts" is clicked
- Auto-Cleanup: Models are automatically unloaded before video generation to free VRAM
- Manual Control: "Cleanup Qwen Cache" button for immediate VRAM release
- Status Monitoring: Real-time display of loaded models and VRAM usage
VACE (All-in-One Video Creation and Editing) models are Wan's latest all-in-one architecture, handling both Text-to-Video and Image-to-Video generation with a single model, which provides superior consistency for I2V chaining:
- Unified Architecture: Single model handles both T2V and I2V generation
- Perfect Consistency: Same model ensures visual continuity between clips
- Efficient Memory: No need to load separate T2V and I2V models
- Enhanced Quality: Latest architecture with improved video generation
{
"0": "A serene mountain landscape at dawn",
"30": "Morning mist rising from the valleys",
"60": "Golden sunlight breaking through clouds",
"90": "Full daylight illuminating the peaks"
}
At 30 FPS, this creates exactly 1-second clips with seamless I2V transitions using VACE's unified architecture.
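The clip timing follows directly from the distance between prompt keyframes divided by the FPS setting. A small sketch of that arithmetic, with `clip_lengths` as a hypothetical helper name; the length of the final clip depends on the total frame count, which is not modeled here.

```python
def clip_lengths(prompts, fps):
    """Return (start_frame, frame_count, seconds) for each clip between prompt keyframes."""
    frames = sorted(int(key) for key in prompts)
    return [
        (start, end - start, (end - start) / fps)
        for start, end in zip(frames, frames[1:])
    ]


prompts = {
    "0": "A serene mountain landscape at dawn",
    "30": "Morning mist rising from the valleys",
    "60": "Golden sunlight breaking through clouds",
    "90": "Full daylight illuminating the peaks",
}
clips = clip_lengths(prompts, fps=30)  # three clips of 30 frames, 1.0 second each
```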
Model | Type | Size | VRAM | Speed | Quality | I2V Chaining | Best For |
---|---|---|---|---|---|---|---|
VACE-1.3B | All-in-one | ~17GB | 8GB+ | Fast | Good | ✅ Perfect | Most Users ⭐ |
VACE-14B | All-in-one | ~75GB | 16GB+ | Slow | Excellent | ✅ Perfect | High-end Systems |
T2V-1.3B | T2V Only | ~17GB | 8GB+ | Fast | Good | ❌ None | Independent Clips |
T2V-14B | T2V Only | ~75GB | 16GB+ | Slow | Excellent | ❌ None | Independent Clips |
I2V-1.3B | I2V Only | ~17GB | 8GB+ | Fast | Good | ✅ Good | Legacy I2V Chaining |
I2V-14B | I2V Only | ~75GB | 16GB+ | Slow | Excellent | ✅ Good | Legacy I2V Chaining |
Recommendation: Use VACE models for I2V chaining workflows, T2V models only for independent clip generation.
For comprehensive documentation, see:
- Wan User Guide - Complete setup and usage guide
- Technical Reference - Developer documentation
- I2V Chaining: Seamless transitions between clips using last frame as starting image
- Continuity Control: Strength override for maximum clip-to-clip continuity
- 4n+1 Frame Calculation: Automatic handling of Wan's frame requirements
- Flash Attention Fallback: Works with or without flash-attn
- Memory Optimization: Efficient VRAM usage for large generations
- VACE T2V Mode: Uses blank frame transformation for pure text-to-video generation
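The "4n+1" item above refers to clip lengths of the form 4n+1 frames (1, 5, 9, ..., 81). As a rough sketch of what such a calculation can look like, the hypothetical helper below rounds a requested frame count up to the next valid value. Rounding up (rather than down or to the nearest value) is an assumption; the extension may handle the adjustment differently.

```python
def to_4n_plus_1(frame_count):
    """Round a frame count up to the next value of the form 4n+1 (assumed strategy)."""
    if frame_count < 1:
        raise ValueError("frame_count must be at least 1")
    n = -(-(frame_count - 1) // 4)  # ceiling division without floats
    return 4 * n + 1
```

With this strategy, a 30-frame clip request becomes 33 frames, while 81 stays at 81.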
After installation, you can test the setup by generating the default bunny with "Distribution" set to "Keyframes Only" and "Animation Mode" set to "3D". When run for the first time, this also downloads Depth-Anything V2 or the MiDaS model for depth warping, and it demonstrates prompt synchronization in a no-cadence setup.
The default bunnies contain 333 frames at 720p, but only 19 of them are actually diffused. The diffused frames are placed in the clip according to the keyframes defined in the prompts. The prompts themselves are aligned to be synchronized at 60 FPS with the beat of an amen break you can find linked in the settings (enable sound):
default_bunny_2.mp4
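Aligning prompts to a beat comes down to converting beat times into frame numbers at the chosen FPS. The sketch below illustrates that conversion only. The helper `beat_frames` is hypothetical, and the tempo used in the example is a placeholder, not the tempo of the linked amen break.

```python
def beat_frames(bpm, fps, beats):
    """Return the frame number closest to each of the first `beats` beats."""
    seconds_per_beat = 60.0 / bpm
    return [round(i * seconds_per_beat * fps) for i in range(beats)]


# Placeholder tempo of 120 BPM at 60 FPS: one beat every 30 frames.
frames = beat_frames(bpm=120, fps=60, beats=4)
```

The resulting frame numbers can then be used as keys in the prompt schedule.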
If you used other versions of the Deforum plugin before, it may also be necessary to update or adjust your Deforum settings. The latest example settings for the default bunny can also be downloaded here:
https://github.com/Tok/sd-forge-deforum/blob/main/scripts/default_settings.txt
- Text-to-Video: High-quality AI video generation with precise frame timing
- Auto-Discovery: Automatic model detection and validation
- Flash Attention Fallback: Compatible with systems without flash-attn
- Audio Synchronization: Frame-perfect timing for music videos
- Multiple Resolutions: Support for various output sizes
- QwenPromptExpander: Automatic prompt enhancement with 5 model options (3B-14B)
- Movement Analysis: Translation of Deforum schedules to English descriptions
- Auto-Model Selection: Intelligent choice based on available VRAM (4GB-28GB)
- Lazy Loading: Models only load when needed, auto-unload before generation
- Manual Editing: All AI enhancements are fully editable before generation
- Multi-Language: English and Chinese prompt enhancement support
- Dynamic Motion Strength: Automatic calculation from movement patterns
Keyframe distribution runs the rendering on an experimental core that can rearrange keyframes, which makes it possible to set up fast generations with less jitter at high or no cadence.
- Can now be used with or without Parseq.
- Allows for precise sync at high cadence.
- Detailed info and recommendations on new tab.
All subtitles are now generated and written to an .srt file in advance. Complex subtitle generations should work fine with Parseq but are currently limited with Deforum-only setups.
- New Deforum setting for skipping the prompt-part to a new line in .srt files.
- New Deforum setting for choosing simple (non-technical) subtitles that contain only the text from the prompt.
- Complex subtitles should work fine when Parseq is used, but are otherwise limited to essential information only.
- Recommendation: turn on for now if not using Parseq
- Removed empty "--neg" param from being written into the subtitles because negative prompts are ignored in Flux workflows.
- Improved padding of technical information so subtitles jitter less.
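Because subtitles are written in advance, each entry's timing has to be derived from frame numbers and the FPS setting. The following sketch shows how a frame number maps to the `HH:MM:SS,mmm` timestamps used by the .srt format. The function names are hypothetical and this is not the extension's own subtitle writer.

```python
def frame_to_srt_time(frame, fps):
    """Convert a frame number to an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(frame * 1000 / fps)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{ms:03d}"


def srt_entry(index, start_frame, end_frame, fps, text):
    """Build one numbered SRT entry covering start_frame to end_frame."""
    start = frame_to_srt_time(start_frame, fps)
    end = frame_to_srt_time(end_frame, fps)
    return f"{index}\n{start} --> {end}\n{text}\n"
```

At 60 FPS, frame 90 maps to `00:00:01,500`.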
Add camera shake effects to your renders on top of your other movement.
This feature enhances the realism of your animations by simulating natural camera movements, adding a layer of depth and engagement to your visuals. Perfect for creating action sequences or adding a sense of spontaneity, it allows for customizable shake parameters to fit your specific needs.
The shake data is available under Creative Commons CC0 1.0 Universal license and was sourced from the 'Camera Shakify' Blender plugin by EatTheFuture.
- Flux schnell
- There's not a lot of precision for fine-tuning strength values when only 4 steps are required.
- Control Net
- Hybrid Video
- Non-Flux workflows
- Kohya HR Fix
- may need to be left disabled
- FreeU
- may need to be left disabled
- Control Net
- Includes a new default setup to generate default bunny at 60 FPS in 720p with keyframes only.
- Non-essential emojis can be turned off with a checkbox under "Settings" - "Deforum".
- Seed and Subseed tabs unified.
- No models found: Download Wan models using the commands above
- Generation fails: Try the 1.3B model if using 14B, check VRAM usage
- Flash attention errors: Compatibility layer should handle this automatically
- Audio sync problems: Verify frame numbers in prompt schedule match your timing needs
- Model download fails: Check internet connection, models auto-download to webui/models/qwen/
- Out of VRAM: Use "Cleanup Qwen Cache" button or select smaller model (3B instead of 7B/14B)
- Enhancement fails: Try "Auto-Select" model option, ensure prompts are properly formatted
- Slow enhancement: Larger models (14B) take more time, consider using 7B or 3B for speed
- Enhancement button not working: Check console for errors, restart WebUI if needed
- No movement detected: Increase movement sensitivity or check schedule format ("frame:(value)")
- Incorrect analysis: Verify Deforum schedules use proper syntax, try different sensitivity settings
- Motion strength wrong: Enable manual override in Overrides section for custom values
During active development, the content and structure of the deforum_settings.txt file can change quickly. Settings from older versions may not behave as expected.
If necessary, the latest default_settings.txt is available for download here:
https://github.com/Tok/sd-forge-deforum/blob/main/scripts/default_settings.txt
- Import errors: Restart WebUI after installation
- Missing dependencies: Run pip install -r requirements.txt
- Performance issues: Check VRAM usage and reduce settings