We are thrilled to announce the release of MindONE 0.3.0, featuring more state-of-the-art multi-modal understanding and generative models and better compatibility with transformers and diffusers. MindONE now supports the latest features in diffusers v0.32.2, including over 160 pipelines, 50 models, and 35 schedulers. It allows users to easily develop new image/video/audio generation models or transfer existing models from PyTorch to MindSpore. MindONE 0.3.0 is built on MindSpore 2.5 and optimized for Ascend NPUs, ensuring high-performance training for various generative models such as Open-Sora, CogVideoX, and Janus-Pro from DeepSeek.
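Because mindone.diffusers mirrors the diffusers API, porting a pipeline is usually just a matter of swapping the import. Below is a minimal sketch, assuming the SDXL weights are available on the Hugging Face Hub and that `mindspore_dtype` plays the role of diffusers' `torch_dtype`:

```python
# Minimal sketch: Stable Diffusion XL inference on Ascend via mindone.diffusers.
# Assumes mindone.diffusers mirrors the diffusers API, with `mindspore_dtype`
# standing in for diffusers' `torch_dtype`.
import mindspore as ms
from mindone.diffusers import StableDiffusionXLPipeline  # was: from diffusers import ...

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    mindspore_dtype=ms.float16,
)
# Pipelines return tuples; the first field holds the generated images.
image = pipe("An astronaut riding a green horse")[0][0]
image.save("astronaut.png")
```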
Key Features
- Support Diffusers v0.32.2
MindONE now supports the following new pipelines for image and video generation, along with new training scripts (see the sketch after this list):

- Video generation pipelines: CogVideoX, Latte, Mochi-1, Allegro, LTXVideo, HunyuanVideo, and more.
- Image generation pipelines: CogView3/4, Stable Diffusion 3.5, Flux, SANA, Lumina, Kolors, AuraFlow, and more.
- Training scripts: CogVideoX SFT & LoRA, Flux SFT & LoRA & ControlNet, and SD3/3.5 SFT & LoRA.
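For instance, one of the new video pipelines can be run the same way as an image pipeline. The following is a sketch under the assumption that `CogVideoXPipeline` and `export_to_video` are exposed under `mindone.diffusers` exactly as in diffusers:

```python
# Sketch: text-to-video with CogVideoX via mindone.diffusers (assumed to mirror
# the diffusers CogVideoXPipeline and export_to_video utilities).
import mindspore as ms
from mindone.diffusers import CogVideoXPipeline
from mindone.diffusers.utils import export_to_video

pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-2b", mindspore_dtype=ms.float16)
# Pipelines return tuples; the first field holds the generated video frames.
frames = pipe(
    prompt="a panda playing guitar in a bamboo forest",
    num_inference_steps=50,
    num_frames=49,
)[0][0]
export_to_video(frames, "panda.mp4", fps=8)
```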
For more details, visit the diffusers documentation.
- Expanded Multi-Modal Generative Models
MindONE v0.3.0 adds a range of state-of-the-art generative models as runnable examples, with efficient training performance on Ascend NPUs (an inference sketch for one of them follows the table):
| Task | Model | Inference | Finetune | Pretrain | Institute |
|---|---|---|---|---|---|
| Image-to-Video | hunyuanvideo-i2v 🔥🔥 | ✅ | ✖️ | ✖️ | Tencent |
| Text/Image-to-Video | wan2.1 🔥🔥🔥 | ✅ | ✖️ | ✖️ | Alibaba |
| Text-to-Image | cogview4 🔥🔥🔥 | ✅ | ✖️ | ✖️ | Zhipu AI |
| Text-to-Video | step_video_t2v 🔥🔥 | ✅ | ✖️ | ✖️ | StepFun |
| Image-Text-to-Text | qwen2_vl 🔥🔥🔥 | ✅ | ✖️ | ✖️ | Alibaba |
| Any-to-Any | janus 🔥🔥🔥 | ✅ | ✅ | ✅ | DeepSeek |
| Any-to-Any | emu3 🔥🔥 | ✅ | ✅ | ✅ | BAAI |
| Class-to-Image | var 🔥🔥 | ✅ | ✅ | ✅ | ByteDance |
| Text/Image-to-Video | hpcai open sora 2.0 🔥🔥 | ✅ | ✖️ | ✖️ | HPC-AI Tech |
| Text/Image-to-Video | cogvideox 1.5 5B~30B 🔥🔥 | ✅ | ✅ | ✅ | Zhipu AI |
| Text-to-Video | open sora plan 1.3 🔥🔥 | ✅ | ✅ | ✅ | PKU |
| Text-to-Video | hunyuanvideo 🔥🔥 | ✅ | ✅ | ✅ | Tencent |
| Text-to-Video | movie gen 30B 🔥🔥 | ✅ | ✅ | ✅ | Meta |
| Video-Encode-Decode | magvit | ✅ | ✅ | ✅ | |
| Text-to-Image | story_diffusion | ✅ | ✖️ | ✖️ | ByteDance |
| Image-to-Video | dynamicrafter | ✅ | ✖️ | ✖️ | Tencent |
| Video-to-Video | venhancer | ✅ | ✖️ | ✖️ | Shanghai AI Lab |
| Text-to-Video | t2v_turbo | ✅ | ✅ | ✅ | |
| Text/Image-to-Video | video composer | ✅ | ✅ | ✅ | Alibaba |
| Text-to-Image | flux 🔥 | ✅ | ✅ | ✖️ | Black Forest Labs |
| Text-to-Image | stable diffusion 3 🔥 | ✅ | ✅ | ✖️ | Stability AI |
| Text-to-Image | kohya_sd_scripts | ✅ | ✅ | ✖️ | kohya |
| Text-to-Image | t2i-adapter | ✅ | ✅ | ✅ | Shanghai AI Lab |
| Text-to-Image | ip adapter | ✅ | ✅ | ✅ | Tencent |
| Text-to-3D | mvdream | ✅ | ✅ | ✅ | ByteDance |
| Image-to-3D | instantmesh | ✅ | ✅ | ✅ | Tencent |
| Image-to-3D | sv3d | ✅ | ✅ | ✅ | Stability AI |
| Text/Image-to-3D | hunyuan3d-1.0 | ✅ | ✅ | ✅ | Tencent |
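As an illustration of the multi-modal understanding side, the qwen2_vl example above can be driven through mindone.transformers. This is a hedged sketch, assuming mindone.transformers mirrors the Hugging Face Qwen2-VL classes and that the processor's NumPy outputs are converted to MindSpore tensors by hand; the conventions here are assumptions, not the exact example code shipped in the repo:

```python
# Hedged sketch: Qwen2-VL image-to-text inference on Ascend. Assumes
# mindone.transformers mirrors the Hugging Face Qwen2-VL classes and that
# `mindspore_dtype` plays the role of transformers' `torch_dtype`.
import mindspore as ms
from PIL import Image
from transformers import AutoProcessor  # the processor comes from vanilla transformers
from mindone.transformers import Qwen2VLForConditionalGeneration

model_id = "Qwen/Qwen2-VL-7B-Instruct"
model = Qwen2VLForConditionalGeneration.from_pretrained(model_id, mindspore_dtype=ms.float16)
processor = AutoProcessor.from_pretrained(model_id)

# Build a single-image chat prompt; apply_chat_template inserts the image placeholder.
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Describe this image."},
]}]
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[Image.open("cat.png")], return_tensors="np")

# Assumption: NumPy processor outputs are converted to MindSpore tensors manually.
inputs = {k: ms.Tensor(v) for k, v in inputs.items()}
out = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```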
- Support Text-to-Video Data Curation
MindONE v0.3.0 adds a new data curation pipeline for text-to-video training, which supports scene detection and video splitting, de-duplication, aesthetic/OCR/LPIPS/NSFW scoring, and video captioning (the stage flow is sketched below).
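To make the stage order concrete, here is a purely illustrative sketch of how such a curation pipeline chains together; every function name and threshold below is a hypothetical placeholder, not the actual MindONE API:

```python
# Purely illustrative: the stage flow of a text-to-video curation pipeline.
# All function names are hypothetical placeholders; thresholds are arbitrary examples.
def curate(raw_videos):
    clips = []
    for video in raw_videos:
        scenes = detect_scenes(video)              # scene detection
        clips.extend(split_video(video, scenes))   # video splitting
    clips = deduplicate(clips)                     # near-duplicate removal
    kept = [
        c for c in clips
        if aesthetic_score(c) > 4.5                # keep visually pleasing clips
        and ocr_text_ratio(c) < 0.1                # drop text-heavy clips
        and lpips_motion_score(c) > 0.2            # drop (near-)static clips
        and not is_nsfw(c)                         # safety filter
    ]
    return [(c, caption(c)) for c in kept]         # caption clips into T2V training pairs
```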
For more details, visit the t2v curation documentation.