
🐈 CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models

CatVTON is a simple and efficient virtual try-on diffusion model with 1) Lightweight Network (899.06M parameters in total), 2) Parameter-Efficient Training (49.57M trainable parameters) and 3) Simplified Inference (< 8 GB VRAM for 1024×768 resolution).


Installation

An Installation Guide is provided to help build the conda environment for CatVTON. Detectron2 & DensePose are needed only when deploying the app; they are not required for inference on datasets. Install the packages according to your needs.
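A minimal sketch of the environment setup (the environment name and Python version are assumptions; follow the Installation Guide for the authoritative steps):

# Environment name and Python version are assumptions.
conda create -n catvton python=3.9 -y
conda activate catvton
pip install -r requirements.txt

# Only needed for app deployment, not for dataset inference:
pip install 'git+https://github.com/facebookresearch/detectron2.git'
pip install 'git+https://github.com/facebookresearch/detectron2.git#subdirectory=projects/DensePose'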

Deployment

ComfyUI Workflow

We have modified the main code to enable easy deployment of CatVTON on ComfyUI. Because the code structure is not directly compatible with ComfyUI, we have released this part separately in the Releases; it includes the code to be placed under custom_nodes of ComfyUI and our workflow JSON files.

To deploy CatVTON to your ComfyUI, follow these steps:

  1. Install all the requirements for both CatVTON and ComfyUI; refer to the Installation Guide for CatVTON and the Installation Guide for ComfyUI.
  2. Download ComfyUI-CatVTON.zip and unzip it into the custom_nodes folder under your ComfyUI project (cloned from ComfyUI); see the shell sketch after this list.
  3. Run ComfyUI.
  4. Download catvton_workflow.json, drag it into your ComfyUI webpage, and enjoy 😆!
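A shell sketch of steps 2–4 (the ComfyUI checkout path and the download location are assumptions):

# Paths are assumptions; adjust to your setup.
cd ~/ComfyUI/custom_nodes
unzip ~/Downloads/ComfyUI-CatVTON.zip

# Start ComfyUI, then drag catvton_workflow.json into the web page.
cd ~/ComfyUI
python main.py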

For problems under Windows OS, please refer to issue #8.

When you run the CatVTON workflow for the first time, the weight files are downloaded automatically, which can take tens of minutes.
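To avoid the wait on first run, you can pre-fetch the checkpoints with huggingface-cli; the repo id below is an assumption, so verify it against the project page on HuggingFace:

# Repo id is an assumption; confirm it before downloading.
pip install -U huggingface_hub
huggingface-cli download zhengchong/CatVTON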

Gradio App

To deploy the Gradio App for CatVTON on your own machine, just run the following command; checkpoints will be downloaded automatically from HuggingFace.

CUDA_VISIBLE_DEVICES=0 python app.py \
--output_dir="resource/demo/output" \
--mixed_precision="bf16" \
--allow_tf32 

When using bf16 precision, generating results at 1024×768 resolution requires only about 8 GB of VRAM.
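If your GPU does not support bf16, inference.py lists fp16 and no as alternative precision settings; assuming app.py accepts the same values (an assumption, not confirmed here), a variant would be:

CUDA_VISIBLE_DEVICES=0 python app.py \
--output_dir="resource/demo/output" \
--mixed_precision="fp16" \
--allow_tf32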

Inference

Data Preparation

Before inference, you need to download the VITON-HD or DressCode dataset. Once the datasets are downloaded, the folder structures should look like the following:

├── VITON-HD
|   ├── test_pairs_unpaired.txt
│   ├── test
|   |   ├── image
│   │   │   ├── [000006_00.jpg | 000008_00.jpg | ...]
│   │   ├── cloth
│   │   │   ├── [000006_00.jpg | 000008_00.jpg | ...]
│   │   ├── agnostic-mask
│   │   │   ├── [000006_00_mask.png | 000008_00_mask.png | ...]
...

For the DressCode dataset, we provide our preprocessed agnostic masks; download them and place them in the agnostic_masks folder under each category, as sketched after the folder structure below.

├── DressCode
|   ├── test_pairs_paired.txt
|   ├── test_pairs_unpaired.txt
│   ├── [dresses | lower_body | upper_body]
|   |   ├── test_pairs_paired.txt
|   |   ├── test_pairs_unpaired.txt
│   │   ├── images
│   │   │   ├── [013563_0.jpg | 013563_1.jpg | 013564_0.jpg | 013564_1.jpg | ...]
│   │   ├── agnostic_masks
│   │   │   ├── [013563_0.png | 013564_0.png | ...]
...
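A sketch of placing the preprocessed masks (the archive name and its internal layout are assumptions; adjust to the actual download):

# Hypothetical archive name and layout; adjust to the actual download.
unzip agnostic_masks.zip -d /tmp/catvton_masks
for c in dresses lower_body upper_body; do
  mkdir -p DressCode/$c/agnostic_masks
  cp /tmp/catvton_masks/$c/*.png DressCode/$c/agnostic_masks/
done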

Inference on VITON-HD/DressCode

To run inference on the DressCode or VITON-HD dataset, run the following command; checkpoints will be downloaded automatically from HuggingFace.

CUDA_VISIBLE_DEVICES=0 python inference.py \
--dataset [dresscode | vitonhd] \
--data_root_path <path> \
--output_dir <path> \
--dataloader_num_workers 8 \
--batch_size 8 \
--seed 555 \
--mixed_precision [no | fp16 | bf16] \
--allow_tf32 \
--repaint \
--eval_pair  
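For example, an unpaired run on VITON-HD with the template filled in (the paths are placeholders, and omitting --eval_pair for the unpaired setting is an assumption based on the flag name):

CUDA_VISIBLE_DEVICES=0 python inference.py \
--dataset vitonhd \
--data_root_path ./data/VITON-HD \
--output_dir ./output/vitonhd-unpaired \
--dataloader_num_workers 8 \
--batch_size 8 \
--seed 555 \
--mixed_precision bf16 \
--allow_tf32 \
--repaint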

Acknowledgement

Our code builds on Diffusers. We adopt Stable Diffusion v1.5 inpainting as the base model. We use SCHP and DensePose to automatically generate masks in our Gradio App and ComfyUI workflow. Thanks to all the contributors!

License

All the materials, including code, checkpoints, and demo, are made available under the Creative Commons BY-NC-SA 4.0 license. You are free to copy, redistribute, remix, transform, and build upon the project for non-commercial purposes, as long as you give appropriate credit and distribute your contributions under the same license.

Citation

@misc{chong2024catvtonconcatenationneedvirtual,
      title={CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models}, 
      author={Zheng Chong and Xiao Dong and Haoxiang Li and Shiyue Zhang and Wenqing Zhang and Xujie Zhang and Hanqing Zhao and Xiaodan Liang},
      year={2024},
      eprint={2407.15886},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2407.15886}, 
}
