Pixit Fashion

The goal of this project is to provide a Python-based framework for virtual try-on tasks. This repo is part of a larger project; together with Pixit Fashion it can be used to (1) prepare a virtual try-on dataset, (2) train an outfit-specific, fine-tuned virtual try-on model, and (3) use the model from step 2 in an inference pipeline that applies the very same pre-processing steps from step 1.

  • Pixit Fashion: can be used for pre-processing a virtual try-on dataset and for inference using the very same pre-processing steps from training
  • Pixit Fashion Training: can be used for training an outfit-specific, fine-tuned virtual try-on model

Goal

Virtual try-on tasks focus on transferring clothing items - such as t-shirts, socks, bras, or trousers - from flatlay images (i.e. images that show the product itself, usually on a white background) onto on-model images (i.e. images that show a model wearing a specific outfit from a specific brand, usually either on a neutral background such as white or gray, or as a mood shot where the model is placed in an appealing setting). The latter can also be AI-generated. Technically, the workflow always requires (a) a flatlay image and (b) a reference on-model image (see the examples below). The workflow then proceeds as follows:

  1. Extract a mask around the target area in the reference on-model image that is to be modified (for example, isolating the black trousers worn by the model).
  2. Supply both the flatlay image (e.g., displaying leopard-print trousers) and the reference on-model image, along with its corresponding mask from step 1, to a specialized virtual try-on model.
  3. Ask the specialized virtual try-on model to replace the outfit highlighted by the mask from step 1 (e.g. the black trousers) with the product from the flatlay image (e.g. the leopard-print trousers). See the results below.
[Example image triplets: flatlay image, reference on-model image, generated on-model image]
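For illustration only, here is a minimal Python sketch of this input triplet (the paths and the IDENTIFIER follow the data layout described in the Data section below and are purely exemplary):

# Illustrative sketch: load the (flatlay, on-model, mask) triplet that the try-on model consumes
from PIL import Image

identifier = "1234"  # example IDENTIFIER (see the Data section)
flatlay = Image.open(f"data/category/flatlay-cropped/{identifier}_1.png")     # (a) flatlay product image
on_model = Image.open(f"data/category/on-model-cropped/{identifier}_2.png")   # (b) reference on-model image
mask = Image.open(f"data/category/on-model-mask-cropped/{identifier}_2.png")  # step 1: target-area mask

# Steps 2-3: all three images are passed to the fine-tuned try-on model, which
# repaints only the masked region of the on-model image with the flatlay product.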

Pixit Fashion Training

This repo can be used for training an outfit-specific, fine-tuned virtual try-on model. It is largely based on CATVTON-FLUX, with some modifications to further improve results and enhance readability.

Setup

Requirements

  • VRAM (by training resolution; see the quick check below):
    • 768x512: >= 80 GB
    • 1024x768: >= 95 GB
    • 1280x1024: >= 125 GB
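A quick way to check the available VRAM on your machine (a minimal sketch; it assumes PyTorch with CUDA support is installed, e.g. via requirements.txt):

# Print the total VRAM of the first CUDA device and compare it against the table above
import torch

if torch.cuda.is_available():
    total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"GPU 0: {total_gb:.0f} GB VRAM")
else:
    print("No CUDA device found")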

Install

# Clone the main repository
git clone https://github.com/jroessler/pixit-fashion-training.git
cd pixit-fashion-training

# Create and activate a Python virtual environment
python3 -m venv pixit-fashion-training
source pixit-fashion-training/bin/activate

# Install necessary dependencies
pip install -r requirements.txt

# Make the training script executable
chmod +x train_flux_inpaint.sh

Logins

huggingface-cli login # PROVIDE TOKEN
wandb login # PROVIDE API KEY

Accelerate

You can set up accelerate in one of two ways:

(a) Using an existing accelerate config

# Multi GPU
mkdir -p ~/.cache/huggingface/accelerate
cp accelerate-multi-gpu.yaml ~/.cache/huggingface/accelerate/default_config.yaml
# Note: When using multiple GPUs, set gradient_accumulation_steps=1

# Single GPU
mkdir -p ~/.cache/huggingface/accelerate
cp accelerate_config.yaml ~/.cache/huggingface/accelerate/default_config.yaml

(b) Creating a new accelerate config

accelerate config

Black Formatter

All Python code must be formatted with Black. Run black . before pushing.

For automatic formatting every time you save a file in VS Code, do the following:

  1. Press CMD + Shift + P (on Mac, or Ctrl + Shift + P on Windows/Linux) to open the Command Palette.
  2. Type and select Preferences: Open User Settings (JSON).
  3. Add the following snippet to your settings (this ensures Black is used as the formatter and lines wrap at 200 characters):


...
"[python]": {
    ...
    "editor.defaultFormatter": "ms-python.black-formatter",
    "editor.formatOnSave": true
},
"black-formatter.args": [
    "--line-length",
    "200"
]
...


Data

Requires the following data structure:

data/category
├── flatlay-cropped/
├── on-model-cropped/
├── on-model-mask-cropped/
└── prompts/prompts.json
...

Assumption: The provided data/category path contains four folders: flatlay-cropped, on-model-cropped, on-model-mask-cropped, and prompts, which contain the flatlay images, on-model images, on-model masks, and their prompts, respectively. The first three folders contain images following the naming schema IDENTIFIER_CODE.EXT, where IDENTIFIER is a random, unique identifier, CODE is either 1 (flatlay) or 2 (on-model), and EXT is a valid image extension. A corresponding image pair might look like this: flatlay-cropped/1234_1.png, on-model-cropped/1234_2.png, on-model-mask-cropped/1234_2.png. A quick layout check is sketched below. prompts.json should look like this:

{
    "IDENTIFIER": {
        "flatlay_prompt": string,
        "flatlay_detailed_prompt": string,
        "same_perspective": bool,
        "flatlay_single_item": bool
    },
...
}
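The following sanity check is a minimal sketch (not part of the repo) that verifies every flatlay image has a matching on-model image, mask, and prompt entry, assuming the layout and naming schema described above:

# Sanity-check the data layout: pair files by IDENTIFIER and verify prompt entries
import json
from pathlib import Path

root = Path("data/category")  # adjust to your dataroot

def by_identifier(folder):
    # Map IDENTIFIER -> file path, based on the IDENTIFIER_CODE.EXT naming schema
    return {p.stem.split("_")[0]: p for p in (root / folder).iterdir() if p.is_file()}

flatlays = by_identifier("flatlay-cropped")
on_models = by_identifier("on-model-cropped")
masks = by_identifier("on-model-mask-cropped")
prompts = json.loads((root / "prompts" / "prompts.json").read_text())

required_keys = {"flatlay_prompt", "flatlay_detailed_prompt", "same_perspective", "flatlay_single_item"}
for identifier in flatlays:
    assert identifier in on_models, f"missing on-model image for {identifier}"
    assert identifier in masks, f"missing mask for {identifier}"
    assert identifier in prompts, f"missing prompt entry for {identifier}"
    assert required_keys <= prompts[identifier].keys(), f"incomplete prompt entry for {identifier}"

print(f"{len(flatlays)} complete flatlay/on-model/mask/prompt quadruples found")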

See Pixit Fashion for the necessary data preparation steps.

Quick Start

Before starting the script, make sure

  1. that the data is in the correct format
  2. that you've provided IDENTIFIERS for testing purposes. See here
  3. that you've modified your train_flux_inpaint.sh file. More specifically, change parameters such as output_dir, dataroot, checkpointing_steps, and num_train_epochs.

# Start a detached screen session, activate the environment, and launch training
screen -S pixit-fashion-training
source pixit-fashion-training/bin/activate
./train_flux_inpaint.sh
