Pixit Fashion

The goal of this project is to provide a Python-based framework for virtual try-on tasks. This repo is part of a bigger project; together with Pixit Fashion Training, it can be used to (1) prepare a virtual try-on dataset, (2) train an outfit-specific, fine-tuned virtual try-on model, and (3) use the model from step 2 in an inference pipeline that applies the very same pre-processing steps from step 1.

Pixit Fashion: Used for pre-processing a virtual try-on dataset and for inference with the very same pre-processing steps from training
Pixit Fashion Training: Used for training an outfit-specific, fine-tuned virtual try-on model

Goal

Virtual try-on tasks focus on transferring clothing items - such as t-shirts, socks, bras, or trousers - from flatlay images (i.e., images that show the product itself, usually on a white background) onto on-model images (i.e., images that show a model wearing a specific outfit from a specific brand, usually either on a neutral background such as white or grayish, or as a mood shot where the model is placed in an appealing setting). The latter can also be AI-generated. Technically, the workflow always requires (a) a flatlay image and (b) a reference on-model image (see examples below). The workflow then proceeds as follows:

  1. Extract a mask around the target area in the reference on-model image that is to be modified (for example, isolating the black trousers worn by the model).
  2. Supply both the flatlay image (e.g., displaying leopard-print trousers) and the reference on-model image, along with its corresponding mask from step 1, to a specialized virtual try-on model.
  3. Ask the specialized virtual try-on model to replace the outfit in the on-model image (e.g., the black trousers) highlighted by the mask from step 1 with the product from the flatlay image (e.g., the leopard-print trousers). See the results below.
(Example images: Flatlay Image | Reference On-Model Image | Generated On-Model Image)
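
A minimal sketch of this three-step flow (extract_mask and try_on_model are placeholders for this repo's masking step and for the fine-tuned model from Pixit Fashion Training; they are not actual APIs of either repository):

from PIL import Image

def extract_mask(on_model: Image.Image) -> Image.Image:
    """Step 1 placeholder: isolate the target garment (see mask_generation.py below)."""
    raise NotImplementedError

def try_on_model(flatlay: Image.Image, on_model: Image.Image, mask: Image.Image) -> Image.Image:
    """Steps 2 + 3 placeholder: replace the masked region with the flatlay product."""
    raise NotImplementedError

flatlay = Image.open("flatlay.png")    # e.g., leopard-print trousers
on_model = Image.open("on-model.png")  # e.g., model wearing black trousers
mask = extract_mask(on_model)                      # step 1
generated = try_on_model(flatlay, on_model, mask)  # steps 2 + 3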

Pixit-Fashion

This repository serves two main purposes:

(a) Training Data Preparation: Scripts for preparing fashion datasets so that we can train our own models. Preprocessing steps include:

  • Finding matching flatlay / on-model image pairs
  • (Automatically) labeling bad training images
  • Removing bad training images
  • Prompting
  • Masking
  • Cropping

(b) Flatlay / On-Model Inference:
Process flatlay and on-model image pairs for inference - identical to the preprocessing steps for training - through:

  1. Prompting
  2. Masking
  3. Cropping

Setup

Requirements

  • Python: 3.12 (recommended)
  • Deep Learning: CUDA-enabled GPU (strongly recommended)
  • Operating System: Linux or macOS

Install

# Clone the main repository
git clone https://github.com/jroessler/pixit-fashion.git
cd pixit-fashion

# Initialize the GroundingDino submodule
git submodule update --init --recursive

# Create and activate a Python virtual environment
python3 -m venv pixit-fashion
source pixit-fashion/bin/activate

# Install PyTorch (important: must be compatible with your CUDA drivers)
pip install --index-url https://download.pytorch.org/whl/cu128 torch==2.7.1+cu128 torchvision==0.22.1+cu128 torchaudio==2.7.1+cu128

# Install project dependencies
pip install -r requirements.txt
pip install flash-attn==2.8.3 --config-settings=--build-isolation=false

# Install GroundingDino as an editable pip package
cd external/GroundingDino/
pip install --no-build-isolation -e .
cd ../../

# Install SAM2 as an editable pip package
cd external/sam2/
pip install --no-build-isolation -e .
cd ../../

ComfyUI Install

For some pre-processing steps, a ComfyUI installation might be necessary. Please create a separate virtual environment for ComfyUI and set it up accordingly. See here for more information.

Setup Paths

Please create a .env file with cp example-env .env and adjust the paths. In particular:

  • ROOT
  • CONSOLIDATED_COMFYUI_MODELS
  • COMFYUI_FOLDER
  • COMFYUI_ENV
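
A hypothetical .env might look like the following (all paths are placeholders; example-env in the repo is the authoritative template):

ROOT=/path/to/pixit-fashion
CONSOLIDATED_COMFYUI_MODELS=/path/to/consolidated/comfyui/models
COMFYUI_FOLDER=/path/to/ComfyUI
COMFYUI_ENV=/path/to/comfyui-venv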

Black Formatter

All Python code must be formatted with Black. Run black . before pushing.

For automatic formatting every time you save a file in VS Code, do the following:

  1. Press CMD + Shift + P (on Mac, or Ctrl + Shift + P on Windows/Linux) to open the Command Palette.
  2. Type and select Preferences: Open User Settings (JSON).
  3. Add the following snippet to your settings (this ensures Black is used as the formatter and lines wrap at 200 characters):

...
"[python]": {
    ...
    "editor.defaultFormatter": "ms-python.black-formatter",
    "editor.formatOnSave": true
},
"black-formatter.args": [
    "--line-length",
    "200"
]
...


Overview / Structure

scripts
├── shared
│   ├── cropping.py
│   ├── masking.py
│   ├── prompting.py
│   └── ...
├── preprocessing
│   ├── crop_images.py
│   ├── mask_generation.py
│   ├── prompt_generation.py
│   └── ...
└── inference
    ├── preprocessing.py
    └── generate_on_model.py

  • scripts/shared: Encapsulated pre-processing modules such as cropping, masking, and prompting, which can be used (a) for training data preparation and/or (b) for inference preparation. The idea is to apply the same set of pre-processing steps and functions not only in training but also in inference. Some of the modules contain multiple classes; for example, you can use (1) RandomResize, (2) OutfitCropper (the default), or (3) CenterCropper as the cropping mechanism (see the usage sketch after this list).

  • scripts/preprocessing: Wrapper files that can be used for training data preparation. These files iterate over images and apply the preprocessing steps using the modules from scripts/shared.

  • scripts/inference: Wrapper files that can be used for inference (+ inference preparation). These files can be used to (1) preprocess a given image pair and (2) make a prediction, that is, generate an image.
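
A minimal usage sketch for the shared cropping module (the class names come from this README, but the constructor arguments and the crop() method are assumptions about the interface):

from PIL import Image
from scripts.shared.cropping import OutfitCropper  # or RandomResize / CenterCropper

cropper = OutfitCropper(width=768, height=1024)  # assumed constructor signature
image = Image.open("on-model/1234_2.png")
cropped = cropper.crop(image)  # assumed method name
cropped.save("on-model-cropped/1234_2.png")

Because the same classes back both scripts/preprocessing and scripts/inference, training and inference see identically prepared inputs.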

Training Data Preparation

From Crawling to Clean Data

We assume that the crawled images still have to be prepared. Mandatory steps:

1. Use find_matching_images.py

This script finds matching image pairs (by regex) and copies them to a different location (so the originals are kept as a backup).

Assumption: In the provided data_root_path there are two folders: product and model, which contain the flatlay and on-model images, respectively. Both folders contain images with the following naming schema: IDENTIFIER_CODE.EXT where IDENTIFIER is a random, unique identifier, CODE is either 1 (flatlay) or 2 (on-model), and EXT is a valid image extension. A corresponding image pair might look like this: product/1234_1.png and model/1234_2.png. When finding matching image pairs, the script stores flatlay and on-model images in destination_path/flatlay and destination_path/on-model, respectively.

python -m scripts.preprocessing.find_matching_images --data_root_path="" --destination_path="" --dry_run
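
A minimal sketch of the pairing logic described above, assuming the IDENTIFIER_CODE.EXT schema (the actual script's regex and extension handling may differ):

import re
from pathlib import Path

# IDENTIFIER_CODE.EXT, where CODE is 1 (flatlay) or 2 (on-model)
PATTERN = re.compile(r"^(?P<identifier>.+)_(?P<code>[12])\.(?:png|jpe?g|webp)$", re.IGNORECASE)

def find_pairs(data_root: Path) -> dict[str, dict[str, Path]]:
    pairs: dict[str, dict[str, Path]] = {}
    for folder, kind in (("product", "flatlay"), ("model", "on-model")):
        for path in (data_root / folder).iterdir():
            match = PATTERN.match(path.name)
            if match:
                pairs.setdefault(match["identifier"], {})[kind] = path
    # keep only identifiers for which both a flatlay and an on-model image exist
    return {ident: images for ident, images in pairs.items() if len(images) == 2}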

2. Use prompt_generation.py

This script (a) generates prompts for flatlay images (non-detailed and detailed) and for on-model images, and (b) marks "bad" images. Bad images are defined as:

  • Flatlay images that contain multiple items, "multi-item"-images
  • Image pairs that have different perspectives (e.g., front view on the on-model image and back view on the flatlay image)

Assumption: In the provided data_root_path there are two folders: flatlay and on-model, which contain the flatlay and on-model images, respectively. The script will use a prompting mechanism (see scripts/shared/prompting.py for the different prompting classes) to create a prompts/prompts.json file that looks like this:

{
    "IDENTIFIER": {
        "flatlay_prompt": string,
        "flatlay_detailed_prompt": string,
        "same_perspective": bool,
        "flatlay_single_item": bool
    },
...
}

Note: Additionally, you might use a specialized UI to mark bad images (see pixit-fashion-analysis-ui). If you use it, the UI will create a scoring/analysis.json file that will be used in move_bad_images.py.

python -m scripts.preprocessing.prompt_generation --data_root_path=""

3. Use move_bad_images.py

This script moves images that are marked as bad from the flatlay and on-model folders to the bad/flatlay and bad/on-model folders, respectively. Good images stay in the flatlay and on-model folders. It uses both (a) manually labeled bad images (via pixit-fashion-analysis-ui) and (b) automatically labeled bad images (via prompt_generation.py). The latter can be turned off by providing the argument --ignore_automatic.

python -m scripts.preprocessing.move_bad_images --data_root_path="" --dry_run
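
The automatic criterion sketched in Python, assuming the prompts/prompts.json layout from step 2 (the real script additionally consumes scoring/analysis.json from the UI; the helper name is illustrative):

import json
import shutil
from pathlib import Path

def move_automatically_labeled(data_root: Path) -> None:
    prompts = json.loads((data_root / "prompts" / "prompts.json").read_text())
    for identifier, entry in prompts.items():
        is_bad = not entry["same_perspective"] or not entry["flatlay_single_item"]
        if not is_bad:
            continue
        for folder in ("flatlay", "on-model"):
            destination = data_root / "bad" / folder
            destination.mkdir(parents=True, exist_ok=True)
            for image in (data_root / folder).glob(f"{identifier}_*"):
                shutil.move(str(image), destination / image.name)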

4. Use mask_generation.py

This script generates (a) masks for the on-model images and (b) overlays of the masks on top of the on-model images.

Assumptions:

  • The outfit_category must be one of "pants", "core-tops", or "socks", because a masking schema has to be built for each outfit category. For now, we have only implemented the masking schemas for these categories.
  • In the provided data_root_path there is an on-model folder that contains the on-model images.

python -m scripts.preprocessing.mask_generation --data_root_path="" --outfit_category="core-tops"
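
This README does not spell out the masking internals, but given the GroundingDino and SAM2 submodules installed above, a text-grounded masking step could look roughly like this (config/checkpoint paths and the category-to-caption mapping are assumptions, not the script's actual code):

import numpy as np
import torch
from PIL import Image
from groundingdino.util import box_ops
from groundingdino.util.inference import load_model, load_image, predict
from sam2.sam2_image_predictor import SAM2ImagePredictor

# 1) Detect the garment with a text prompt derived from the outfit category
dino = load_model("path/to/GroundingDINO_SwinT_OGC.py", "path/to/groundingdino_swint_ogc.pth")
image_source, image = load_image("on-model/1234_2.png")
boxes, logits, phrases = predict(model=dino, image=image, caption="trousers",
                                 box_threshold=0.35, text_threshold=0.25)

# 2) Convert the best box from normalized cxcywh to absolute xyxy pixel coordinates
height, width = image_source.shape[:2]
box_xyxy = box_ops.box_cxcywh_to_xyxy(boxes[:1]) * torch.tensor([width, height, width, height])

# 3) Segment inside the box with SAM2 to obtain a pixel-accurate mask
predictor = SAM2ImagePredictor.from_pretrained("facebook/sam2-hiera-large")
predictor.set_image(image_source)
masks, scores, _ = predictor.predict(box=box_xyxy[0].numpy(), multimask_output=False)
Image.fromarray((masks[0] * 255).astype(np.uint8)).save("on-model-mask/1234_2.png")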

5. Use crop_images.py

This script crops (a) flatlay, (b) on-model, (c) on-model-mask, and (d) overlaid-mask images to the provided shape. Ideally, the shape should be the one you want to use for training, such as 768x1024 or 512x768 (although it is easy to resize the images during training!).

Assumption: In the provided data_root_path there are four folders: flatlay, on-model, on-model-mask, and overlaid-mask, which contain the flatlay, on-model, on-model-mask, and overlaid-mask images, respectively. For all four folders, the cropped images are written to corresponding *-cropped folders (see the Data Folder Overview below).

python -m scripts.preprocessing.crop_images --data_root_path="" --width=768 --height=1024

6. (Optional) consolidate_images.py

This script consolidates the images from flatlay, on-model, on-model-mask, and overlaid-mask, their cropped versions, and the prompts across different product categories into a single data folder. This might be necessary when there are different subfolders (product categories) inside a master product category. Example folder structure where consolidate_images is necessary:

category
├── subcategory_1
│   ├── flatlay
│   ├── on-model
│   └── ...
└── subcategory_2
    ├── flatlay
    ├── on-model
    └── ...

Example folder structure when consolidate_images is not necessary:

category
├── flatlay
├── on-model
└── ...

python -m scripts.preprocessing.consolidate_images --data_root_path="" --dry_run
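
A rough sketch of the consolidation idea for the first folder structure above (the real script also handles the *-cropped folders and merges prompts/prompts.json; prefixing with the subcategory name is an assumption to avoid identifier collisions):

import shutil
from pathlib import Path

FOLDERS = ("flatlay", "on-model", "on-model-mask", "overlaid-mask")

def consolidate(category_root: Path, destination: Path) -> None:
    for subcategory in sorted(p for p in category_root.iterdir() if p.is_dir()):
        for folder in FOLDERS:
            source = subcategory / folder
            if not source.is_dir():
                continue
            target = destination / folder
            target.mkdir(parents=True, exist_ok=True)
            for image in source.iterdir():
                shutil.copy2(image, target / f"{subcategory.name}_{image.name}")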

Flatlay / On-Model Inference

Data Folder Overview

Here is how your data will be organized; each directory corresponds to a step in the preprocessing and inference pipeline:

Folder Purpose
flatlay/ (Raw) flatlay images
flatlay-cropped/ Cropped flatlay images
generated-on-model/ Generated on-model images
on-model/ (Raw) on-model images
on-model-cropped/ Cropped on-model images
on-model-mask/ On-model masks
on-model-mask-cropped/ Cropped on-model masks
overlaid-mask/ Overlaid on-model masks (for debugging)
overlaid-mask-cropped/ Cropped overlaid on-model masks (for debugging)
prompt/ Auto-generated prompts
tmp/ Directory for debugging
bad/ Bad (corrupted) images from different steps

Quick Start: Preprocessing Single Flatlay / On-Model Image Pair

To run preprocessing on a flatlay / on-model image pair:

python -m scripts.inference.preprocessing --user_uuid 1000 --job_uuid 100 --flatlay_image_path "" --on_model_image_path "" --outfit_category "pants"

For all arguments, see scripts/inference/preprocessing.py.

Quick Start: Preprocessing Single Flatlay / On-Model Image Pair + Generate On-Model Image

To run preprocessing on a flatlay / on-model image pair and then generate the on-model image:

python -m scripts.inference.generate_on_model --user_uuid 1000 --job_uuid 100 --flatlay_image_path "" --on_model_image_path "" --outfit_category "pants"

For all arguments, see scripts/inference/generate_on_model.py.

Flatlay / On-Model Inference API

First, start the API:

uvicorn app.main:app --host 0.0.0.0 --port 8287

Next, you can call the API, for example, with

curl -X POST "http://localhost:8287/fashion-preprocessing/preprocess/" -H "Content-Type: application/json" -d '{
        "user_uuid": "1000",
        "job_uuid": "1",
        "on_model_image": "STR",
        "flatlay_image": "STR",
        "outfit_category": "pants",
        "width": 1024,
        "height": 1365,
        "device": "cuda",
        "verbose": true
    }'
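
The same call from Python using the requests library (payload fields mirror the curl example; "STR" stands for the image payload expected by the API):

import requests

payload = {
    "user_uuid": "1000",
    "job_uuid": "1",
    "on_model_image": "STR",  # image payload as expected by the API
    "flatlay_image": "STR",
    "outfit_category": "pants",
    "width": 1024,
    "height": 1365,
    "device": "cuda",
    "verbose": True,
}
response = requests.post("http://localhost:8287/fashion-preprocessing/preprocess/",
                         json=payload, timeout=600)
response.raise_for_status()
print(response.json())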
