The goal of this project is to provide a Python-based framework for virtual try-on tasks. This repo is part of a bigger project; together with Pixit Fashion Training, it can be used to (1) prepare a virtual try-on dataset, (2) train an outfit-specific, fine-tuned virtual try-on model, and (3) run the model from step 2 in an inference pipeline that applies the very same pre-processing steps as step 1.
- Pixit Fashion: Can be used for pre-processing a virtual try-on dataset + inference using the very same pre-processing steps as in training
- Pixit Fashion Training: Can be used for training an outfit-specific, fine-tuned virtual try-on model
Virtual try-on tasks focus on transferring clothing items - such as t-shirts, socks, bras, or trousers - from flatlay images (i.e., images that show the product itself, usually on a white background) onto on-model images (i.e., images that show a model wearing a specific outfit from a specific brand, usually either on a neutral background such as white or grayish, or as a mood shot where the model is placed in an appealing setting). The latter can also be AI-generated. Technically, the workflow always requires (a) a flatlay image and (b) a reference on-model image (see examples below). The workflow then proceeds as follows:
- Extract a mask around the target area in the reference on-model image that is to be modified (for example, isolating the black trousers worn by the model).
- Supply both the flatlay image (e.g., showing leopard-print trousers) and the reference on-model image, along with its corresponding mask from step 1, to a specialized virtual try-on model.
- Ask the specialized virtual try-on model to replace the outfit highlighted by the mask from step 1 in the on-model image (e.g., the black trousers) with the product from the flatlay image (e.g., the leopard-print trousers). See results below.
| Flatlay Image | Reference On-Model Image | Generated On-Model Image |
|---|---|---|
| ![]() | ![]() | ![]() |
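To make these steps concrete, here is a minimal, hypothetical sketch of the workflow. The helper names (`extract_mask`, `try_on`) are placeholders, not the actual API of this repo; the real building blocks live in `scripts/shared` and `scripts/inference`:

```python
from PIL import Image

def extract_mask(on_model: Image.Image) -> Image.Image:
    """Step 1 (placeholder): segment the target garment in the on-model image."""
    raise NotImplementedError

def try_on(flatlay: Image.Image, on_model: Image.Image, mask: Image.Image) -> Image.Image:
    """Steps 2 and 3 (placeholder): replace the masked region with the flatlay product."""
    raise NotImplementedError

flatlay = Image.open("flatlay.png")          # (a) product image
on_model = Image.open("on_model.png")        # (b) reference on-model image
mask = extract_mask(on_model)                # step 1
generated = try_on(flatlay, on_model, mask)  # steps 2 and 3
generated.save("generated_on_model.png")
```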
This repository serves two main purposes:
(a) Training Data Preparation: Scripts for preparing fashion datasets so that we can train our own models. Preprocessing steps include:
- Finding matching flatlay / on-model image pairs
- (Automatically) labeling bad training images
- Removing bad training images
- Prompting
- Masking
- Cropping
(b) Flatlay / On-Model Inference:
Process flatlay and on-model image pairs for inference - identical to the preprocessing steps for training - through:
- Prompting
- Masking
- Cropping
- Python: 3.12 (recommended)
- Deep Learning: CUDA-enabled GPU (strongly recommended)
- Operating System: Linux or macOS
```bash
# Clone the main repository
git clone https://github.com/jroessler/pixit-fashion.git
cd pixit-fashion

# Initialize the GroundingDino submodule
git submodule update --init --recursive

# Create and activate a Python virtual environment
python3 -m venv pixit-fashion
source pixit-fashion/bin/activate

# Install PyTorch (important: has to be compatible with your CUDA drivers)
pip install --index-url https://download.pytorch.org/whl/cu128 torch==2.7.1+cu128 torchvision==0.22.1+cu128 torchaudio==2.7.1+cu128

# Install project dependencies
pip install -r requirements.txt
pip install flash-attn==2.8.3 --config-settings=--build-isolation=false

# Install GroundingDino as an editable pip package
cd external/GroundingDino/
pip install --no-build-isolation -e .
cd ../../

# Install SAM2 as an editable pip package
cd external/sam2/
pip install --no-build-isolation -e .
cd ../../
```

For some pre-processing steps, a ComfyUI installation might be necessary. Please create a separate virtual environment for ComfyUI and set it up accordingly. See here for more information.
Please create a `.env` file with `cp example-env .env` and adjust the paths. In particular:

- `ROOT`
- `CONSOLIDATED`
- `COMFYUI_MODELS`
- `COMFYUI_FOLDER`
- `COMFYUI_ENV`
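A minimal example `.env`; the variable names follow the list above (as reconstructed from `example-env`), and all paths are purely illustrative:

```bash
# Illustrative values only - adjust all paths to your machine (see example-env)
ROOT=/data/pixit-fashion                       # root folder for datasets
CONSOLIDATED=/data/pixit-fashion/consolidated  # consolidated training data
COMFYUI_MODELS=/opt/ComfyUI/models             # ComfyUI model checkpoints
COMFYUI_FOLDER=/opt/ComfyUI                    # ComfyUI installation
COMFYUI_ENV=/opt/comfyui-venv                  # separate ComfyUI virtual environment
```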
All Python code must be formatted with Black. Run `black .` before pushing.
For automatic formatting every time you save a file in VS Code, do the following:
- Press CMD + Shift + P (on Mac) or Ctrl + Shift + P (on Windows/Linux) to open the Command Palette.
- Type and select Preferences: Open User Settings (JSON).
- Add the following snippet to your settings (this ensures Black is used as the formatter and lines wrap at 200 characters):
```
...
"[python]": {
    ...
    "editor.defaultFormatter": "ms-python.black-formatter",
    "editor.formatOnSave": true
},
"black-formatter.args": [
    "--line-length",
    "200"
]
...
```
```
scripts
├── shared
│   ├── cropping.py
│   ├── masking.py
│   ├── prompting.py
│   └── ...
├── preprocessing
│   ├── crop_images.py
│   ├── mask_generation.py
│   ├── prompt_generation.py
│   └── ...
└── inference
    ├── preprocessing.py
    └── generate_on_model.py
```
- `scripts/shared`: Encapsulated pre-processing modules such as cropping, masking, and prompting, which can be used (a) for training data preparation and/or (b) for inference preparation. The idea is to apply the same set of pre-processing steps and functions not only in training but also in inference. Some of the modules contain several classes; for example, you can use (1) `RandomResize`, (2) `OutfitCropper` (default), or (3) `CenterCropper` as the cropping mechanism (see the sketch after this list).
- `scripts/preprocessing`: Wrapper files that can be used for training data preparation. These files iterate over images and apply the preprocessing steps using the modules from `scripts/shared`.
- `scripts/inference`: Wrapper files that can be used for inference (+ inference preparation). These files can be used to (1) preprocess a given image pair and (2) make a prediction, that is, generate an image.
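A minimal usage sketch of the shared cropping module. The import path matches the tree above, but the constructor arguments and the method name (`crop`) are assumptions; check `scripts/shared/cropping.py` for the actual interface:

```python
from PIL import Image

# Assumed interface - alternatives: RandomResize, CenterCropper
from scripts.shared.cropping import OutfitCropper

cropper = OutfitCropper(width=768, height=1024)  # same class in training and inference
image = Image.open("on-model/1234_2.png")
cropped = cropper.crop(image)  # assumed method name
cropped.save("on-model-cropped/1234_2.png")
```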
We assume that the images still have to be prepared. The following steps are mandatory:
This script finds matching image pairs (by regex) and copies them to a different location (so that everything is saved twice and the originals remain untouched).
Assumption: In the provided `data_root_path` there are two folders, `product` and `model`, which contain the flatlay and on-model images, respectively. Both folders contain images with the following naming schema: `IDENTIFIER_CODE.EXT`, where `IDENTIFIER` is a random, unique identifier, `CODE` is either 1 (flatlay) or 2 (on-model), and `EXT` is a valid image extension. A corresponding image pair might look like this: `product/1234_1.png` and `model/1234_2.png`. When matching image pairs are found, the script stores the flatlay and on-model images in `destination_path/flatlay` and `destination_path/on-model`, respectively.
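As an illustration of the pairing logic (a sketch, not the script's actual implementation), matching by the `IDENTIFIER_CODE.EXT` schema could look like this:

```python
import re
from pathlib import Path

# Matches e.g. "1234_1.png" -> identifier "1234", code "1"
PATTERN = re.compile(r"^(?P<identifier>.+)_(?P<code>[12])\.(?P<ext>png|jpg|jpeg|webp)$", re.IGNORECASE)

def index_folder(folder: Path) -> dict[str, Path]:
    """Map identifier -> image path for all files matching the naming schema."""
    index = {}
    for path in folder.iterdir():
        match = PATTERN.match(path.name)
        if match:
            index[match.group("identifier")] = path
    return index

root = Path("data_root")                   # placeholder for --data_root_path
flatlays = index_folder(root / "product")  # CODE == 1
on_models = index_folder(root / "model")   # CODE == 2
pairs = {ident: (flatlays[ident], on_models[ident]) for ident in flatlays.keys() & on_models.keys()}
```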
```bash
python -m scripts.preprocessing.find_matching_images --data_root_path="" --destination_path="" --dry_run
```

This script (a) generates prompts for flatlay images (non-detailed and detailed) and on-model images, and (b) marks "bad" images. Bad images are defined as:
- Flatlay images that contain multiple items ("multi-item" images)
- Image pairs that have different perspectives (e.g., front view on the on-model image and back view on the flatlay image)
Assumption: In the provided `data_root_path` there are two folders, `flatlay` and `on-model`, which contain the flatlay and on-model images, respectively. The script will use a prompting mechanism (see `scripts/shared/prompting.py` for the different prompting classes) to create a `prompts/prompts.json` file that looks like this:
```json
{
    "IDENTIFIER": {
        "flatlay_prompt": string,
        "flatlay_detailed_prompt": string,
        "same_perspective": bool,
        "flatlay_single_item": bool
    },
    ...
}
```

Note: Additionally, you might use a specialized UI to mark bad images (see pixit-fashion-analysis-ui). In case you are using it, the UI will create a `scoring/analysis.json` file that is picked up by `move_bad_images.py`.
```bash
python -m scripts.preprocessing.prompt_generation --data_root_path=""
```

This script moves bad images from the `flatlay` and `on-model` folders to the `bad/flatlay` and `bad/on-model` folders, respectively - but only if they are marked as bad. Good images stay in the `flatlay` and `on-model` folders. It considers both (a) manually labeled bad images (via pixit-fashion-analysis-ui) and (b) automatically labeled bad images (via `prompt_generation.py`). The latter can be turned off by providing the argument `--ignore_automatic`.
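A minimal sketch of how the automatic labels in `prompts/prompts.json` can be interpreted, assuming the schema shown above; the actual filtering happens inside `move_bad_images.py`:

```python
import json
from pathlib import Path

prompts = json.loads(Path("data_root/prompts/prompts.json").read_text())

# An image pair is automatically considered "bad" if the flatlay shows multiple
# items or the two images were taken from different perspectives.
bad_identifiers = [
    identifier
    for identifier, entry in prompts.items()
    if not entry["flatlay_single_item"] or not entry["same_perspective"]
]
print(f"{len(bad_identifiers)} automatically labeled bad pairs")
```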
```bash
python -m scripts.preprocessing.move_bad_images --data_root_path="" --dry_run
```

This script generates (a) masks for the on-model images and (b) overlays of the masks on top of the on-model images.
Assumptions:

- The `outfit_category` must be one of `"pants"`, `"core-tops"`, or `"socks"`, because a masking schema has to be built for each outfit category. For now, we have only implemented the masking schemas for these categories.
- In the provided `data_root_path` there is an `on-model` folder that contains the on-model images.
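As an illustration of step (b), a minimal sketch of overlaying a mask on an on-model image with PIL; the repo's own implementation in `scripts/shared/masking.py` may differ:

```python
from PIL import Image

on_model = Image.open("on-model/1234_2.png").convert("RGB")
mask = Image.open("on-model-mask/1234_2.png").convert("L")  # white = masked region

# Tint the masked region red at 50% opacity for visual debugging.
red = Image.new("RGB", on_model.size, (255, 0, 0))
overlay = Image.composite(Image.blend(on_model, red, 0.5), on_model, mask)
overlay.save("overlaid-mask/1234_2.png")
```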
```bash
python -m scripts.preprocessing.mask_generation --data_root_path="" --outfit_category="core-tops"
```

This script crops (a) flatlay, (b) on-model, (c) on-model-mask, and (d) overlaid-mask images to the provided shape. Ideally, the shape should be the one you want to use for training, such as 768x1024 or 512x768 (although it is easy to resize the images during training!).
Assumption: In the provided `data_root_path` there are four folders, `flatlay`, `on-model`, `on-model-mask`, and `overlaid-mask`, which contain the flatlay, on-model, on-model-mask, and overlaid-mask images, respectively. For all four folders, the cropped images are stored in the corresponding `*-cropped` folders (e.g., `flatlay-cropped`; see the folder overview below).
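For illustration, a center-crop to a fixed target shape with PIL (a sketch only; the repo's croppers in `scripts/shared/cropping.py` implement more elaborate strategies such as `OutfitCropper`):

```python
from PIL import Image, ImageOps

TARGET_W, TARGET_H = 768, 1024

image = Image.open("flatlay/1234_1.png")
# Resize to cover the target box, then center-crop to exactly 768x1024.
cropped = ImageOps.fit(image, (TARGET_W, TARGET_H), method=Image.Resampling.LANCZOS, centering=(0.5, 0.5))
cropped.save("flatlay-cropped/1234_1.png")
```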
```bash
python -m scripts.preprocessing.crop_images --data_root_path="" --width=768 --height=1024
```

This script consolidates the images from `flatlay`, `on-model`, `on-model-mask`, `overlaid-mask`, and their cropped versions, as well as the prompts, across different product categories into a single data folder. This might be necessary in cases where there are different subfolders (product categories) inside a master product category. Example folder structure when `consolidate_images` is necessary:
```
category
├── subcategory_1
│   ├── flatlay
│   ├── on-model
│   └── ...
└── subcategory_2
    ├── flatlay
    ├── on-model
    └── ...
```
Example folder structure when `consolidate_images` is not necessary:
```
category
├── flatlay
├── on-model
└── ...
```
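As a rough sketch of what consolidation does (not the script's actual implementation; collision handling and the remaining folders are omitted):

```python
import shutil
from pathlib import Path

category = Path("category")    # master product category
target = Path("consolidated")  # single output data folder

for subcategory in [p for p in category.iterdir() if p.is_dir()]:
    for folder in ["flatlay", "on-model"]:  # same idea for masks, crops, and prompts
        source = subcategory / folder
        if not source.is_dir():
            continue
        destination = target / folder
        destination.mkdir(parents=True, exist_ok=True)
        for image in source.iterdir():
            shutil.copy2(image, destination / image.name)
```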
```bash
python -m scripts.preprocessing.consolidate_images --data_root_path="" --dry_run
```

Here is how your data will be organized; each directory represents a step in the preprocessing and inference pipeline:
| Folder | Purpose |
|---|---|
| `flatlay/` | (Raw) flatlay images |
| `flatlay-cropped/` | Cropped flatlay images |
| `generated-on-model/` | Generated on-model images |
| `on-model/` | (Raw) on-model images |
| `on-model-cropped/` | Cropped on-model images |
| `on-model-mask/` | On-model masks |
| `on-model-mask-cropped/` | Cropped on-model masks |
| `overlaid-mask/` | Overlaid on-model masks (debugging purposes) |
| `overlaid-mask-cropped/` | Cropped overlaid on-model masks (debugging purposes) |
| `prompt/` | Auto-generated prompts |
| `tmp/` | Directory for debugging |
| `bad/` | Bad (corrupted) images from different steps |
To run preprocessing on a flatlay / on-model image pair:
```bash
python -m scripts.inference.preprocessing --user_uuid 1000 --job_uuid 100 --flatlay_image_path "" --on_model_image_path "" --outfit_category "pants"
```

For all available arguments, see the script file.
To run preprocessing on a flatlay / on-model image pair plus its AI on-model generation:
```bash
python -m scripts.inference.generate_on_model --user_uuid 1000 --job_uuid 100 --flatlay_image_path "" --on_model_image_path "" --outfit_category "pants"
```

For all available arguments, see the script file.
First, start the API:
```bash
uvicorn app.main:app --host 0.0.0.0 --port 8287
```
Next, you can call the API, for example, with
```bash
curl -X POST "http://localhost:8287/fashion-preprocessing/preprocess/" -H "Content-Type: application/json" -d '{
    "user_uuid": "1000",
    "job_uuid": "1",
    "on_model_image": "STR",
    "flatlay_image": "STR",
    "outfit_category": "pants",
    "width": 1024,
    "height": 1365,
    "device": "cuda",
    "verbose": true
}'
```
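The same request via Python's `requests` library, assuming the endpoint returns JSON; the payload is identical to the curl example above:

```python
import requests

payload = {
    "user_uuid": "1000",
    "job_uuid": "1",
    "on_model_image": "STR",  # image payload as expected by the API
    "flatlay_image": "STR",
    "outfit_category": "pants",
    "width": 1024,
    "height": 1365,
    "device": "cuda",
    "verbose": True,
}
response = requests.post("http://localhost:8287/fashion-preprocessing/preprocess/", json=payload)
response.raise_for_status()
print(response.json())
```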

