We compare our Facere
models with the state-of-the-art Attribute Mask R-CNN (amrcnn)
and FashionFormer (fformer)
models. Since their repositories have conflicting dependencies, we create separate virtual environments for each of
them. Here are all the models we used:
| Model name | Description | Backbone | FashionFail-train data | download |
|---|---|---|---|---|
| amrcnn-spine | Attribute Mask R-CNN model released with the Fashionpedia paper. | SpineNet-143 | x | ckpt \| config |
| fformer-swin | FashionFormer model released with the FashionFormer paper. | Swin-base | x | pth |
| amrcnn-r50-fpn | Attribute Mask R-CNN model released with the Fashionpedia paper. | ResNet50-FPN | x | ckpt \| config |
| fformer-r50-fpn | FashionFormer model released with the FashionFormer paper. | ResNet50-FPN | x | pth |
| facere | Mask R-CNN-based model trained on Fashionpedia-train. | ResNet50-FPN | x | onnx |
| facere+ | facere model fine-tuned on FashionFail-train. | ResNet50-FPN | ✔ | onnx |
After training, execute the following command to convert the trained model checkpoint (.ckpt) to .onnx format:
python models/export_to_onnx.py \
--ckpt_path "model.ckpt" \
--onnx_path "model.onnx" \
--model_class "facere_base" # either "facere_base" or "facere_plus"
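As a quick sanity check, the exported file can be validated and loaded with ONNX Runtime before running any predictions. A minimal sketch, assuming onnx and onnxruntime are installed and the file name matches the command above:

```python
# Sanity-check the exported model: validate the graph and list its I/O signature.
import onnx
import onnxruntime as ort

onnx_model = onnx.load("model.onnx")
onnx.checker.check_model(onnx_model)  # raises if the exported graph is malformed

session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
print("inputs: ", [(i.name, i.shape) for i in session.get_inputs()])
print("outputs:", [(o.name, o.shape) for o in session.get_outputs()])
```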
Then, run inference using ONNX Runtime
with:
# --model_name is either "facere_base" or "facere_plus"
python models/predict_models.py \
--model_name "facere_base" \
--image_dir "path/to/images/to/run/inference/for/" \
--out_dir "path/to/where/predictions/will/be/saved/"
This saves all predictions into a single compressed, storage-efficient .npz file with the following structure:
{
"image_file": str, # image file name
"boxes": numpy.ndarray, # boxes in yxyx format (same as `amrcnn` model output)
"classes": numpy.ndarray, # classes/categories in [1,n] for n classes
"scores": numpy.ndarray, # confidence scores of boxes in [0,1]
"masks": list(dict), # segmentation masks in encoded RLE format
}
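For downstream analysis, the predictions can be loaded back with NumPy. A minimal sketch, assuming the .npz stores the per-image records as an object array (hence allow_pickle=True); the exact key name depends on how predict_models.py saves the file, so we simply take the first one:

```python
# Load the compressed predictions and inspect the first record.
import numpy as np

with np.load("predictions.npz", allow_pickle=True) as data:
    print(data.files)              # keys stored in the archive
    preds = data[data.files[0]]    # per-image prediction records

first = preds[0]
print(first["image_file"], first["boxes"].shape, first["scores"][:5])
```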
Alternatively, see the inference code in HuggingFace Spaces.
Note on the repository: The upstream repository is complex and difficult to modify; for example, we could not run inference on GPUs and failed to convert the model to .onnx format. The following procedure is therefore not optimal, but it works.
Create and activate the conda environment:
conda create -n amrcnn python=3.9
conda activate amrcnn
Install dependencies:
pip install tensorflow-gpu==2.11.0 Pillow==9.5.0 pyyaml opencv-python-headless tqdm pycocotools
Clone the repository, navigate to the detection
directory and download the models:
cd /change/dir/to/fashionfail/repo/
git clone https://github.com/jangop/tpu.git
cd tpu
git checkout 85b65b6
cd models/official/detection
curl https://storage.googleapis.com/cloud-tpu-checkpoints/detection/projects/fashionpedia/fashionpedia-spinenet-143.tar.gz --output fashionpedia-spinenet-143.tar.gz
tar -xf fashionpedia-spinenet-143.tar.gz
curl https://storage.googleapis.com/cloud-tpu-checkpoints/detection/projects/fashionpedia/fashionpedia-r50-fpn.tar.gz --output fashionpedia-r50-fpn.tar.gz
tar -xf fashionpedia-r50-fpn.tar.gz
The inference script expects the input images to be packaged in a single archive. Hence, pack the FashionFail-test data into a .tar file, for example:
cd ~/.cache/fashionfail/
tar -cvf ff_test.tar images/test/*
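The same archive can also be built from Python with the standard tarfile module, which is handy if the images live somewhere else; a minimal sketch mirroring the paths above:

```python
# Pack the FashionFail-test images into ff_test.tar (mirrors the shell example).
import tarfile
from pathlib import Path

cache = Path.home() / ".cache" / "fashionfail"
with tarfile.open(cache / "ff_test.tar", "w") as tar:
    for img in sorted((cache / "images" / "test").iterdir()):
        # keep the same relative layout as the shell command (images/test/...)
        tar.add(img, arcname=f"images/test/{img.name}")
```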
Finally, we can run inference with:
cd some_path/fashionfail/tpu/models/official/detection
python inference_fashion.py \
--model="attribute_mask_rcnn" \
--config_file="projects/fashionpedia/configs/yaml/spinenet143_amrcnn.yaml" \
--checkpoint_path="fashionpedia-spinenet-143/model.ckpt" \
--label_map_file="projects/fashionpedia/dataset/fashionpedia_label_map.csv" \
--output_html="out.html" --max_boxes_to_draw=8 --min_score_threshold=0.01 \
--image_size="640" \
--image_file_pattern="~/.cache/fashionfail/ff_test.tar" \
--output_file="outputs/spinenet143-ff_test.npy"
The predictions file has the following structure:
{
'image_file': str, # image file name
'boxes': np.ndarray, # boxes in yxyx format
'classes': np.ndarray, # classes/categories in [1,n] for n classes
'scores': np.ndarray, # confidence scores of boxes in [0,1]
'attributes': np.ndarray, # attributes (not used in our evaluation)
'masks': encoded_masks, # segmentation masks in encoded RLE format
}
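Note that amrcnn (and facere, above) report boxes in yxyx order, whereas fformer (below) uses xyxy. Before any box-level comparison, the columns need to be brought to a common order; a minimal sketch for an (N, 4) boxes array:

```python
import numpy as np

def yxyx_to_xyxy(boxes: np.ndarray) -> np.ndarray:
    """Reorder (N, 4) boxes from [y1, x1, y2, x2] to [x1, y1, x2, y2]."""
    y1, x1, y2, x2 = boxes.T
    return np.stack([x1, y1, x2, y2], axis=1)

# yxyx_to_xyxy(np.array([[10., 20., 30., 40.]]))  ->  [[20., 10., 40., 30.]]
```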
Create and activate the conda environment:
conda create -n fformer python==3.8.13
conda activate fformer
Install dependencies:
conda install pytorch==1.9.0 torchvision==0.10.0 torchaudio==0.9.0 -c pytorch
pip install -U openmim
mim install mmdet==2.18.0
mim install mmcv-full==1.3.18
pip install git+https://github.com/cocodataset/panopticapi.git
pip install -U scikit-learn
pip install -U scikit-image
pip install torchmetrics
Clone the repository and create a new directory for the model weights:
cd /change/dir/to/fashionfail/repo/
git clone https://github.com/xushilin1/FashionFormer.git
mkdir FashionFormer/ckpts
Download the models manually from OneDrive and place them inside the newly created
FashionFormer/ckpts
folder.
Then, run inference with:
python src/fashionfail/models/predict_fformer.py \
--model_path "./FashionFormer/ckpts/fashionformer_r50_3x.pth" \
--config_path "./FashionFormer/configs/fashionformer/fashionpedia/fashionformer_r50_mlvl_feat_3x.py" \
--out_dir "path/to/where/predictions/will/be/saved/" \
--image_dir "./cache/fashionfail/images/test/" \
--dataset_name "ff_test" \
--score_threshold 0.05
This saves all predictions into a single compressed, storage-efficient .npz file.
Note: A score_threshold=0.05 is applied to the model predictions. Due to its Transformer architecture, fformer outputs a fixed number of predictions (100) per image, many of which are low-confidence and mostly wrong, which can lead to poor results. The thresholding is therefore applied to evaluate the model's performance fairly.
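The same thresholding can be reproduced on any prediction record; a minimal sketch, assuming a per-image dict with the structure shown below:

```python
def filter_by_score(record: dict, score_threshold: float = 0.05) -> dict:
    """Drop detections whose confidence does not exceed the threshold."""
    keep = record["scores"] > score_threshold  # boolean mask over detections
    return {
        "image_file": record["image_file"],
        "boxes": record["boxes"][keep],
        "classes": record["classes"][keep],
        "scores": record["scores"][keep],
        "masks": [m for m, k in zip(record["masks"], keep) if k],
    }
```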
The predictions file has the following structure:
{
"image_file": str, # image file name
"boxes": numpy.ndarray, # boxes in xyxy format
"classes": numpy.ndarray, # classes/categories in [0,n-1] for n classes
"scores": numpy.ndarray, # confidence scores of boxes in [0,1]
"masks": list(dict), # segmentation masks in encoded RLE format
}
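The RLE-encoded masks can be decoded back into binary arrays with pycocotools (e.g., as installed in the amrcnn environment above); a sketch assuming the per-mask dicts are COCO-style RLE objects with size and counts fields:

```python
# Decode COCO-style RLE masks into binary (H, W) numpy arrays.
from pycocotools import mask as mask_utils

def decode_masks(record: dict):
    return [mask_utils.decode(rle) for rle in record["masks"]]

# Example: pixel area of every predicted mask for one image
# areas = [m.sum() for m in decode_masks(record)]
```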