This repo implements a simple PyTorch codebase for training inpainting models, built on powerful tools including Docker, PyTorch Lightning, and Hydra.
Currently, only DFNet is supported. More methods, as well as additional useful utilities for image inpainting, will be implemented.
We use Docker to run all experiments.
- PyTorch Lightning
  - logging (TensorBoard, CSV)
  - checkpointing
  - DistributedDataParallel
  - mixed precision
- Hydra
  - flexible configuration system
  - logging (stream to file, folder structure)
- Others
  - save sample results
Build the image for the first time:

```shell
python core.py env prepare
```
Explanation:

- When you first run this command, you will be asked for three items:
  - a project name,
  - the root folder of your training logs,
  - the root folder of your datasets.
- Then an image is built based on `/env/Dockerfile`.
- Finally, a container is launched based on `docker-compose.yml`.

The default settings of `docker-compose.yml` are shown below; you can modify them before building:
```yaml
version: "3.9"

services:
  lab:
    container_name: ${PROJECT}
    runtime: nvidia
    build:
      context: env/
      dockerfile: Dockerfile
      args:
        - USER_ID=${UID}
        - GROUP_ID=${GID}
        - USER_NAME=${USER_NAME}
    image: pytorch181_local
    environment:
      - TZ=Asia/Shanghai
      - TORCH_HOME=/data/torch_model
    ipc: host
    hostname: docker
    working_dir: /code
    command: ['sleep', 'infinity']
    volumes:
      - ${CODE_ROOT}:/code
      - ${DATA_ROOT}:/data
      - ${LOG_ROOT}:/outputs
```
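The `${PROJECT}`, `${UID}`, `${GID}`, and path variables above are typically supplied to Compose through an `.env` file next to `docker-compose.yml`. As a rough sketch only (the variable names mirror the compose file, but the exact behavior of `core.py env prepare` may differ), the prepare step could collect the three prompted items and write such a file:

```python
import os
from pathlib import Path


def write_env_file(project: str, log_root: str, data_root: str,
                   path: str = ".env") -> str:
    """Write the variables referenced by docker-compose.yml into an .env file.

    ``project``, ``log_root``, and ``data_root`` correspond to the three
    items the prepare command asks for; the remaining values are derived
    from the current user and working directory. This is a hypothetical
    sketch, not the repo's actual implementation.
    """
    values = {
        "PROJECT": project,
        "CODE_ROOT": str(Path.cwd()),
        "DATA_ROOT": data_root,
        "LOG_ROOT": log_root,
        "UID": str(os.getuid()),
        "GID": str(os.getgid()),
        "USER_NAME": os.environ.get("USER", "docker"),
    }
    content = "\n".join(f"{key}={value}" for key, value in values.items()) + "\n"
    Path(path).write_text(content)
    return content
```

Docker Compose reads this file automatically, so `${DATA_ROOT}` and friends resolve without exporting anything in your shell.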
To enter the container, simply run:

```shell
python core.py env
```

By default, you enter as the same user as on the host, which avoids file-permission issues with the mounted volumes. Of course, you can also enter the container as root:

```shell
python core.py env --root
```
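Entering the container essentially boils down to a `docker compose exec` with or without a user override. A hypothetical sketch of how such a command line could be assembled (not necessarily how `core.py` does it internally):

```python
import os


def exec_command(service: str = "lab", root: bool = False) -> list:
    """Build a ``docker compose exec`` command for an interactive shell.

    By default the shell runs as the host user (matching the UID/GID baked
    into the image), so files created in mounted volumes stay owned by you;
    ``root=True`` switches to the container's root user instead. The
    service name "lab" comes from docker-compose.yml above.
    """
    cmd = ["docker", "compose", "exec"]
    if root:
        cmd += ["-u", "root"]
    else:
        cmd += ["-u", f"{os.getuid()}:{os.getgid()}"]
    return cmd + [service, "bash"]
```

The command list could then be handed to `subprocess.run` to drop you into the container.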
Basically, the environment is determined by four items:

- `/env/Dockerfile` defines how the local Docker image is built, for example, installing the packages listed in `requirements.txt` on top of `deepbase/pytorch:latest`.
- The base Docker image: from `/env/Dockerfile`, you can see that `deepbase/pytorch` is the base image. The original Dockerfile of the base image is at deepcodebase/docker. You can change the base image to whatever you like.
- `/env/requirements.txt` defines the Python packages to install in the local Docker image.
- `/docker-compose.yml` defines the settings for running the container, for example, the volumes, timezone, etc.
After changing any of these settings, you can rebuild the local image at any time by running:

```shell
python core.py env prepare --build
```
- Image data: any image dataset you like, e.g., Places2 or ImageNet. Place your dataset under `DATA_ROOT` on your local machine (mapped to `/data` inside Docker). For example, `DATA_ROOT/places2` is used for training by default.
- Masks: you can download and use free-form-mask. Decompress the file and place `mask` under `DATA_ROOT`.
- By default, inside the environment, you need to have `places2` and `mask` under `/data`.
- If you use other datasets, remember to modify the settings, especially the data locations under `conf/dataset`.
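Before launching training, a quick sanity check that the expected folders exist inside the container can save a failed run. A small sketch (the directory names below are just the defaults described above):

```python
from pathlib import Path


def check_data_layout(data_root: str = "/data",
                      required: tuple = ("places2", "mask")) -> list:
    """Return the names of required dataset folders missing under data_root."""
    root = Path(data_root)
    return [name for name in required if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = check_data_layout()
    if missing:
        print(f"Missing under /data: {', '.join(missing)}")
    else:
        print("Dataset layout looks good.")
```

If you changed the dataset locations in `conf/dataset`, adjust `required` accordingly.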
After entering the environment, you can launch training. Example training commands:

```shell
python train.py
python train.py mode=run pl_trainer.gpus=\'3,4\' logging.wandb.notes="tune model"
python train.py +experiment=k80 mode=run logging.wandb.tags='[k80]'
```
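The `key=value` arguments above are Hydra overrides: dotted keys address nested config entries, and a leading `+` adds a key absent from the base config. As a toy illustration of the dotted-key idea only (Hydra's real override grammar is much richer, supporting types, sweeps, and `~` deletions), here is how such overrides map onto a nested config dict:

```python
def apply_override(cfg: dict, override: str) -> dict:
    """Apply one Hydra-style ``a.b.c=value`` override to a nested dict.

    Simplified sketch: values are kept as strings and only the ``+``
    prefix is recognized, unlike real Hydra.
    """
    key, _, value = override.lstrip("+").partition("=")
    parts = key.split(".")
    node = cfg
    for part in parts[:-1]:
        node = node.setdefault(part, {})  # descend, creating levels as needed
    node[parts[-1]] = value
    return cfg


cfg = {"pl_trainer": {"gpus": "0"}, "mode": "debug"}
apply_override(cfg, "pl_trainer.gpus=3,4")  # update a nested key
apply_override(cfg, "mode=run")             # update a top-level key
apply_override(cfg, "+experiment=k80")      # add a new key
```

In the real `train.py`, Hydra performs this merging against the YAML configs under `conf/`.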
This project uses wandb for logging by default; the first time you run training, it will prompt:

```
wandb: (1) Create a W&B account
wandb: (2) Use an existing W&B account
wandb: (3) Don't visualize my results
wandb: Enter your choice:
```

Just follow the steps; wandb is convenient and easy to use. If you want to use TensorBoard instead, add the flag when running:

```shell
python train.py logging=tensorboard
```
Read the official documentation of Hydra and PyTorch Lightning to learn more:

- Hydra: a very powerful and convenient configuration system, and more.
- PyTorch Lightning: you almost only need to write code for models and data. Say goodbye to massive boilerplate for training pipelines, mixed precision, logging, etc.
Training on Places2 with 20 epochs.
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.