Official repository for "Transformation driven Visual Reasoning".
Figure: Given the initial state and the final state, the target is to infer the intermediate transformation.
Transformation driven Visual Reasoning
Xin Hong, Yanyan Lan, Liang Pang, Jiafeng Guo, Xueqi Cheng
Published at the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Motivation: Most existing visual reasoning tasks, such as CLEVR in VQA, are defined solely to test how well the machine understands the concepts and relations within static settings, such as a single image. We argue that this kind of state driven visual reasoning has limitations in reflecting whether the machine can infer the dynamics between different states, which has been shown to be as important as state-level reasoning for human cognition in Piaget’s theory.
Task: To tackle the aforementioned problem, we propose a novel transformation driven visual reasoning task. Given both the initial and final states, the target is to infer the corresponding single-step or multi-step transformation.
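As a toy illustration of the task definition, the sketch below infers transformations between two symbolic states by diffing object attributes. It is not the model from the paper, which works on images. The state schema, the attribute names, and the (object, attribute, value) triple format are assumptions made only for this example.

```python
from typing import Dict, List, Tuple

# A state maps an object id to its attribute values (hypothetical toy schema).
State = Dict[str, Dict[str, str]]
# An atomic transformation: (object id, attribute, new value).
Transformation = Tuple[str, str, str]


def infer_transformations(initial: State, final: State) -> List[Transformation]:
    """Return one (object, attribute, value) triple per attribute that changed."""
    changes: List[Transformation] = []
    for obj_id in sorted(initial):
        for attr in sorted(initial[obj_id]):
            new_value = final[obj_id][attr]
            if new_value != initial[obj_id][attr]:
                changes.append((obj_id, attr, new_value))
    return changes


# Hypothetical states: obj0 changes color, obj1 changes size.
initial = {
    "obj0": {"color": "red", "size": "small"},
    "obj1": {"color": "blue", "size": "large"},
}
final = {
    "obj0": {"color": "green", "size": "small"},
    "obj1": {"color": "blue", "size": "small"},
}
print(infer_transformations(initial, final))
```

With symbolic states the answer is a simple diff; the difficulty of the actual task comes from having to recover these changes from images.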
If you find this code useful, please consider starring this repo and citing us:
@inproceedings{hongTransformationDrivenVisual2021d,
title = {Transformation {{Driven Visual Reasoning}}},
booktitle = {2021 {{IEEE}}/{{CVF Conference}} on {{Computer Vision}} and {{Pattern Recognition}} ({{CVPR}})},
author = {Hong, Xin and Lan, Yanyan and Pang, Liang and Guo, Jiafeng and Cheng, Xueqi},
year = {2021},
pages = {6899--6908}
}
We use docker to manage the environment. You need to build the docker image first and then enter the container to run the code.
0. Basic Setup
The host machine should have the following packages installed (sudo privileges are needed to install them):
We also need docker-compose to manage the Docker containers. Run the following command to install it:
pip install docker-compose
1. Start the docker container
python docker.py startd
Please follow the prompts to set the variables such as PROJECT, DATA_ROOT, and LOG_ROOT. After that, a .env file will be generated in the root directory. You can also modify the variables in the .env file directly.
DATA_ROOT is mapped to /data in the container. LOG_ROOT is mapped to /log in the container.
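To double-check which host directories end up at /data and /log, a small script can read the .env file and print the mapping. This is a minimal sketch that assumes the generated file uses plain KEY=VALUE lines; the exact format written by docker.py is not documented here, and the example values are hypothetical.

```python
from typing import Dict

# Container-side mount points stated in this README.
CONTAINER_MOUNTS = {"DATA_ROOT": "/data", "LOG_ROOT": "/log"}


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    env: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def host_to_container(env: Dict[str, str]) -> Dict[str, str]:
    """Map each configured host directory to its path inside the container."""
    return {env[name]: target for name, target in CONTAINER_MOUNTS.items() if name in env}


# Hypothetical example values; your .env will contain your own paths.
example = """
PROJECT=tvr
DATA_ROOT=/home/user/data
LOG_ROOT=/home/user/log
"""
print(host_to_container(parse_env(example)))
```

To use it on a real setup, replace `example` with the contents of the .env file in the repository root.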
Tips: when you first start the container, it will take some time to build the image. After that, it will be much faster. If you want to rebuild the image, you can run:
python docker.py startd --build
2. Enter the container
python docker.py
1. Download TRANCE dataset
Follow the steps on this page to download TRANCE from Kaggle. After that, decompress the package under DATA_ROOT (specified in the .env file). The final dataset location should be DATA_ROOT/trance on the host machine, which is mapped to /data/trance in the container.
2. Preprocess the data
Preprocess the data with the following command:
python scripts/preprocess.py /data/trance
This will merge the raw image files and metadata into a single HDF5 file. After that, the directory should include the following files:
trance
├── data.h5
├── properties.json
└── values.json
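Before training, it can be handy to confirm that preprocessing produced the three files listed above. The following is a minimal sketch using only the standard library; the helper name `missing_files` is hypothetical and not part of this repository.

```python
from pathlib import Path
from typing import List

# File names listed in this README for the preprocessed dataset.
EXPECTED_FILES = ("data.h5", "properties.json", "values.json")


def missing_files(dataset_dir: str) -> List[str]:
    """Return the expected file names that are not present in dataset_dir."""
    root = Path(dataset_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


# /data/trance is the dataset location inside the container.
missing = missing_files("/data/trance")
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All expected files are present.")
```

The check only looks at file names; it does not validate the contents of data.h5.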
After entering the container, you can run the following command to train a model:
python train.py experiment=event_cnn_concat logging.wandb.tags="[event, base]"
Or you can train multiple models on the available GPUs:
python scripts/batch_train.py scripts/training/train_models.sh --gpus 0,1,2,3
Please refer to the scripts under scripts/training for full training commands.
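To give a rough idea of what spreading several training commands over GPUs can look like, here is a minimal round-robin sketch. It is not the implementation of scripts/batch_train.py, whose internals are not described here; the function name and the second and third experiment names are hypothetical.

```python
from itertools import cycle
from typing import List, Tuple


def assign_gpus(commands: List[str], gpus: List[int]) -> List[Tuple[int, str]]:
    """Pair each command with a GPU id, cycling through the GPUs in order."""
    if not gpus:
        raise ValueError("at least one GPU id is required")
    return list(zip(cycle(gpus), commands))


commands = [
    "python train.py experiment=event_cnn_concat",
    "python train.py experiment=hypothetical_a",  # hypothetical experiment name
    "python train.py experiment=hypothetical_b",  # hypothetical experiment name
]
for gpu, command in assign_gpus(commands, gpus=[0, 1]):
    # CUDA_VISIBLE_DEVICES restricts the process to a single GPU.
    print(f"CUDA_VISIBLE_DEVICES={gpu} {command}")
```

A real scheduler would also wait for a GPU to become free before launching the next command; this sketch only shows the assignment.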
Notice: We fixed a bug in TRANCE, so the performance on Event and View is slightly higher (0.03~0.06 in Acc) than the results reported in our CVPR paper.
We provide a demo to explore the dataset and the test predictions of trained models.
1. Launch the api server
Enter the project container and run the api server:
python docker.py
uvicorn src.demo.api_server.main:app --host 0.0.0.0 --port 8000 --reload
Tip 1: you can check the API docs by visiting http://<host_ip>:8000/docs in your browser, where host_ip is the IP address of the host machine.
Tip 2: the default port is 8000. If you use another port, you also need to modify the port specified in src/demo/ui/src/js/api.js.
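If you prefer to check from a script that the API server is up, a sketch along these lines may help. It only builds the docs URL described above and tries to open it; the helper names are hypothetical, and 127.0.0.1 is an assumed host address that you should replace with your own.

```python
import http.client
import urllib.request


def docs_url(host_ip: str, port: int = 8000) -> str:
    """Build the URL of the API docs page for the given host and port."""
    return f"http://{host_ip}:{port}/docs"


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, and HTTP error responses.
        return False


url = docs_url("127.0.0.1")  # assumed host address
print(url, "is reachable" if is_reachable(url) else "is not reachable")
```

If you changed the port when launching uvicorn, pass the same value as `port`.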
2. Launch the web (UI) server
We need another Docker container to launch the UI. Run the following command in another terminal window on the host machine (tmux is recommended):
python docker.py start --service demo
When you first start the container, besides the image building, it will also take some time to install the npm packages. After that, it will be much faster.
We provide the code to generate the dataset.
1. Build the docker image
python docker.py prepare --service blender --build
2. Enter the container
python docker.py --service blender
3. Generate the dataset
# with CPU
blender --background --python render.py -- --config configs/standard.yaml --gpu false --render_tile_size 16
# with GPU
CUDA_VISIBLE_DEVICES=0 blender --background --python render.py -- --config configs/standard.yaml --gpu true --n_sample 1
The speed of rendering can be affected by:
- GPU or CPU. Generally, a GPU is faster than a CPU, unless your CPU has many cores.
- render_tile_size. CPU prefers a small tile size, while GPU prefers a large tile size.
- Balanced sampling. It has nothing to do with Blender rendering, but sampling scene graphs for rendering can also be time-consuming.
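When scripting many rendering runs, the CPU and GPU invocations shown above can be assembled programmatically. The sketch below only uses flags that appear in those commands (--config, --gpu, --render_tile_size, --n_sample); the helper name `blender_command` is hypothetical, and the script prints the command rather than running it.

```python
import shlex
from typing import List, Optional


def blender_command(
    config: str,
    gpu: bool,
    render_tile_size: Optional[int] = None,
    n_sample: Optional[int] = None,
) -> List[str]:
    """Build the argument list for a headless render.py run."""
    cmd = [
        "blender", "--background", "--python", "render.py", "--",
        "--config", config,
        "--gpu", "true" if gpu else "false",
    ]
    if render_tile_size is not None:
        cmd += ["--render_tile_size", str(render_tile_size)]
    if n_sample is not None:
        cmd += ["--n_sample", str(n_sample)]
    return cmd


# CPU rendering with a small tile size, matching the command above.
print(shlex.join(blender_command("configs/standard.yaml", gpu=False, render_tile_size=16)))
```

The returned list can be passed to `subprocess.run` inside the blender container; for GPU runs, set `CUDA_VISIBLE_DEVICES` in the environment of that call.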
The code is licensed under the MIT license and the TRANCE dataset is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License.
Notice: Some materials are directly inherited from CLEVR, which is licensed under the BSD License. More details can be found in this document.
This project is based on the DeepCodebase template.