Commit e084100

yashkant authored and RishabhJain2018 committed
Update visual chatbot to Python3 and PyTorch
1 parent f5db5a0 commit e084100

58 files changed: +3478 −2080 lines

.gitignore

+17 −1

```diff
@@ -1,5 +1,12 @@
-data/
+# Demo
+/data/
 media/
+viscap/captioning/detectron/
+viscap/captioning/model_data/
+viscap/checkpoints/
+viscap/data/
+env/
+static/
 
 *.pyc
 db.sqlite3
@@ -9,3 +16,12 @@ ques_feat.json
 models/*.caffemodel
 models/*.lua
 models/*.prototxt
+*.zip
+
+# Pycharm
+.idea/
+
+# Installed packages
+pytorch/
+migrations/
+!migrations/__init__.py
```

.gitmodules

+9 −3

```diff
@@ -1,3 +1,9 @@
-[submodule "neuraltalk2"]
-	path = neuraltalk2
-	url = https://github.com/karpathy/neuraltalk2.git
+[submodule "viscap/captioning/vqa-maskrcnn-benchmark"]
+	path = viscap/captioning/vqa-maskrcnn-benchmark
+	url = https://gitlab.com/yashkant/vqa-maskrcnn-benchmark/
+[submodule "viscap/captioning/fastText"]
+	path = viscap/captioning/fastText
+	url = https://github.com/facebookresearch/fastText
+[submodule "viscap/captioning/pythia"]
+	path = viscap/captioning/pythia
+	url = https://github.com/facebookresearch/pythia/
```

README.md

+116 −79

````diff
@@ -1,120 +1,134 @@
-# Visual Chatbot
 
-## Introduction
+Visual Chatbot
+============
+Demo for the paper (**now upgraded to PyTorch; for the Lua-Torch version see [tag]()**).
 
-Demo for the paper
-
-**[Visual Dialog][1]**
+**[Visual Dialog][1]** (CVPR 2017 [Spotlight][4]) <br/>
 Abhishek Das, Satwik Kottur, Khushi Gupta, Avi Singh, Deshraj Yadav, José M. F. Moura, Devi Parikh, Dhruv Batra
-[arxiv.org/abs/1611.08669][1]
-[CVPR 2017][4] (Spotlight)
-
+Arxiv link: [arxiv.org/abs/1611.08669][1]
 Live demo: http://visualchatbot.cloudcv.org
 
+[![Visual Chatbot](chat/static/images/screenshot.png)](http://www.youtube.com/watch?v=SztC8VOWwRQ&t=13s "Visual Chatbot")
+
+Introduction
+---------------
 **Visual Dialog** requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Given an image, dialog history, and a follow-up question about the image, the AI agent has to answer the question. Putting it all together, we demonstrate the first ‘visual chatbot’!
 
-[![Visual Chatbot](chat/static/images/screenshot.png)](http://www.youtube.com/watch?v=SztC8VOWwRQ&t=13s "Visual Chatbot")
+What has changed since the last version?
+---------------------------------------------------
+The model-building code has been moved entirely to PyTorch. We have added a much-improved [Bottom-Up Top-Down][12] captioning model from [Pythia][10] and a Mask-RCNN feature extractor from [maskrcnn-benchmark][13]. The VisDial model is borrowed from the [visdial-challenge-starter][14] code.
 
-## Installation Instructions
+Please follow the instructions below to get the demo running on your local machine. For the previous version of this repository, which supports Torch-Lua based models, see [tag]().
 
-### Installing the Essential requirements
+Setup and Dependencies
+------------------------------
+Start by installing the build essentials, [Redis Server][5] and [RabbitMQ Server][6].
+```sh
+sudo apt-get update
 
-```shell
+# download and install build essentials
 sudo apt-get install -y git python-pip python-dev
-sudo apt-get install -y python-dev
-sudo apt-get install -y autoconf automake libtool curl make g++ unzip
+sudo apt-get install -y autoconf automake libtool
 sudo apt-get install -y libgflags-dev libgoogle-glog-dev liblmdb-dev
-sudo apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libhdf5-serial-dev protobuf-compiler
-```
-
-### Install Torch
-
-```shell
-git clone https://github.com/torch/distro.git ~/torch --recursive
-cd ~/torch; bash install-deps;
-./install.sh
-source ~/.bashrc
-```
-
-### Install PyTorch(Python Lua Wrapper)
-
-```shell
-git clone https://github.com/hughperkins/pytorch.git
-cd pytorch
-source ~/torch/install/bin/torch-activate
-./build.sh
-```
+sudo apt-get install -y libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libhdf5-serial-dev protobuf-compiler
 
-### Install RabbitMQ and Redis Server
-
-```shell
+# download and install redis-server and rabbitmq-server
 sudo apt-get install -y redis-server rabbitmq-server
 sudo rabbitmq-plugins enable rabbitmq_management
 sudo service rabbitmq-server restart
 sudo service redis-server restart
 ```
 
-### Lua dependencies
-
-```shell
-luarocks install loadcaffe
-```
+#### Environment Setup
 
-The below two dependencies are only required if you are going to use GPU
+You can use Anaconda or Miniconda to set up this codebase. Download and install a Python 3 based Anaconda or Miniconda distribution from the conda [downloads page][17] and proceed below.
 
-```shell
-luarocks install cudnn
-luarocks install cunn
-```
 
-### Cuda Installation
+```sh
+# clone the repository and download submodules
+git clone https://www.github.com/yashkant/visual-chatbot.git
+git submodule update --init --recursive
 
-Note: CUDA and cuDNN is only required if you are going to use GPU
+# create and activate a new environment
+conda create -n vischat python=3.6.8
+conda activate vischat
 
-Download and install CUDA and cuDNN from [nvidia website](https://developer.nvidia.com/cuda-downloads)
-
-### Install dependencies
-
-```shell
-git clone https://github.com/Cloud-CV/visual-chatbot.git
-cd visual-chatbot
-git submodule init && git submodule update
-sh models/download_models.sh
+# install the requirements of the chatbot and visdial-starter code
+cd visual-chatbot/
 pip install -r requirements.txt
 ```
 
-If you have not used nltk before, you will need to download a tokenization model.
+#### Downloads
+Download the BUTD, Mask-RCNN and VisDial model checkpoints and their configuration files.
+```sh
+sh viscap/download_models.sh
+```
 
-```shell
-python -m nltk.downloader punkt
+#### Install Submodules
+Install Pythia (used by the BUTD captioning model) and maskrcnn-benchmark (used for feature extraction).
+```sh
+# install fastText (dependency of pythia)
+cd viscap/captioning/fastText
+pip install -e .
+
+# install pythia for the butd model
+cd ../pythia/
+sed -i '/torch/d' requirements.txt
+pip install -e .
+
+# install maskrcnn-benchmark for feature extraction
+cd ../vqa-maskrcnn-benchmark/
+python setup.py build
+python setup.py develop
+cd ../../../
 ```
+#### Cuda Installation
 
-Change lines 2-4 of `neuraltalk2/misc/LanguageModel.lua` to the following:
+Note: CUDA and cuDNN are only required if you are going to use a GPU. Download and install them from the [nvidia website][18].
 
-```shell
-local utils = require 'neuraltalk2.misc.utils'
-local net_utils = require 'neuraltalk2.misc.net_utils'
-local LSTM = require 'neuraltalk2.misc.LSTM'
+#### NLTK
+We use `PunktSentenceTokenizer` from nltk; download it if you haven't already.
+```sh
+python -c "import nltk; nltk.download('punkt')"
 ```
 
-### Create the database
 
-```shell
+## Let's run this now!
+#### Setup the database
+```sh
+# create the database
 python manage.py makemigrations chat
 python manage.py migrate
 ```
+#### Run server and worker
+Launch two separate terminals and run the worker and the server.
+```sh
+# run the rabbitmq worker in the first terminal
+# warning: on the first run a GloVe file (~860 MB) is downloaded; this is a one-time thing
+python worker_viscap.py
+
+# run the development server in the second terminal
+python manage.py runserver
+```
+You are all set now. Visit http://127.0.0.1:8000 and you will have your demo running successfully.
 
-### Running the RabbitMQ workers and Development Server
+## Issues
+If you run into incompatibility issues, please take a look [here][7] and [here][8].
 
-Open 3 different terminal sessions and run the following commands:
+## Model Checkpoint and Features Used
+Performance on `v1.0 test-std` (trained on `v1.0` train + val):
 
-```shell
-python worker.py
-python worker_captioning.py
-python manage.py runserver
-```
+Model | R@1 | R@5 | R@10 | MeanR | MRR | NDCG |
+------- | ------ | ------ | ------ | ------ | ------ | ------ |
+[lf-gen-mask-rcnn-x101-demo][20] | 0.3930 | 0.5757 | 0.6404 | 18.4950 | 0.4863 | 0.5967 |
 
-You are all set now. Visit http://127.0.0.1:8000 and you will have your demo running successfully.
+Extracted features from `VisDial v1.0` used to train the above model are here:
+
+- [features_mask_rcnn_x101_train.h5][21]: Mask-RCNN features with 100 proposals per image, train split.
+- [features_mask_rcnn_x101_val.h5][22]: Mask-RCNN features with 100 proposals per image, val split.
+- [features_mask_rcnn_x101_test.h5][23]: Mask-RCNN features with 100 proposals per image, test split.
+
+*Note*: the above features have the key `image_id` (from earlier versions) renamed to `image_ids`.
 
 ## Cite this work
 
````
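The `sed -i '/torch/d' requirements.txt` step above deletes every requirement line that mentions torch (including `torchvision`), presumably so that pip does not reinstall a pinned torch over the PyTorch already in the conda environment. A stdlib-Python sketch of the same edit, using a made-up requirements snippet for illustration:

```python
def drop_torch_lines(text):
    """Mimic `sed '/torch/d'`: drop every line containing 'torch'."""
    kept = [line for line in text.splitlines() if "torch" not in line]
    return "\n".join(kept)

# hypothetical requirements.txt contents
reqs = "demjson\ntorch==1.0.1\ntorchvision==0.2.2\nnumpy>=1.16\n"
filtered = drop_torch_lines(reqs)
print(filtered)  # only the non-torch requirements remain
```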

````diff
@@ -131,20 +145,43 @@ If you find this code useful, consider citing our work:
 ```
 
 ## Contributors
-
+* [Yash Kant][19] ([email protected])
 * [Deshraj Yadav][2] ([email protected])
 * [Abhishek Das][3] ([email protected])
 
 ## License
 
 BSD
 
-## Credits
+## Credits and Acknowledgements
 
 - Visual Chatbot Image: "[Robot-clip-art-book-covers-feJCV3-clipart](https://commons.wikimedia.org/wiki/File:Robot-clip-art-book-covers-feJCV3-clipart.png)" by [Wikimedia Commons](https://commons.wikimedia.org) is licensed under [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/deed.en)
-
+- The beam-search implementation was borrowed as-is from [AllenNLP][15].
+- The vqa-maskrcnn-benchmark code was forked from @meetshah1995's [fork][16] of the original repository.
+- The VisDial model is borrowed from the [visdial-challenge-starter][14] code.
+- The BUTD captioning model comes from the awesome [Pythia][10] repository.
 
 [1]: https://arxiv.org/abs/1611.08669
 [2]: http://deshraj.github.io
 [3]: https://abhishekdas.com
 [4]: http://cvpr2017.thecvf.com/
+[5]: https://redis.io/
+[6]: https://www.rabbitmq.com/
+[7]: https://github.com/unbit/uwsgi/issues/1770
+[8]: https://stackoverflow.com/questions/41335478/importerror-no-module-named-asgiref-base-layer
+[9]: https://gitlab.com/yashkant/vqa-maskrcnn-benchmark
+[10]: https://github.com/facebookresearch/pythia/
+[11]: https://github.com/facebookresearch/fastText/
+[12]: https://arxiv.org/abs/1707.07998
+[13]: https://github.com/facebookresearch/maskrcnn-benchmark
+[14]: https://github.com/batra-mlp-lab/visdial-challenge-starter-pytorch/
+[15]: https://www.github.com/allenai/allennlp
+[16]: https://gitlab.com/meetshah1995/vqa-maskrcnn-benchmark/
+[17]: https://conda.io/docs/user-guide/install/download.html
+[18]: https://developer.nvidia.com/cuda-downloads
+[19]: https://github.com/yashkant
+[20]: https://s3.amazonaws.com/visual-dialog/data/v1.0/2019/lf_gen_mask_rcnn_x101_train_demo.pth
+[21]: https://s3.amazonaws.com/visual-dialog/data/v1.0/2019/features_mask_rcnn_x101_train.h5
+[22]: https://s3.amazonaws.com/visual-dialog/data/v1.0/2019/features_mask_rcnn_x101_val.h5
+[23]: https://s3.amazonaws.com/visual-dialog/data/v1.0/2019/features_mask_rcnn_x101_test.h5
+
````
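The retrieval metrics reported in the checkpoint table above (R@k, MeanR, MRR) can all be derived from the rank the model assigns to the ground-truth answer among the candidate options. The sketch below is a plain-Python illustration with a hypothetical `retrieval_metrics` helper, not the repository's evaluation code; NDCG is omitted since it additionally needs graded relevance scores:

```python
def retrieval_metrics(ranks):
    """Compute R@k, mean rank, and mean reciprocal rank from the
    1-indexed ranks assigned to the ground-truth answers."""
    n = len(ranks)
    recall = lambda k: sum(r <= k for r in ranks) / n
    return {
        "R@1": recall(1),
        "R@5": recall(5),
        "R@10": recall(10),
        "MeanR": sum(ranks) / n,                       # lower is better
        "MRR": sum(1.0 / r for r in ranks) / n,        # higher is better
    }

# toy example: ground truth ranked 1st, 4th, and 20th in three rounds
metrics = retrieval_metrics([1, 4, 20])
print(metrics["R@5"])  # 2 of the 3 ranks fall within the top 5
```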

captioning.lua

-100
This file was deleted.
