Visual Chatbot
============
Demo for the paper (**now upgraded to PyTorch; for the Lua-Torch version see [tag]()**).

**[Visual Dialog][1]** (CVPR 2017 [Spotlight][4]) <br/>
Abhishek Das, Satwik Kottur, Khushi Gupta, Avi Singh, Deshraj Yadav, José M. F. Moura, Devi Parikh, Dhruv Batra
arXiv link: [arxiv.org/abs/1611.08669][1]
Live demo: http://visualchatbot.cloudcv.org

[![Visual Chatbot](chat/static/images/screenshot.png)](http://www.youtube.com/watch?v=SztC8VOWwRQ&t=13s "Visual Chatbot")

Introduction
---------------
**Visual Dialog** requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Given an image, dialog history, and a follow-up question about the image, the AI agent has to answer the question. Putting it all together, we demonstrate the first ‘visual chatbot’!
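
To make the turn-by-turn data flow concrete, here is a minimal sketch of the interaction loop; the `Dialog` container and the `answer` function are hypothetical stand-ins for the model, purely to illustrate what goes in and out:

```python
from dataclasses import dataclass, field

@dataclass
class Dialog:
    image_path: str                               # the grounding image
    history: list = field(default_factory=list)   # past (question, answer) pairs

def answer(dialog: Dialog, question: str) -> str:
    """Hypothetical stand-in for the VisDial model: given the image,
    the dialog history, and a follow-up question, return an answer."""
    return "two people"  # dummy reply, for illustration only

dialog = Dialog(image_path="beach.jpg")
for q in ["How many people are in the image?", "What are they doing?"]:
    a = answer(dialog, q)
    dialog.history.append((q, a))  # each answer becomes context for the next turn
```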

What has changed since the last version?
---------------------------------------------------
The model-building code has been completely shifted to PyTorch; we have put in a much improved [Bottom-Up Top-Down][12] captioning model from [Pythia][10] and a Mask-RCNN feature extractor from [maskrcnn-benchmark][13]. The VisDial model is borrowed from the [visdial-challenge-starter][14] code.

Please follow the instructions below to get the demo running on your local machine. For the previous version of this repository, which supports Torch-Lua based models, see [tag]().

Setup and Dependencies
------------------------------
Start by installing the build essentials, [Redis Server][5] and [RabbitMQ Server][6].
```sh
sudo apt-get update

# download and install build essentials
sudo apt-get install -y git python-pip python-dev
sudo apt-get install -y autoconf automake libtool
sudo apt-get install -y libgflags-dev libgoogle-glog-dev liblmdb-dev
sudo apt-get install -y libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libhdf5-serial-dev protobuf-compiler

# download and install redis-server and rabbitmq-server
sudo apt-get install -y redis-server rabbitmq-server
sudo rabbitmq-plugins enable rabbitmq_management
sudo service rabbitmq-server restart
sudo service redis-server restart
```

#### Environment Setup

You can use Anaconda or Miniconda to set up this codebase. Download and install the Anaconda or Miniconda distribution based on Python 3+ from their [downloads page][17] and proceed below.

```sh
# clone the repository and download submodules
git clone https://www.github.com/yashkant/visual-chatbot.git
cd visual-chatbot/
git submodule update --init --recursive

# create and activate a new environment
conda create -n vischat python=3.6.8
conda activate vischat

# install the requirements of the chatbot and visdial-starter code
pip install -r requirements.txt
```

#### Downloads
Download the BUTD, Mask-RCNN and VisDial model checkpoints and their configuration files.
```sh
sh viscap/download_models.sh
```

#### Install Submodules
Install Pythia to use the BUTD captioning model, and maskrcnn-benchmark for feature extraction.
```sh
# install fastText (a dependency of pythia)
cd viscap/captioning/fastText
pip install -e .

# install pythia for the butd captioning model
cd ../pythia/
# drop pythia's pinned torch so pip doesn't reinstall over the existing one
sed -i '/torch/d' requirements.txt
pip install -e .

# install maskrcnn-benchmark for feature extraction
cd ../vqa-maskrcnn-benchmark/
python setup.py build
python setup.py develop
cd ../../../
```
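
A quick way to confirm the editable installs above worked; the module names `pythia` and `maskrcnn_benchmark` are assumptions based on the repository names:

```python
# sanity check: the editable installs above should make these importable
# (module names assumed from the repository names; adjust if they differ)
import maskrcnn_benchmark
import pythia

print("pythia and maskrcnn-benchmark import OK")
```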

#### CUDA Installation

Note: CUDA and cuDNN are only required if you are going to use a GPU. Download and install CUDA and cuDNN from the [NVIDIA website][18].
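
After installing, a quick sanity check that PyTorch (pulled in by the requirements above) can actually see CUDA and cuDNN:

```python
# sanity check: confirm the GPU stack is visible to PyTorch
# (assumes torch was installed via requirements.txt above)
import torch

print(torch.cuda.is_available())            # True if a usable CUDA device is found
print(torch.backends.cudnn.is_available())  # True if cuDNN is usable
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))    # name of the first GPU
```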

#### NLTK
We use `PunktSentenceTokenizer` from nltk; download it if you haven't already.
```sh
python -c "import nltk; nltk.download('punkt')"
```

## Let's run this now!
#### Setup the database
```sh
# create the database
python manage.py makemigrations chat
python manage.py migrate
```
#### Run server and worker
Launch two separate terminals and run the worker and server code.
```sh
# run the rabbitmq worker in the first terminal
# warning: on the first run, a GloVe file (~860 MB) is downloaded; this is a one-time thing
python worker_viscap.py

# run the development server in the second terminal
python manage.py runserver
```
You are all set now. Visit http://127.0.0.1:8000 and you will have your demo running successfully.
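
To confirm from a script that the demo came up, a minimal reachability check (port 8000 is assumed from the default `manage.py runserver` above):

```python
# minimal reachability check for the running dev server
import urllib.request

with urllib.request.urlopen("http://127.0.0.1:8000") as resp:
    print(resp.status)  # expect 200 once the demo is up
```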

## Issues
If you run into incompatibility issues, please take a look [here][7] and [here][8].

## Model Checkpoint and Features Used
Performance on `v1.0 test-std` (trained on `v1.0` train + val):

Model | R@1 | R@5 | R@10 | MeanR | MRR | NDCG |
------- | ------ | ------ | ------ | ------ | ------ | ------ |
[lf-gen-mask-rcnn-x101-demo][20] | 0.3930 | 0.5757 | 0.6404 | 18.4950 | 0.4863 | 0.5967 |

Extracted features from `VisDial v1.0` used to train the above model are here:

- [features_mask_rcnn_x101_train.h5][21]: Mask-RCNN features with 100 proposals per image, train split.
- [features_mask_rcnn_x101_val.h5][22]: Mask-RCNN features with 100 proposals per image, val split.
- [features_mask_rcnn_x101_test.h5][23]: Mask-RCNN features with 100 proposals per image, test split.

*Note*: The above features have the key `image_id` (from earlier versions) renamed to `image_ids`.
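
If you want to peek inside one of these files, a minimal `h5py` sketch; only the `image_ids` key is confirmed by the note above, so the sketch lists the available datasets first:

```python
# inspect a downloaded feature file; only `image_ids` is a documented key,
# so print the full key list before relying on any other dataset name
import h5py

with h5py.File("features_mask_rcnn_x101_val.h5", "r") as f:
    print(list(f.keys()))       # datasets actually present in the file
    print(f["image_ids"][:10])  # first ten image ids
```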

## Cite this work

If you find this code useful, consider citing our work:

```
@inproceedings{visdial,
  title = {{V}isual {D}ialog},
  author = {Abhishek Das and Satwik Kottur and Khushi Gupta and Avi Singh and
            Deshraj Yadav and Jos{\'e} M.F. Moura and Devi Parikh and Dhruv Batra},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2017}
}
```

## Contributors

* [Yash Kant][19] ([email protected])
* [Deshraj Yadav][2] ([email protected])
* [Abhishek Das][3] ([email protected])

## License

BSD

## Credits and Acknowledgements

- The beam-search implementation was borrowed as-is from [AllenNLP][15].
- The vqa-maskrcnn-benchmark code used was forked from @meetshah1995's [fork][16] of the original repository.
- The VisDial model is borrowed from the [visdial-challenge-starter][14] code.
- The BUTD captioning model comes from the awesome [Pythia][10] repository.

[1]: https://arxiv.org/abs/1611.08669
[2]: http://deshraj.github.io
[3]: https://abhishekdas.com
[4]: http://cvpr2017.thecvf.com/
[5]: https://redis.io/
[6]: https://www.rabbitmq.com/
[7]: https://github.com/unbit/uwsgi/issues/1770
[8]: https://stackoverflow.com/questions/41335478/importerror-no-module-named-asgiref-base-layer
[9]: https://gitlab.com/yashkant/vqa-maskrcnn-benchmark
[10]: https://github.com/facebookresearch/pythia/
[11]: https://github.com/facebookresearch/fastText/
[12]: https://arxiv.org/abs/1707.07998
[13]: https://github.com/facebookresearch/maskrcnn-benchmark
[14]: https://github.com/batra-mlp-lab/visdial-challenge-starter-pytorch/
[15]: https://www.github.com/allenai/allennlp
[16]: https://gitlab.com/meetshah1995/vqa-maskrcnn-benchmark/
[17]: https://conda.io/docs/user-guide/install/download.html
[18]: https://developer.nvidia.com/cuda-downloads
[19]: https://github.com/yashkant
[20]: https://s3.amazonaws.com/visual-dialog/data/v1.0/2019/lf_gen_mask_rcnn_x101_train_demo.pth
[21]: https://s3.amazonaws.com/visual-dialog/data/v1.0/2019/features_mask_rcnn_x101_train.h5
[22]: https://s3.amazonaws.com/visual-dialog/data/v1.0/2019/features_mask_rcnn_x101_val.h5
[23]: https://s3.amazonaws.com/visual-dialog/data/v1.0/2019/features_mask_rcnn_x101_test.h5