Given an image and a question about it, the model answers the question based on the image content.
- trained an LSTM network to encode questions
- trained a CNN to encode images
- fused the question and image features in a single model and trained it
- tested on the MS COCO dataset
(to be updated; code is based on https://github.com/GT-Vision-Lab/VQA_LSTM_CNN)
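
To make the fusion step above concrete, here is a minimal sketch in plain Python. It is not the code from this repository. It assumes the image and question encoders have already produced fixed-length feature vectors, and it assumes fusion by element-wise multiplication of two tanh-projected embeddings followed by a softmax over candidate answers, which is one common choice; the notes above do not say which fusion method is used. All names (`fuse_and_classify`, `params`, the toy dimensions) are hypothetical, and the weights are random rather than trained.

```python
import math
import random


def linear(x, weights, bias):
    """Affine map: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def tanh_layer(x, weights, bias):
    """Affine map followed by an element-wise tanh."""
    return [math.tanh(v) for v in linear(x, weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Small random weights, standing in for trained parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def fuse_and_classify(image_feat, question_feat, params):
    """Project both inputs to a shared space, fuse them, and score the answers."""
    img = tanh_layer(image_feat, params["w_img"], params["b_img"])
    qst = tanh_layer(question_feat, params["w_qst"], params["b_qst"])
    # Assumed fusion: element-wise product of the two embeddings.
    fused = [i * q for i, q in zip(img, qst)]
    return softmax(linear(fused, params["w_out"], params["b_out"]))


# Toy dimensions, chosen only for illustration.
IMG_DIM, QST_DIM, HIDDEN, NUM_ANSWERS = 8, 6, 4, 3

rng = random.Random(0)
params = {
    "w_img": random_matrix(HIDDEN, IMG_DIM, rng),
    "b_img": [0.0] * HIDDEN,
    "w_qst": random_matrix(HIDDEN, QST_DIM, rng),
    "b_qst": [0.0] * HIDDEN,
    "w_out": random_matrix(NUM_ANSWERS, HIDDEN, rng),
    "b_out": [0.0] * NUM_ANSWERS,
}

# Stand-ins for the CNN image features and the LSTM question encoding.
image_feat = [rng.random() for _ in range(IMG_DIM)]
question_feat = [rng.random() for _ in range(QST_DIM)]

probs = fuse_and_classify(image_feat, question_feat, params)
predicted_answer = max(range(NUM_ANSWERS), key=lambda k: probs[k])
print("answer probabilities:", probs)
print("predicted answer index:", predicted_answer)
```

In a real model the two feature vectors would come from the trained CNN and LSTM, and the projection and output weights would be learned jointly with a cross-entropy loss over the answer vocabulary.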