# Visual Question Answering (VQA)

Given an image and a question about it as input, the model answers the question based on the image.

- Trained an LSTM network to encode questions
- Trained a CNN to encode images
- Fused the question and image features into a single model and trained it (see the sketch below)
- Evaluated on the MS COCO dataset

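The following is a minimal PyTorch sketch of how such an LSTM + CNN fusion model might be wired together. The module names, dimensions, choice of ResNet-18 backbone, and element-wise fusion are illustrative assumptions, not the exact architecture of this repository (the referenced VQA_LSTM_CNN codebase is written in Torch/Lua).

```python
import torch
import torch.nn as nn
import torchvision.models as models


class VQAModel(nn.Module):
    """Illustrative LSTM + CNN fusion model for VQA; dimensions are assumptions."""

    def __init__(self, vocab_size, num_answers, embed_dim=300, hidden_dim=512):
        super().__init__()
        # Question encoder: word embeddings followed by an LSTM.
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # Image encoder: a CNN with its classifier head removed
        # (pretrained weights would normally be loaded here).
        cnn = models.resnet18(weights=None)
        self.cnn = nn.Sequential(*list(cnn.children())[:-1])  # -> (B, 512, 1, 1)
        self.img_proj = nn.Linear(512, hidden_dim)
        # Classifier over the fused features predicts an answer class.
        self.classifier = nn.Linear(hidden_dim, num_answers)

    def forward(self, images, questions):
        # Encode the question; use the final LSTM hidden state.
        _, (h, _) = self.lstm(self.embed(questions))
        q_feat = h[-1]                                        # (B, hidden_dim)
        # Encode the image and project it to the same dimension.
        v_feat = self.img_proj(self.cnn(images).flatten(1))   # (B, hidden_dim)
        # Fuse by element-wise multiplication, then classify.
        return self.classifier(torch.tanh(q_feat) * torch.tanh(v_feat))


# Example forward pass with random data.
model = VQAModel(vocab_size=10000, num_answers=1000)
logits = model(torch.randn(2, 3, 224, 224), torch.randint(1, 10000, (2, 14)))
print(logits.shape)  # torch.Size([2, 1000])
```

In practice the answer set is treated as a fixed vocabulary of the most frequent answers, so the model is trained with a standard cross-entropy loss over `num_answers` classes.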
(To be updated; code based on https://github.com/GT-Vision-Lab/VQA_LSTM_CNN.)