Add support for the VL-BERT model, which embeds images and text in the same vector space: https://github.com/jackroos/VL-BERT
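
To make the request concrete, here is a rough sketch of what "image and text in the same vector space" could look like at the input level. This is not the VL-BERT code from the linked repository. It is a toy illustration under stated assumptions: every name in it (`make_projection`, `project`, `build_joint_sequence`, the dimensions) is hypothetical, and the random weights stand in for projections that a real model would learn.

```python
import random

def make_projection(in_dim, out_dim, seed):
    # Hypothetical stand-in for a learned linear layer: a random in_dim x out_dim matrix.
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(out_dim)] for _ in range(in_dim)]

def project(vectors, weight):
    # Multiply each input vector by the weight matrix (plain matrix-vector product).
    out_dim = len(weight[0])
    return [
        [sum(v[i] * weight[i][j] for i in range(len(v))) for j in range(out_dim)]
        for v in vectors
    ]

def build_joint_sequence(token_embeddings, region_features, text_proj, image_proj):
    # Map both modalities to the same hidden size, then concatenate them into
    # one sequence that a single encoder could consume.
    text_part = project(token_embeddings, text_proj)
    image_part = project(region_features, image_proj)
    return text_part + image_part

# Assumed toy sizes; real models use far larger dimensions.
HIDDEN = 8
TEXT_DIM, IMAGE_DIM = 4, 6

rng = random.Random(0)
tokens = [[rng.random() for _ in range(TEXT_DIM)] for _ in range(3)]     # 3 text tokens
regions = [[rng.random() for _ in range(IMAGE_DIM)] for _ in range(2)]   # 2 image regions

joint = build_joint_sequence(
    tokens,
    regions,
    make_projection(TEXT_DIM, HIDDEN, seed=1),
    make_projection(IMAGE_DIM, HIDDEN, seed=2),
)
print(len(joint), len(joint[0]))  # 5 8
```

The point of the sketch is only that text tokens and image regions end up as vectors of the same size in one sequence; how the actual model builds and trains those embeddings is defined by the linked repository.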