Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome
An easy implementation of Faster R-CNN (https://arxiv.org/pdf/1506.01497.pdf) in PyTorch.
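For a quick point of reference, a pretrained Faster R-CNN can also be run through the stock torchvision model zoo. The snippet below is a minimal sketch of that route (torchvision >= 0.13 assumed); it is not this repository's implementation.

```python
import torch
import torchvision

# Minimal sketch: run a pretrained torchvision Faster R-CNN on one image.
# This uses the stock torchvision model, not the repository's code.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.rand(3, 480, 640)        # placeholder RGB tensor with values in [0, 1]
with torch.no_grad():
    prediction = model([image])[0]     # dict with 'boxes', 'labels', 'scores'

print(prediction["boxes"].shape, prediction["scores"][:5])
```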
Adds the SPICE metric to the coco-caption evaluation server code
A clone of the original SegCaps source code with enhancements for the MS COCO dataset.
Image captioning in PyTorch using LSTM or Transformer models
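For the Transformer variant, the core is typically a decoder that attends over image features while generating caption tokens. The sketch below illustrates that pattern with a causal mask; the class name, dimensions, and the assumption that region features are already projected to d_model are illustrative, not this repo's API.

```python
import torch
import torch.nn as nn

class CaptionTransformer(nn.Module):
    """Minimal transformer-decoder captioner over precomputed image features."""
    def __init__(self, vocab_size, d_model=512, nhead=8, num_layers=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, tokens, image_feats):
        # tokens: (B, T) caption token ids
        # image_feats: (B, R, d_model) region features, assumed already projected
        T = tokens.size(1)
        causal = torch.triu(
            torch.full((T, T), float("-inf"), device=tokens.device), diagonal=1)
        h = self.decoder(self.embed(tokens), image_feats, tgt_mask=causal)
        return self.out(h)   # (B, T, vocab_size) logits
```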
Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval (CVPR 2019)
An easy implementation of FPN (https://arxiv.org/pdf/1612.03144.pdf) in PyTorch.
Real-time semantic image segmentation on mobile devices
Generates realistic images from text descriptions using a GAN architecture; the network is designed for image generation on two datasets: MSCOCO and CUBS.
PyTorch implementation of image captioning using a transformer-based model.
Clone of the COCO API (dataset at http://cocodataset.org/) with changes to support Windows builds and Python 3
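Typical usage of the COCO API looks like the sketch below, which loads an annotation file and queries images and annotations by category; the annotation path is a placeholder.

```python
from pycocotools.coco import COCO

# Minimal sketch of COCO API usage (annotation path is a placeholder).
coco = COCO("annotations/instances_val2017.json")

cat_ids = coco.getCatIds(catNms=["person"])    # category ids for 'person'
img_ids = coco.getImgIds(catIds=cat_ids)       # images containing that category
ann_ids = coco.getAnnIds(imgIds=img_ids[:1], catIds=cat_ids)
anns = coco.loadAnns(ann_ids)                  # list of annotation dicts
print(len(img_ids), "images,", len(anns), "annotations in the first image")
```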
Convert segmentation binary mask images to COCO JSON format.
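The usual approach is to run-length encode each binary mask with pycocotools and wrap it in a COCO-style annotation dict; a minimal sketch is below, with placeholder ids and a toy mask.

```python
import numpy as np
from pycocotools import mask as mask_utils

# Minimal sketch: turn one binary mask into a COCO-style annotation dict.
# Field names follow the COCO format; the ids here are placeholders.
binary_mask = np.zeros((480, 640), dtype=np.uint8)
binary_mask[100:200, 150:300] = 1                         # toy foreground region

rle = mask_utils.encode(np.asfortranarray(binary_mask))   # run-length encoding
rle["counts"] = rle["counts"].decode("ascii")             # make it JSON-serializable

annotation = {
    "id": 1,
    "image_id": 1,
    "category_id": 1,
    "segmentation": rle,
    "area": float(mask_utils.area(rle)),
    "bbox": [float(x) for x in mask_utils.toBbox(rle)],   # [x, y, width, height]
    "iscrowd": 0,
}
```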
PyTorch implementation of “Fine-Grained Image Captioning with Global-Local Discriminative Objective”
Preserving Semantic Neighborhoods for Robust Cross-modal Retrieval [ECCV 2020]
A demo for mapping class labels from ImageNet to COCO.
PyTorch implementation of paper: "Self-critical Sequence Training for Image Captioning"
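The central idea of self-critical sequence training is REINFORCE with the greedy-decoded caption's reward as the baseline for sampled captions. The function below is a schematic of that loss only; shapes and names are illustrative, and the rewards would come from a CIDEr-style scorer that is not shown.

```python
import torch

def scst_loss(sample_logprobs, sample_reward, greedy_reward, mask):
    """Schematic self-critical loss.

    sample_logprobs, mask: (batch, seq_len); rewards: (batch,).
    The greedy caption's reward acts as the baseline for the sampled caption.
    """
    advantage = (sample_reward - greedy_reward).unsqueeze(1)   # (batch, 1)
    loss = -(advantage * sample_logprobs * mask).sum() / mask.sum()
    return loss
```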
Caption generation from images using topics as additional guiding inputs.
TensorFlow Object Detection API using the MSCOCO dataset as well as a customized dataset
MS COCO captions in Arabic
Image caption generation using GRU-based attention mechanism
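A GRU-based attention decoder typically computes additive attention over image region features at every step, then feeds the attended context into the GRU together with the previous word. The single-step sketch below illustrates that pattern; dimensions and class names are illustrative, not a specific repo's API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionGRUDecoderStep(nn.Module):
    """One decoding step: additive attention over region features + GRU update."""
    def __init__(self, embed_dim, feat_dim, hidden_dim, vocab_size):
        super().__init__()
        self.attn_feat = nn.Linear(feat_dim, hidden_dim)
        self.attn_hidden = nn.Linear(hidden_dim, hidden_dim)
        self.attn_score = nn.Linear(hidden_dim, 1)
        self.gru = nn.GRUCell(embed_dim + feat_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, word_embedding, features, hidden):
        # word_embedding: (batch, embed_dim); features: (batch, regions, feat_dim)
        # hidden: (batch, hidden_dim)
        scores = self.attn_score(torch.tanh(
            self.attn_feat(features) + self.attn_hidden(hidden).unsqueeze(1)))
        alpha = F.softmax(scores, dim=1)              # (batch, regions, 1) weights
        context = (alpha * features).sum(dim=1)       # (batch, feat_dim)
        hidden = self.gru(torch.cat([word_embedding, context], dim=1), hidden)
        return self.out(hidden), hidden, alpha.squeeze(-1)
```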