- Image caption model based on Show and Tell: A Neural Image Caption Generator, with some modifications.
- The dataset comes from the Microsoft COCO 2014 train and validation sets, which we re-split.
- This model was trained for the NTHU CS565600 image caption competition.
- Our single model achieved a CIDEr-D score of 0.944, which took 1st place in the Image Caption Kaggle Competition.
- We provide end-to-end scripts and pretrained weights for reproduction.
- These slides briefly describe the implementation.
- If you run into any problems, feel free to contact us ([email protected]).
The following are required:
- Python >= 3.6
- CUDA >= 10.0 (or the version that matches your TensorFlow version)
- For the Python packages, please refer to requirements.txt
cd data
sh download.sh
python split.py
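To illustrate what a re-split of the COCO 2014 data can look like, the sketch below shuffles image IDs with a fixed seed and divides them into train and validation subsets. The function name `split_image_ids`, the default ratio, and the seed are assumptions made for illustration; the actual logic lives in `split.py` and may differ.

```python
import random


def split_image_ids(image_ids, valid_ratio=0.1, seed=0):
    """Shuffle image IDs deterministically and split them into (train, valid).

    Hypothetical helper: the ratio and seed are illustrative assumptions,
    not the values used by split.py.
    """
    ids = sorted(set(image_ids))        # de-duplicate and fix the starting order
    random.Random(seed).shuffle(ids)    # seeded shuffle, so the split is reproducible
    n_valid = int(len(ids) * valid_ratio)
    return ids[n_valid:], ids[:n_valid]


if __name__ == "__main__":
    train_ids, valid_ids = split_image_ids(range(1000))
    print(len(train_ids), len(valid_ids))
```

Splitting by image ID rather than by caption keeps all captions of one image in the same subset, which avoids leaking validation images into training.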
We use the pretrained NASNet model from Keras to extract the image features. This step may take over an hour.
python nasnet.py
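A minimal sketch of this kind of feature extraction is shown below, assuming TensorFlow 2.x, the `NASNetLarge` variant with ImageNet weights, global average pooling, and JPEG inputs. The function names `batched` and `extract_features` and the batch size are hypothetical; `nasnet.py` may use a different variant, layer, or input pipeline.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements each."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def extract_features(image_paths, batch_size=32):
    """Return one pooled NASNet feature vector per image as a NumPy array.

    Assumptions: NASNetLarge, ImageNet weights, average pooling, and a
    non-empty list of JPEG paths. The repository's script may differ.
    """
    # Imported lazily so the batching helper works without TensorFlow installed.
    import numpy as np
    import tensorflow as tf

    nasnet = tf.keras.applications.nasnet
    model = nasnet.NASNetLarge(
        input_shape=(331, 331, 3),  # input size expected by the ImageNet weights
        include_top=False,          # drop the classification head
        weights="imagenet",
        pooling="avg",              # one feature vector per image
    )

    def load(path):
        raw = tf.io.read_file(path)
        image = tf.image.decode_jpeg(raw, channels=3)
        image = tf.image.resize(image, (331, 331))
        return nasnet.preprocess_input(image)  # scales pixels to [-1, 1]

    features = []
    for batch in batched(list(image_paths), batch_size):
        images = tf.stack([load(p) for p in batch])
        features.append(model.predict(images, verbose=0))
    return np.concatenate(features, axis=0)
```

Extracting the features once and caching them means the CNN does not have to run again during caption training, which is presumably why this step is separate from `train.py`.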
cd ../script
python create_tfrecord.py
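As a rough picture of what a TFRecord conversion involves, the sketch below encodes a caption as padded token IDs and writes it together with an image feature vector as a `tf.train.Example`. The special tokens, the record keys `"feature"` and `"caption"`, and both function names are assumptions for illustration; the actual record layout is defined in `create_tfrecord.py`.

```python
def encode_caption(words, vocab, max_len):
    """Map words to IDs, wrap them in <start>/<end>, and pad to `max_len`.

    Hypothetical helper: assumes `vocab` contains the four special tokens
    and that `max_len` is at least 2.
    """
    ids = [vocab["<start>"]]
    # Truncate so that <start> and <end> still fit; unknown words map to <unk>.
    ids += [vocab.get(w, vocab["<unk>"]) for w in words[:max_len - 2]]
    ids.append(vocab["<end>"])
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


def write_tfrecord(path, samples):
    """Write (feature_vector, caption_ids) pairs to a TFRecord file."""
    import tensorflow as tf  # imported lazily; only needed for writing

    with tf.io.TFRecordWriter(path) as writer:
        for feature, caption_ids in samples:
            example = tf.train.Example(features=tf.train.Features(feature={
                # Key names are illustrative, not the repository's schema.
                "feature": tf.train.Feature(
                    float_list=tf.train.FloatList(value=[float(x) for x in feature])),
                "caption": tf.train.Feature(
                    int64_list=tf.train.Int64List(value=[int(i) for i in caption_ids])),
            }))
            writer.write(example.SerializeToString())
```

Padding every caption to the same length keeps the records fixed-size, which simplifies batching when the file is read back with `tf.data`.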
python train.py
python inference.py
Model | CIDEr-D |
---|---|
Single Model | 0.944 |
Ensemble Model | 0.955 |