A CNN-LSTM-based image captioning model trained on the Flickr30k dataset. To learn more about how this project works, check out the documentation.
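The CNN-LSTM pairing can be sketched as follows. This is a minimal illustrative model, not the repo's actual architecture: the class name, layer sizes, and the tiny stand-in CNN (a real setup would use a pretrained backbone) are all assumptions.

```python
# Illustrative CNN-LSTM captioner: a CNN encodes the image into one
# feature vector, which seeds an LSTM that decodes the caption tokens.
import torch
import torch.nn as nn

class CaptionModel(nn.Module):  # hypothetical name, not the repo's class
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        # CNN encoder: tiny stand-in for a pretrained backbone
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),   # -> (B, 64, 1, 1)
            nn.Flatten(),              # -> (B, 64)
            nn.Linear(64, embed_dim),  # project to the embedding size
        )
        # LSTM decoder conditioned on the image feature
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        feats = self.encoder(images).unsqueeze(1)   # (B, 1, E)
        embeds = self.embed(captions)               # (B, T, E)
        # feed the image feature as the first "token" of the sequence
        inputs = torch.cat([feats, embeds], dim=1)  # (B, T+1, E)
        out, _ = self.lstm(inputs)
        return self.fc(out)                         # (B, T+1, vocab_size)
```

At train time the logits are compared against the shifted caption with cross-entropy; at inference the decoder is run one token at a time.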
- clone the repository: `git clone https://github.com/sorohere/capage.git`
- download weights and dataset: run the `run.sh` script, or download manually from flickr30k
- install dependencies: `pip install -r requirement.txt`
The dataset used for this project is Flickr30k, which consists of:
- around 31,000 unique images.
- 5 captions per image, resulting in approximately 155,000 image-caption pairs.
i. images folder: contains all the images used for training and evaluation.
ii. captions file: `captions.txt`, a text file mapping each image to its corresponding caption.
   - each line follows the format: `image_name, caption`
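Loading this file into an image-to-captions mapping can be sketched as below. The function name and the split-on-first-comma choice are assumptions; the repo's actual loader may differ.

```python
# Sketch: parse a captions file where each line is "image_name, caption"
# into a dict mapping each image to its list of captions.
from collections import defaultdict

def load_captions(path):
    captions = defaultdict(list)
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # split only on the first comma: captions may contain commas
            image_name, caption = line.split(",", 1)
            captions[image_name.strip()].append(caption.strip())
    return dict(captions)
```

Splitting only on the first comma matters because captions are free text and frequently contain commas themselves.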
To start training the model: `python scripts/train.py`. The vocabulary and trained model will be saved in `scripts/checkpoints/`.
Ensure you have a trained model and vocabulary saved; if not, train the model yourself (check out the scripts). To generate captions for images: `python inference.py`
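Caption generation at inference time typically works as a greedy decoding loop: feed the most likely token back into the decoder until an end token appears. A hedged sketch, assuming a hypothetical per-token `step` callable rather than the repo's actual `inference.py` interface:

```python
# Sketch of greedy decoding: `step(token_id, hidden)` is a hypothetical
# function returning (logits over the vocabulary, new hidden state).
import torch

@torch.no_grad()
def greedy_caption(step, start_id, end_id, max_len=20):
    tokens = [start_id]
    hidden = None
    for _ in range(max_len):
        logits, hidden = step(tokens[-1], hidden)
        next_id = int(logits.argmax())  # pick the most likely token
        if next_id == end_id:
            break
        tokens.append(next_id)
    return tokens[1:]  # drop the <start> token
```

Beam search is a common drop-in upgrade over this greedy loop when caption quality matters more than speed.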