Automatic-image-captioning-using-recurrent-neural-network

Every day we encounter a large number of pictures on the internet and in news stories, document diagrams, and advertisements. It is usually left to the viewer to interpret them: most images come without a description, yet people can make sense of them anyway. Machines cannot do this on their own, so automated captioning requires a model that can interpret an image and describe it. Image captioning is a major AI research field that deals with interpreting images and describing them in natural language. Understanding an image involves more than detecting and identifying objects; it also includes working out the scene, the location, the attributes of the objects, and how they interact. Producing well-formed sentences requires both syntactic and semantic knowledge. For this project, I implemented an image captioning model using spatial adaptive attention, which focuses on refining the image features and helps the model understand the semantics of the captions. The model was trained on the Flickr8k dataset and achieved a BLEU-1 score of 0.81 and a BLEU-2 score of 0.73 on the test set, and 0.8145 and 0.775 respectively on the training set.
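To make the BLEU-1 and BLEU-2 figures easier to interpret, here is a minimal sketch of sentence-level BLEU with uniform n-gram weights and a brevity penalty. It is an illustration only: the function names (`ngrams`, `modified_precision`, `bleu`) are hypothetical, and the README does not say which BLEU implementation or averaging scheme produced the reported scores.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate against several references."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    # Each n-gram is credited at most as often as it appears in any one reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n):
    """Sentence-level BLEU with uniform weights over 1..max_n grams."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1.0 - r / c)
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)


# Made-up example sentences, not taken from the dataset.
candidate = "a dog runs on the grass".split()
references = [
    "a dog runs across the grass".split(),
    "the dog is running on grass".split(),
]
bleu1 = bleu(candidate, references, 1)
bleu2 = bleu(candidate, references, 2)
```

BLEU-1 only counts matching words, while BLEU-2 also requires matching word pairs, which is why BLEU-2 is normally the lower of the two.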

Aim of the project

Image captioning automatically generates a caption for a picture, and it is becoming an increasingly popular field of study. To achieve this, semantic information must be extracted from the picture and expressed in natural language. Image captioning is a difficult task because it connects the computer vision (CV) and natural language processing (NLP) fields. A number of solutions to this problem have been proposed. In this project I use a GRU with a spatial adaptive attention mechanism.
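As a rough illustration of what attention over image regions does, the sketch below computes plain dot-product soft attention: each spatial region of the image is scored against the decoder's hidden state, the scores are normalised with a softmax, and the regions are averaged using those weights. This is a simplified stand-in and not the spatial adaptive attention used in `final code.py`, whose details are not described here; the function names and toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def spatial_attention(region_features, hidden_state):
    """Weight each image region by its similarity to the decoder state.

    region_features: list of feature vectors, one per spatial location.
    hidden_state: current decoder (e.g. GRU) hidden state; assumed here to
        have the same length as a region vector.
    Returns (weights, context), where context is the weighted sum of regions.
    """
    scores = [sum(f * h for f, h in zip(region, hidden_state))
              for region in region_features]
    weights = softmax(scores)
    dim = len(hidden_state)
    context = [sum(w * region[d] for w, region in zip(weights, region_features))
               for d in range(dim)]
    return weights, context


# Toy input: three regions with 2-dimensional features.
regions = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
hidden = [2.0, 0.0]
weights, context = spatial_attention(regions, hidden)
```

In a captioning decoder, the context vector would typically be combined with the previous word embedding and fed to the GRU at each time step, so the model can look at different parts of the image for different words.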

Research question

Does adding an attention mechanism to a GRU improve the performance of automatic image captioning?

Working of the project

For the project I used the Flickr8k dataset. It consists of 8092 images with 4 to 5 captions for each image. The dataset is almost 2 GB in size and contains the images and their captions. It can be downloaded from Kaggle:
https://www.kaggle.com/datasets/ming666/flicker8k-dataset
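Once downloaded, the captions need to be grouped by image before training. The sketch below assumes the caption file uses the layout of the original Flickr8k release (`Flickr8k.token.txt`), where each line is `image.jpg#index`, a tab, and the caption text. That layout is an assumption about the Kaggle copy, and `parse_captions` is a hypothetical helper, so adjust the parsing if the file differs.

```python
from collections import defaultdict


def parse_captions(lines):
    """Group captions by image file name.

    Assumes each line looks like 'image.jpg#0<TAB>caption text'.
    """
    captions = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        image_tag, _, caption = line.partition("\t")
        image_name = image_tag.split("#")[0]  # drop the '#index' suffix
        captions[image_name].append(caption.strip().lower())
    return dict(captions)


# Made-up sample lines in the assumed format.
sample = [
    "img1.jpg#0\tA dog runs .",
    "img1.jpg#1\tA brown dog .",
    "",
    "img2.jpg#0\tTwo kids play .",
]
grouped = parse_captions(sample)
```

For the real file, the same function can be called on an open file handle, for example `parse_captions(open(path, encoding="utf-8"))`, where `path` points to the caption file inside the downloaded dataset.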

Proposed network:

Libraries used:

Project Design

Results:

poojithpoosa/Automatic-image-captioning-using-recurrent-neural-network-GRU-
