
This repository uses the title, caption, AI-generated descriptions, and other textual data associated with Flickr pictures and compares them semantically with the 23 districts of Vienna in order to associate each picture with its district.


simon-gross/flickr_and_nlp


Matching Flickr Photos to City Districts in Vienna

Flickr is an easy-to-access data source for user-generated pictures. The pictures come with textual data, e.g. the title, a description, or comments. Furthermore, using the transformers package and a Hugging Face model, it is possible to generate a short description of a picture.

Districts in Vienna also have associated textual data, for example their Wikipedia entries or descriptions of points of interest (POIs) in each district.

Using spaCy and gensim's Doc2Vec, we can use all of these textual clues to create an embedding space. Then, for each picture's texts, the district with the highest similarity is associated with the picture. We can verify the experiment by using the geographic coordinates associated with the pictures.
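The assignment step described above can be sketched as follows. This is a minimal, dependency-free illustration, not the notebook's code: simple bag-of-words count vectors stand in for Doc2Vec embeddings (with gensim, the vectors would instead come from a trained model, e.g. via `infer_vector`), and the district texts, the photo text, and the function names `bow_vector`, `cosine`, and `assign_district` are hypothetical.

```python
import math
from collections import Counter


def bow_vector(text):
    """Count lowercased whitespace tokens (a stand-in for a Doc2Vec embedding)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign_district(photo_text, district_texts):
    """Return the most similar district and the similarity to every district."""
    photo_vec = bow_vector(photo_text)
    scores = {
        name: cosine(photo_vec, bow_vector(text))
        for name, text in district_texts.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical toy texts; real district texts would be much longer.
district_texts = {
    "Innere Stadt": "stephansdom cathedral hofburg palace opera old town",
    "Leopoldstadt": "prater ferris wheel riesenrad danube canal park",
}

best, scores = assign_district(
    "evening view of the ferris wheel in the prater", district_texts
)
print(best, scores)
```

The predicted district can then be compared with the district that contains the picture's coordinates to measure how often the text-based assignment is right.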

A more complex approach, which first uses a random subset of the geotagged pictures as training data, is explored at the end.
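Preparing such a training subset could look roughly like the sketch below. It labels each geotagged picture with the district whose polygon contains it (ray casting) and then draws a reproducible random split. The rectangular boundaries, the photo records, the 30% training fraction, and all function names are assumptions for illustration only. How the training pictures are then fed into the model is not shown here.

```python
import random


def point_in_polygon(lon, lat, polygon):
    """Ray casting: toggle 'inside' for each edge a ray to the east crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):  # edge spans the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def true_district(lon, lat, boundaries):
    """Return the name of the first district containing the point, else None."""
    for name, polygon in boundaries.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None


def split_photos(photos, train_fraction, seed=0):
    """Shuffle a copy with a fixed seed and cut it into train and test parts."""
    shuffled = photos[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical rectangles, not real district outlines.
boundaries = {
    "district_a": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
    "district_b": [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0)],
}
# Hypothetical photo records spread along one line of latitude.
photos = [{"id": i, "lon": 0.25 + 0.2 * i, "lat": 0.5} for i in range(10)]

# Keep only photos that fall inside a known district.
labeled = []
for photo in photos:
    district = true_district(photo["lon"], photo["lat"], boundaries)
    if district is not None:
        labeled.append({**photo, "district": district})

train, test = split_photos(labeled, train_fraction=0.3)
print(len(train), "training photos,", len(test), "held-out photos")
```

Fixing the seed keeps the split reproducible, so results on the held-out pictures can be compared across runs.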

See the Jupyter Notebook for a guide through the code.

Data Sources

The Vienna boundaries and the POI information are from data.gv.at.

Pictures are from Flickr, queried via API.

The model used to generate image captions is from Hugging Face.
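As a rough illustration of the Flickr query mentioned above, the sketch below builds a request URL for the `flickr.photos.search` REST method restricted to a bounding box. It only constructs the URL and does not call the API. The bounding box is an approximate assumption for Vienna, the API key is a placeholder, and the date limit and the function name `build_search_url` are hypothetical. Flickr's documentation describes geo queries as needing an additional limiting parameter, which is why a minimum taken date is included.

```python
from urllib.parse import urlencode

FLICKR_REST_ENDPOINT = "https://api.flickr.com/services/rest/"

# Approximate bounding box for Vienna (assumption):
# min longitude, min latitude, max longitude, max latitude.
VIENNA_BBOX = (16.18, 48.11, 16.58, 48.33)


def build_search_url(api_key, bbox, min_taken_date="2015-01-01", page=1, per_page=250):
    """Build a flickr.photos.search URL for geotagged photos inside bbox."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "bbox": ",".join(str(v) for v in bbox),
        "has_geo": 1,
        "min_taken_date": min_taken_date,  # limiting parameter for the geo query
        "extras": "geo,description,tags",  # coordinates and text fields per photo
        "per_page": per_page,
        "page": page,
        "format": "json",
        "nojsoncallback": 1,  # plain JSON instead of a JSONP wrapper
    }
    return FLICKR_REST_ENDPOINT + "?" + urlencode(params)


url = build_search_url("YOUR_API_KEY", VIENNA_BBOX)
print(url)
```

The resulting URL can be fetched with any HTTP client. Iterating over the `page` parameter retrieves further result pages.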
