Predicting Travel Patterns Using Flickr

Using the 100 Million Photos and Videos database from Flickr to predict travel patterns within the United States and Central America.

The Hypothesis:

People often travel to take photographs, or go to a specific place in order to photograph it. Even if that place is their own backyard, it has meaning and visual appeal. I am interested in photography as a predictor of ideal travel destinations. Where do people like to take photographs? Where will they like to take photographs?

Where will people travel?

The Preprocessing/Cleaning/Manipulation

Each record in the Flickr database contains the following fields:

Photo/video ID
User NSID, User nickname
Date taken
Date uploaded
Capture device
Title, Description
User tags (comma-separated), Machine tags (comma-separated)
Longitude, Latitude
Accuracy
Photo/video page URL, Photo/video download URL
License name, License URL
Photo/video server identifier, Photo/video farm identifier
Photo/video secret, Photo/video secret original
Photo/video extension original
Photo/video marker (0 = photo, 1 = video)

Cleaning consisted of the following steps:

Removing any capture devices with "scan" in the name, since these are scanners rather than cameras
Binning the remaining camera brands, grouping any brand that accounts for less than 1% of records into an "Other" category
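The two cleaning steps could look roughly like the pandas code below. This is a minimal sketch under stated assumptions, not the project's actual code: the column name `capture_device` and the rule that a brand is the first word of the device string are both hypothetical.

```python
import pandas as pd


def clean_capture_devices(df: pd.DataFrame, device_col: str = "capture_device",
                          min_share: float = 0.01) -> pd.DataFrame:
    """Drop scanner rows and bin rare camera brands into "Other".

    Assumptions: the device column name is hypothetical, and the brand is
    taken to be the first word of the device string.
    """
    # Remove any device whose name contains "scan" (case-insensitive);
    # rows with a missing device are kept (na=False counts them as no match).
    is_scanner = df[device_col].str.contains("scan", case=False, na=False)
    cleaned = df.loc[~is_scanner].copy()

    # Hypothetical brand extraction: first whitespace-separated token.
    cleaned["brand"] = cleaned[device_col].str.split().str[0]

    # Share of each brand among rows with a known brand.
    shares = cleaned["brand"].value_counts(normalize=True)
    rare = shares[shares < min_share].index

    # Bin every brand below the threshold into a single "Other" category.
    cleaned.loc[cleaned["brand"].isin(rare), "brand"] = "Other"
    return cleaned
```

Binning after removing scanners means the 1% threshold is measured against camera records only.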

Visual Explorations

Exploring the camera brands in the dataset shows steady growth in Canon cameras over time, although the introduction of the Apple iPhone in 2007 quickly brings Apple into contention.
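A brand trend like this can be read from a table of each brand's share of photos per year, which can then be plotted as one line per brand. The sketch below assumes a cleaned table with hypothetical `date_taken` and `brand` columns; it is an illustration, not the code behind the original charts.

```python
import pandas as pd


def brand_share_by_year(df: pd.DataFrame, date_col: str = "date_taken",
                        brand_col: str = "brand") -> pd.DataFrame:
    """Fraction of each year's photos taken with each brand.

    Rows are years, columns are brands, and each row sums to 1.
    Column names are hypothetical.
    """
    years = pd.to_datetime(df[date_col]).dt.year.rename("year")
    # normalize="index" divides each row by that year's total photo count.
    return pd.crosstab(years, df[brand_col], normalize="index")
```

Using shares rather than raw counts separates a brand's rise from growth in the total number of photos.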

Clustering Optimization and Analysis

This analysis focuses on the United States and Central America. K-Means clustering was used to divide the area into regions. To choose the number of clusters, a silhouette score was computed for each candidate number of clusters over a range. Using the scores as a guideline, 15 clusters were selected.
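A silhouette sweep over the number of clusters can be sketched with scikit-learn as follows. This is a minimal sketch: the candidate range, the sampling size, and treating raw longitude/latitude degrees as Euclidean coordinates are assumptions, not details confirmed by the project.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def silhouette_by_k(coords: np.ndarray, k_values, sample_size: int = 10_000,
                    seed: int = 0) -> dict:
    """Fit K-Means for each k and return {k: silhouette score}.

    coords is an (n, 2) array of (longitude, latitude). Euclidean distance on
    raw degrees is a simplifying assumption, not a true geographic distance.
    """
    scores = {}
    for k in k_values:
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(coords)
        # Silhouette is O(n^2) in the number of points, so score a random sample.
        scores[k] = silhouette_score(coords, km.labels_,
                                     sample_size=min(sample_size, len(coords)),
                                     random_state=seed)
    return scores
```

The highest score is one natural pick, but the README describes the scores as a guideline, so a nearby k with a comparable score (such as 15) may be preferred for finer regions.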

Linear Regression

The points were grouped by cluster, and each cluster's points were used to build a time series; the series are organized by region below. On average, the R-squared value was 86.2% and the root mean square error was 11.9%, using a five-year time slice to predict each following (sixth) year.

Far West         West                 Central              East
Alaska           Pacific Northwest    Northern Mountains   Northeast
Western Canada   California           Rocky Mountains      Mid-Atlantic
Hawaii           Southwest            Great Lakes          Southeast
                 Central America      South                Caribbean

What Will Happen in 2019?

Based on the analysis, the Pacific Northwest will remain the most popular region, a status it has held since 2000. Hawaii and the South will be the least popular. Visits to Central America and California will continue to trend upward.

Forecast plots: Pacific Northwest, Central America, California, Hawaii

Next Steps

This analysis is based on simple K-Means clustering with a tuned number of clusters. The data were then sliced into simple year-by-year time series and analyzed using linear regression.

More Diversified Data

Using only Flickr photos introduces bias into the data and the prediction. For example, Flickr's popularity peaked around 2010-2011 and has declined noticeably since. The rise of other photo-sharing services, such as Instagram, Twitter, and Facebook, has affected the total number of photos uploaded to Flickr.

To improve the prediction, data from these sources would need to be added and adjusted for. Even then, biases will remain, arising from the demographics of each user base and from how each service is used.
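One simple adjustment that partly addresses the platform trend (a swapped-in technique, not something the README describes) is to express each region's count as a share of that year's total uploads, so Flickr-wide growth or decline cancels out. The sketch below assumes a years-by-regions table of counts; it does nothing about demographic bias.

```python
import pandas as pd


def share_of_year_total(counts: pd.DataFrame) -> pd.DataFrame:
    """Convert a years x regions table of photo counts into per-year shares.

    Dividing by each year's total removes the platform-wide trend (for
    example, Flickr's decline after 2010-2011) while keeping the relative
    popularity of regions within each year.
    """
    totals = counts.sum(axis=1)
    return counts.div(totals, axis=0)
```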

More Models

It would be interesting to apply K-Medians clustering to the area. Cluster medians are pulled less toward sparse, outlying points than means are, so they may better locate the denser areas.
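scikit-learn has no built-in K-Medians, so a minimal NumPy version is sketched below: points are assigned by Manhattan (L1) distance and each center moves to the coordinate-wise median of its points. The farthest-point initialization is an assumption of this sketch, not part of the method's definition.

```python
import numpy as np


def k_medians(X, k: int, n_iter: int = 100, seed: int = 0):
    """Minimal K-Medians: L1 assignment, coordinate-wise median update.

    Returns (centers, labels).
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)

    def l1_to(centers):
        # (n_points, n_centers) matrix of Manhattan distances.
        return np.abs(X[:, None, :] - centers[None, :, :]).sum(axis=2)

    # Farthest-point init: start from a random point, then repeatedly add the
    # point whose distance to its nearest chosen center is largest.
    centers = X[[rng.integers(len(X))]]
    while len(centers) < k:
        far = l1_to(centers).min(axis=1).argmax()
        centers = np.vstack([centers, X[far]])

    for _ in range(n_iter):
        labels = l1_to(centers).argmin(axis=1)
        # Move each center to the median of its points (keep it if empty).
        new_centers = np.array([
            np.median(X[labels == j], axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels = l1_to(centers).argmin(axis=1)
    return centers, labels
```

The distance matrix has one entry per point per center, so for millions of photos the assignment step would need to be done in chunks.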

The model used above to predict the number of photos is a linear regression model from statsmodels. I also tried support vector regression and linear support vector regression models for comparison. They were less stable given the limited data and produced less accurate forecasts.
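A comparison like this could be run with scikit-learn's `SVR` and `LinearSVR` on the same sliding windows. The README does not give the settings used, so the defaults and the scaling of inputs and targets below are hypothetical choices; SVMs are sensitive to scale, and raw years and photo counts are far from unit scale.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR, LinearSVR


def make_svr(kind: str = "rbf"):
    """SVR (RBF kernel) or LinearSVR with standardized inputs and targets."""
    base = SVR() if kind == "rbf" else LinearSVR(dual=True, max_iter=10_000)
    return TransformedTargetRegressor(
        regressor=make_pipeline(StandardScaler(), base),
        transformer=StandardScaler(),
    )


def rolling_forecast_sklearn(years, counts, make_model, window: int = 5):
    """Fit a fresh model on each `window`-year slice and predict the next year."""
    X = np.asarray(years, dtype=float).reshape(-1, 1)
    y = np.asarray(counts, dtype=float)
    preds = []
    for start in range(len(y) - window):
        model = make_model()
        model.fit(X[start:start + window], y[start:start + window])
        next_x = X[start + window:start + window + 1]
        preds.append(float(np.ravel(model.predict(next_x))[0]))
    return np.array(preds)
```

The predictions can then be scored with the same RMSE as the OLS forecasts. With only five points per fit, an RBF kernel tends to pull extrapolations back toward the window's mean, which is one plausible source of the instability described above.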

Smaller Time Slices

To gain granularity, it would be worth grouping the pictures by month. Monthly counts are noisier, but they also carry more information for fitting an accurate forecast.
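Monthly slices could be produced with pandas periods, as sketched below. The `cluster` and `date_taken` column names are hypothetical; passing `freq="Y"` gives yearly slices like the ones used in this analysis.

```python
import pandas as pd


def counts_by_period(df: pd.DataFrame, freq: str = "M", cluster_col: str = "cluster",
                     date_col: str = "date_taken") -> pd.DataFrame:
    """Photos per period (rows) per cluster (columns); empty cells become 0.

    freq="M" gives monthly slices and freq="Y" gives yearly slices.
    Column names are hypothetical.
    """
    periods = pd.to_datetime(df[date_col]).dt.to_period(freq).rename("period")
    counts = df.groupby([df[cluster_col], periods]).size()
    # One column per cluster; months where a cluster has no photos get 0.
    return counts.unstack(cluster_col, fill_value=0)
```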

Until then, enjoy!

Repository: mm-wang/flickrtravel

Folders and files

Latest commit

History

Repository files navigation

Predicting Travel Patterns Using Flickr

The Hypothesis:

Where will people travel?

The Preprocessing/Cleaning/Manipulation

Visual Explorations

Clustering Optimization and Analysis

Linear Regression

What Will Happen in 2019?

Next Steps

More Diversified Data

More Models

Smaller Time Slices

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages