A TDA tutorial for the Midwest Big Data Summer School 2021

catanzaromj/MBDS21_TDA

An Introduction to Topological Data Analysis

This repository contains a tutorial on Topological Data Analysis (TDA) given at the 2021 Midwest Big Data Summer School, held virtually in Ames, Iowa. The tutorial covers persistent homology and Mapper, two of the main tools used in TDA.

The slides for this tutorial can be found here.

Vietoris-Rips persistence

The zip archive InteractiveJPDwB.zip contains three builds of the InteractiveJPDwB application, which helps build intuition for the Vietoris-Rips construction on a point cloud: one each for Windows, Mac OS X, and Linux. Use the build that matches your operating system. Instructions for using the program can be found in the bottom panel.
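For readers who prefer code to the interactive application, here is a minimal sketch of the Vietoris-Rips construction in plain Python. It is not taken from InteractiveJPDwB. The function name `rips_complex` and the convention that a simplex is included when all pairwise distances are at most `epsilon` are assumptions made for illustration (some references use balls of radius `epsilon`, which doubles the scale).

```python
from itertools import combinations
from math import dist


def rips_complex(points, epsilon, max_dim=2):
    """Return the simplices (as index tuples) of the Vietoris-Rips complex.

    Hypothetical helper: a set of points spans a simplex when every pair
    of its points lies within distance `epsilon` of each other.
    """
    n = len(points)
    simplices = [(i,) for i in range(n)]  # every point is a vertex
    for size in range(2, max_dim + 2):    # size 2 = edges, size 3 = triangles
        for candidate in combinations(range(n), size):
            if all(dist(points[i], points[j]) <= epsilon
                   for i, j in combinations(candidate, 2)):
                simplices.append(candidate)
    return simplices


# Four corners of a unit square.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]

small = rips_complex(square, epsilon=1.0)  # sides only: a hollow square
large = rips_complex(square, epsilon=1.5)  # diagonals appear, triangles fill in
print(len(small), len(large))
```

At `epsilon=1.0` the complex is the hollow square, which has a one-dimensional hole. Once `epsilon` exceeds the diagonal length, the triangles fill the hole. Tracking when such features appear and disappear is what persistence records.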

Persistent Homology

For persistent homology, we use two implementations: scikit-tda and giotto-tda. Both packages are available on PyPI, and everything you need for the topological data analysis part of the tutorial can be installed with

   pip install scikit-tda giotto-tda

The tutorial also depends on other libraries, such as numpy and matplotlib, which I assume you already have installed.

Both scikit-tda and giotto-tda compute Vietoris-Rips persistent homology using Ripser, a very efficient C++ implementation of persistence. If you don't want to install anything on your computer, you can go to live.ripser.org and upload the data sets there for easy persistence computations.

There are three persistent homology notebooks for users to work through:

  1. Introduction to persistent homology. A simple notebook with mostly synthetic data sets.
  2. Differentiation using persistence landscapes. A notebook for distinguishing $S^2$ from $S^3$ using one-dimensional homology, highlighting the geometric aspects of persistence. This relies on persistence landscapes, one of the first vectorization schemes introduced for persistence diagrams.
  3. MNIST using persistent homology. The most advanced notebook, combining cubical persistence with various vectorization schemes to build a digit classifier for the famous MNIST data set.
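To make the idea of a persistence landscape concrete, here is a small plain-Python sketch. It is a direct evaluation of the definition, not the implementation used in the notebooks: each (birth, death) pair contributes a tent function, and the k-th landscape at `t` is the k-th largest tent value. The function name `landscape` and the two-point diagram are assumptions for illustration.

```python
def landscape(diagram, k, t):
    """Value at t of the k-th persistence landscape (k = 1, 2, ...).

    Hypothetical helper: `diagram` is a list of (birth, death) pairs.
    """
    # Tent function of a pair: rises from birth, peaks at the midpoint,
    # falls back to zero at death, and is zero elsewhere.
    tents = sorted(
        (max(0.0, min(t - birth, death - t)) for birth, death in diagram),
        reverse=True,
    )
    return tents[k - 1] if k <= len(tents) else 0.0


# Assumed toy diagram with two nested intervals.
diagram = [(0.0, 4.0), (1.0, 3.0)]

grid = [0.0, 1.0, 2.0, 3.0, 4.0]
first = [landscape(diagram, 1, t) for t in grid]
second = [landscape(diagram, 2, t) for t in grid]
print(first)
print(second)
```

Sampling the landscapes on a fixed grid, as done here, turns a diagram into a fixed-length vector that standard machine learning methods can consume.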

Mapper

We use KeplerMapper for our Mapper implementation. KeplerMapper is written in Python and is compatible with other machine learning packages, such as scikit-learn.

There is one mapper notebook for users to work through: Introduction to mapper. An elementary notebook with basic data sets to get accustomed to choosing filter functions, cover parameters, etc.

Other tutorials

There are a variety of excellent and much more thorough tutorials available online by experts in the field. Some of the data sets in this tutorial are either motivated by or come directly from the following:

This list is nowhere near complete, and there are lots of other great tutorials for learning TDA.
