This repository contains a tutorial on Topological Data Analysis (TDA) for the 2021 Midwest Big Data Summer School, held virtually in Ames, Iowa. The tutorial covers persistent homology and mapper, two of the main tools used in TDA.
The slides for this tutorial can be found here.
The zip archive InteractiveJPDwB.zip contains three versions of the InteractiveJPDwB application, which helps build intuition for the Vietoris-Rips construction on a point cloud: one each for Windows, MacOSX, and Linux. Instructions for using the program can be found in its bottom panel.
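For readers who prefer code to a GUI, the Vietoris-Rips construction can be sketched in a few lines of plain Python. This is only an illustration and is not taken from the InteractiveJPDwB application: the function name `vietoris_rips` and its parameters are hypothetical, and it assumes the convention that a set of points spans a simplex when all pairwise distances are at most `epsilon` (some references use `2 * epsilon` instead).

```python
from itertools import combinations
from math import dist


def vietoris_rips(points, epsilon, max_dim=2):
    """Simplices (as index tuples) of the Vietoris-Rips complex at scale epsilon.

    Assumed convention: a set of points spans a simplex when every
    pairwise distance is <= epsilon. Brute force, for small inputs only.
    """
    n = len(points)
    # Pairs of points close enough to be joined by an edge.
    close = {
        (i, j)
        for i, j in combinations(range(n), 2)
        if dist(points[i], points[j]) <= epsilon
    }
    simplices = [(i,) for i in range(n)]  # every point is a vertex
    for k in range(2, max_dim + 2):  # k = number of vertices in the simplex
        for candidate in combinations(range(n), k):
            if all(pair in close for pair in combinations(candidate, 2)):
                simplices.append(candidate)
    return simplices


# Four corners of a unit square.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
complex_at_1 = vietoris_rips(square, epsilon=1.0)      # sides only: a hollow loop
complex_at_1_5 = vietoris_rips(square, epsilon=1.5)    # diagonals and triangles appear
```

At scale 1.0 the square has its four sides but no diagonals, so the complex is a loop; at scale 1.5 the diagonals and triangles fill it in. Tracking how such features appear and disappear as the scale grows is what persistent homology records.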
For persistent homology, we use two implementations: scikit-tda and giotto-tda.
Both packages are available on PyPI, and everything you need for the topological data analysis part of the tutorial can be installed with
pip install scikit-tda giotto-tda
The tutorial also depends on other libraries, such as numpy and matplotlib, which I assume you already have installed.
Both scikit-tda and giotto-tda implement Vietoris-Rips persistent homology based on the Ripser algorithm, a very efficient C++ implementation of persistence.
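To give a feel for what these libraries compute, here is a minimal pure-Python sketch of the dimension-0 case only. It is not Ripser's algorithm and does not use either package: it assumes that an edge enters the filtration when the scale reaches its length, and it uses a union-find structure to record the scale at which connected components merge. The name `h0_persistence` is hypothetical.

```python
from itertools import combinations
from math import dist


def h0_persistence(points):
    """Finite (birth, death) pairs of 0-dimensional Vietoris-Rips persistence.

    Every point is born at scale 0. A component dies at the length of the
    edge that first merges it into another component. The one component
    that never dies (infinite persistence) is not returned.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent pointers to the component representative,
        # compressing the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process edges from shortest to longest.
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    pairs = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # this edge merges two components
            parent[root_i] = root_j
            pairs.append((0.0, length))
    return pairs


# Two tight pairs of points on a line, far apart from each other.
cloud = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0)]
diagram = h0_persistence(cloud)
```

In this example the two short-lived pairs correspond to points merging within each tight pair, and the longer-lived pair reflects the gap between the two groups. Higher-dimensional features such as loops need the full machinery that the packages above provide.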
If you don't want to install anything on your computer, you can go to
live.ripser.org and upload the data sets there
for easy persistence computations.
There are three persistent homology notebooks for users to work through:
- Introduction to persistent homology. A simple notebook with mostly synthetic data sets.
- Differentiation using persistence landscapes. A notebook for distinguishing $S^2$ from $S^3$ using one-dimensional homology, highlighting the geometric aspects of persistence. This relies on persistence landscapes, one of the first vectorization schemes introduced for persistence diagrams.
- MNIST using persistent homology. The most advanced notebook, combining cubical persistence with various vectorization schemes to build a digit classifier for the famous MNIST data set.
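As a rough illustration of the landscape vectorization mentioned above, the sketch below evaluates persistence landscape functions directly from their definition: each (birth, death) pair contributes a tent function, and the k-th landscape at `t` is the k-th largest tent value there. The function name `landscape` and the toy diagram are made up for this example; the notebooks themselves may rely on library routines instead.

```python
def landscape(diagram, k, t):
    """Value at t of the k-th persistence landscape (k = 1 is the top layer).

    Each pair (b, d) contributes the tent function max(0, min(t - b, d - t)),
    which peaks at the midpoint of the interval [b, d].
    """
    tents = sorted(
        (max(0.0, min(t - b, d - t)) for b, d in diagram),
        reverse=True,
    )
    return tents[k - 1] if k <= len(tents) else 0.0


# A toy diagram with two nested intervals.
diagram = [(0.0, 4.0), (1.0, 3.0)]
grid = [0.0, 1.0, 2.0, 3.0, 4.0]

# Sampling each layer on a grid turns the diagram into fixed-length vectors.
layer1 = [landscape(diagram, 1, t) for t in grid]
layer2 = [landscape(diagram, 2, t) for t in grid]
```

Sampling the layers on a common grid is one simple way to obtain fixed-length feature vectors that standard machine learning tools can consume.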
We use KeplerMapper for our mapper implementation. KeplerMapper is written in Python and is compatible with other machine learning packages, such as scikit-learn.
There is one mapper notebook for users to work through: Introduction to mapper. An elementary notebook with basic data sets to get accustomed to choosing filter functions, cover parameters, etc.
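To make the roles of the filter function and cover parameters concrete, here is a minimal pure-Python sketch of the mapper construction. It does not use KeplerMapper and does not mirror its API: the names `mapper_graph`, `n_intervals`, `overlap`, and `cluster_eps` are hypothetical, and clustering is done by taking connected components of an epsilon-neighborhood graph rather than a scikit-learn clusterer.

```python
from itertools import combinations
from math import cos, sin, pi, dist


def epsilon_components(points, members, eps):
    """Split `members` into connected components of the graph that joins
    points at distance <= eps (a stand-in for a real clustering algorithm)."""
    unvisited = set(members)
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = {seed}, [seed]
        while frontier:
            i = frontier.pop()
            near = {j for j in unvisited if dist(points[i], points[j]) <= eps}
            unvisited -= near
            cluster |= near
            frontier.extend(near)
        clusters.append(cluster)
    return clusters


def mapper_graph(points, filter_values, n_intervals, overlap, cluster_eps):
    """Return (nodes, edges): nodes are sets of point indices, and an edge
    joins two nodes whenever they share at least one point."""
    lo, hi = min(filter_values), max(filter_values)
    length = (hi - lo) / n_intervals
    pad = overlap * length / 2  # widen each interval so that neighbors overlap
    nodes = []
    for k in range(n_intervals):
        a = lo + k * length - pad
        b = lo + (k + 1) * length + pad
        # Preimage of the k-th cover interval under the filter function.
        members = [i for i, f in enumerate(filter_values) if a <= f <= b]
        nodes.extend(epsilon_components(points, members, cluster_eps))
    edges = [
        (u, v)
        for u, v in combinations(range(len(nodes)), 2)
        if nodes[u] & nodes[v]
    ]
    return nodes, edges


# Twelve points on the unit circle, filtered by their x-coordinate.
circle = [(cos(2 * pi * k / 12), sin(2 * pi * k / 12)) for k in range(12)]
x_values = [p[0] for p in circle]
nodes, edges = mapper_graph(
    circle, x_values, n_intervals=3, overlap=0.6, cluster_eps=0.6
)
```

With these settings the middle interval splits into a top arc and a bottom arc, so the resulting graph is a cycle on four nodes, recovering the loop in the data. Changing `n_intervals`, `overlap`, or `cluster_eps` can merge or fragment nodes, which is exactly the kind of parameter sensitivity the notebook explores.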
There are a variety of excellent and much more thorough tutorials available online by experts in the field. Some of the data sets in this tutorial are either motivated by or come directly from the following:
- Charleston-TDA-ML. A tutorial on persistent homology written by Henry Adams, Melissa McGuirl, and Yitzchak Solomon.
- Peter Bubenik's TDA with R worksheet. A tutorial on using R to analyze data with persistence landscapes.
- MAA-NCS18. Matt Zabka's persistent homology tutorial.
- R-TDA tutorial. An R-TDA worksheet tutorial written by Jisu Kim.
- Scikit-tda tutorials: Ripser.py tutorials, KeplerMapper tutorials, Persim tutorials.
- giotto-tda tutorials. A list of tutorials and examples highlighting the functionality of giotto-tda.
This list is nowhere near complete, and there are lots of other great tutorials for learning TDA.