This repository is used to re-create the plots and code for the PyData Global 2020 presentation.
This code base uses Anaconda to manage its environment dependencies. You may obtain Anaconda from the Anaconda website.
Clone the repository:
git clone https://github.com/matrix-profile-foundation/pydata2020.git pydata2020-mpf
Install dependencies:
cd pydata2020-mpf
conda env create -f environment.yaml
conda activate pydata2020-mpf
# register a Jupyter kernel for the conda environment
python -m ipykernel install --user --name=pydata2020-mpf
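Before launching the notebooks, it can help to confirm that the setup steps above took effect. The following is a minimal sketch, not part of the repository: the helper names `find_missing_tools` and `active_conda_env` are hypothetical, and the check assumes that `git`, `conda`, and `jupyter` are expected on your `PATH` and that conda exports the `CONDA_DEFAULT_ENV` variable when an environment is activated.

```python
import os
import shutil

# Command-line tools used by the setup steps (assumed to be on PATH).
REQUIRED_TOOLS = ["git", "conda", "jupyter"]

# Environment name taken from the `conda activate` step.
EXPECTED_ENV = "pydata2020-mpf"


def find_missing_tools(tools):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def active_conda_env():
    """Return the name of the active conda environment, or None if unset."""
    return os.environ.get("CONDA_DEFAULT_ENV")


# Print a short report; this only inspects the environment and changes nothing.
missing = find_missing_tools(REQUIRED_TOOLS)
print("Missing tools:", ", ".join(missing) if missing else "none")
print("Expected environment active:", active_conda_env() == EXPECTED_ENV)
```

If the second line of the report prints `False`, re-run `conda activate pydata2020-mpf` in the same shell before starting Jupyter.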
The code is distributed as Jupyter notebooks. To launch JupyterLab and view the notebooks, run the following command (assuming you are still in the local repository directory):
jupyter lab
We suggest, but do not require, reviewing the notebooks in the following order:
- Dataset Overview
- Transform Dataset
- Computing Distance Matrix
- MPDist vs Euclidean
- MPDist vs DTW
- Hierarchical Clustering
- HDBSCAN Clustering