A privacy-preserving framework for anonymizing the trajectories of a population of users. Researchers and engineers can plug any dataset and model into the current system.
The framework processes GPS location time series data with a Trajectory Generative Adversarial Network to generate synthetic data.
This is a TensorFlow 2 project. If you plan to train models on an NVIDIA GPU, we recommend using conda to install the dependencies: each version of TensorFlow only works with a specific version of CUDA (see the TensorFlow docs for tested configurations), and conda can install isolated CUDA versions in its environments to prevent conflicts.
conda env create -f environment.yml
conda activate
- Add two environment variables to a `.env` file in this (root) directory: `xxx_INPUT_DIR`, which is the absolute directory path of the raw input dataset on your system, and `xxx_INPUT_FILE`, which is a file location where you want preprocessed data to be saved, i.e. a CSV file. Replace the `xxx` with a 3-letter "nickname" for your dataset.
- Create a new .py file in src/datasets/ for your dataset.
- Write a class that subclasses `Dataset` (from src/datasets/base.py) and implements a `preprocess()` method that reads in the raw data and returns a `pandas.DataFrame`.
- Import your class into `src/datasets/__init__.py` and add the class name to the `DATASETS` list.
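The dataset steps above can be sketched roughly as follows. This is a minimal illustration, not the framework's actual code: the `Dataset` class here is a stand-in for the one in src/datasets/base.py (whose real interface may differ), and the `TOY` nickname, the `ToyDataset` class, the one-CSV-per-user layout and the column names are all assumptions made for the example.

```python
import os
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path

import pandas as pd


class Dataset(ABC):
    """Stand-in for the base class in src/datasets/base.py (assumed interface)."""

    @abstractmethod
    def preprocess(self) -> pd.DataFrame:
        """Read the raw data and return it as a DataFrame."""


class ToyDataset(Dataset):
    """Hypothetical dataset registered under the nickname TOY."""

    def __init__(self):
        # Assumption: the variables from .env are visible in os.environ.
        self.input_dir = Path(os.environ["TOY_INPUT_DIR"])
        self.output_file = Path(os.environ["TOY_INPUT_FILE"])

    def preprocess(self) -> pd.DataFrame:
        # Assumption: one raw CSV per user, named <user_id>.csv,
        # with the columns lat, lon and timestamp.
        frames = []
        for path in sorted(self.input_dir.glob("*.csv")):
            frame = pd.read_csv(path, parse_dates=["timestamp"])
            frame["user_id"] = path.stem
            frames.append(frame)
        return pd.concat(frames, ignore_index=True)


# Usage with throwaway files standing in for a real raw dataset.
with tempfile.TemporaryDirectory() as tmp:
    raw_dir = Path(tmp) / "raw"
    raw_dir.mkdir()
    (raw_dir / "u1.csv").write_text(
        "lat,lon,timestamp\n39.9,116.4,2021-07-24 10:00:00\n"
    )
    (raw_dir / "u2.csv").write_text(
        "lat,lon,timestamp\n39.8,116.3,2021-07-24 11:00:00\n"
    )
    os.environ["TOY_INPUT_DIR"] = str(raw_dir)
    os.environ["TOY_INPUT_FILE"] = str(Path(tmp) / "preprocessed.csv")
    df = ToyDataset().preprocess()
    print(df)
```

In the real project the class would subclass the framework's own `Dataset` rather than this stand-in, and its name would then be added to the `DATASETS` list.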
- Create a new .py file in src/models/ and write a class that inherits from `TrajectoryModel` (in src/models/base.py) and implements at least the `train`, `predict`, `save` and `restore` abstract methods. If it's a supervised model (like MARC), you'll also want to add an `evaluate` method to get metrics on the test set.
- Import your model class into `src/models/__init__.py` and add the class name to the `MODELS` list.
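As a rough sketch of the model steps above, the following toy class implements the four required methods. The `TrajectoryModel` class here is a stand-in for the one in src/models/base.py, and the method signatures, the `CentroidModel` class and the `centroid.json` file name are assumptions for illustration only; the real abstract methods may take different arguments.

```python
import json
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class TrajectoryModel(ABC):
    """Stand-in for the base class in src/models/base.py (assumed signatures)."""

    @abstractmethod
    def train(self, data, epochs): ...

    @abstractmethod
    def predict(self, data): ...

    @abstractmethod
    def save(self, path): ...

    @abstractmethod
    def restore(self, path): ...


class CentroidModel(TrajectoryModel):
    """Toy model: learns the mean (lat, lon) and predicts it for every point."""

    def __init__(self):
        self.centroid = None

    def train(self, data, epochs=1):
        # A closed-form mean needs no iteration, so epochs is unused here.
        lats = [lat for lat, _ in data]
        lons = [lon for _, lon in data]
        self.centroid = (sum(lats) / len(lats), sum(lons) / len(lons))

    def predict(self, data):
        # Return one predicted point per input point.
        return [self.centroid for _ in data]

    def save(self, path):
        Path(path).mkdir(parents=True, exist_ok=True)
        (Path(path) / "centroid.json").write_text(json.dumps(self.centroid))

    def restore(self, path):
        stored = json.loads((Path(path) / "centroid.json").read_text())
        self.centroid = tuple(stored)


# Usage: train, save, then restore into a fresh instance and predict.
with tempfile.TemporaryDirectory() as tmp:
    model = CentroidModel()
    model.train([(1.0, 2.0), (3.0, 4.0)], epochs=1)
    model.save(tmp)

    restored = CentroidModel()
    restored.restore(tmp)
    predictions = restored.predict([(0.0, 0.0)])
    print(predictions)
```

A supervised model would add an `evaluate` method alongside these four, and the class name would then be added to the `MODELS` list.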
Use the CLI script's train command:
$ python mobility_cli.py
Usage: mobility_cli.py [OPTIONS] COMMAND [ARGS]...
Command line interface for the mobility learning framework.
Options:
--help Show this message and exit.
Commands:
evaluate Use SAVED_MODEL to predict the labels of DATASET.
predict Use trained MODEL saved in SAVED_PATH to make predictions based...
train Train MODEL on DATASET stored in DATASET_PATH for EPOCHS.
$ python mobility_cli.py train --help
Usage: mobility_cli.py train [OPTIONS] [LSTMTrajGAN|MARC] [MDCLausanne|GeoLifeBeijing|FourSquareNYC|PrivamovLyon] EPOCHS
Train MODEL on DATASET stored in DATASET_PATH for EPOCHS.
Options:
--help Show this message and exit.
$ python mobility_cli.py train LSTMTrajGAN GeoLifeBeijing 200
Use the CLI script's predict command:
$ python mobility_cli.py predict --help
Usage: mobility_cli.py predict [OPTIONS] [LSTMTrajGAN|MARC] SAVED_PATH [MDCLausanne|GeoLifeBeijing|FourSquareNYC|PrivamovLyon] OUTPUT_PATH
Use trained MODEL saved in SAVED_PATH to make predictions based on DATASET
and write to OUTPUT_PATH as CSV.
Options:
--help Show this message and exit.
$ python mobility_cli.py predict LSTMTrajGAN experiments/LSTMTrajGAN_GeoLifeBeijing/2021-07-24T23:55:37/saved_model/ GeoLifeBeijing outputs/LSTMTrajGAN_GeoLifeBeijing_predictions.csv
Use the CLI script's evaluate command:
$ python mobility_cli.py evaluate --help
Usage: mobility_cli.py evaluate [OPTIONS] [LSTMTrajGAN|MARC] SAVED_PATH [MDCLausanne|GeoLifeBeijing|FourSquareNYC|PrivamovLyon]
Use SAVED_MODEL to predict the labels of DATASET.
Options:
--help Show this message and exit.
| Date | Note | Author |
|---|---|---|
| 3/31 | Created repo; initialization & config.py | jeffmur |
| 4/1 | Project structure and importing fixes | alexkyllo |
| 4/5 | Optimize freqMatrix function | alexkyllo |
| 4/6 | Appended Ali's LSTM-AE, updated req.txt | jeffmur |
| 7/17 | Added LSTMTrajGAN and MARC models | alexkyllo |
| 7/24 | GeoLife and Privamov Ready for training | jeffmur |
| 7/25 | Add training instructions to README | alexkyllo |
- Port LSTM-TrajGAN to TensorFlow 2 so it can be run in the same environment
- Preprocessing code for MDC data so it can be fed into LSTM-TrajGAN
- Port MARC reidentifier model to TF2
- Preprocessing code for GeoLife data so it can be fed into LSTM-TrajGAN
- Preprocessing code for Privamov data so it can be fed into LSTM-TrajGAN
- Post-processing code to output LSTM-TrajGAN generated trajectories to CSV
- Train LSTM-TrajGAN on MDC Lausanne dataset
- Train LSTM-TrajGAN on FourSquare NYC dataset
- Train LSTM-TrajGAN on GeoLife Beijing dataset
- Train LSTM-TrajGAN on Privamov Lyon dataset
- Train MARC on MDC dataset
- Train MARC on FourSquare NYC dataset
- Train MARC on GeoLife dataset
- Train MARC on Privamov dataset
- Compare MARC performance on real vs. generated MDC trajectories for LSTM-TrajGAN
- Get outputs from Yuting's LSTM-AE model on MDC, FourSquare, Privamov and GeoLife datasets
- Compare MARC performance on real vs. generated trajectories for LSTM-AE
- Evaluate realism of generated trajectories using distribution and distance comparisons