We should set up first-line CI testing for our models, primarily to improve reproducibility. I'm assigning several people to this issue. @lgray can take care of some of the baseline infrastructure, and we can add test scripts from there.
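
As a rough idea of what one of the first scripts might check, here is a minimal sketch of a seeded reproducibility test. Everything in it is an assumption for illustration: `run_model` is a hypothetical stand-in and not one of our actual models, and the pytest-style `test_*` naming is just one possible convention until the baseline infrastructure is decided.

```python
import random


def run_model(seed: int, n_steps: int = 5) -> list[float]:
    """Hypothetical stand-in for a model run (not one of the real models)."""
    rng = random.Random(seed)  # local RNG, so the result depends only on the seed
    state = 0.0
    outputs = []
    for _ in range(n_steps):
        # Toy update rule: decay the previous state and add seeded Gaussian noise.
        state = 0.5 * state + rng.gauss(0.0, 1.0)
        outputs.append(state)
    return outputs


def test_same_seed_reproduces():
    # Two runs with the same seed should give identical outputs.
    assert run_model(seed=1234) == run_model(seed=1234)


def test_different_seed_differs():
    # Sanity check that the seed actually drives the result.
    assert run_model(seed=1) != run_model(seed=2)
```

A real check would swap `run_model` for an actual training or inference entry point and would probably compare against stored reference values within a tolerance rather than requiring exact equality.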