Need a test dataset to:

- test if everything works correctly
  - even with limited memory resources?
  - using upstream models (for now)
- measure/improve runtime
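
A possible starting point for the runtime and memory checks, as a rough sketch only: `profile_run` and `run_pipeline` are hypothetical names, and `run_pipeline` is a placeholder for whatever actually gets run on the test dataset. Note the assumption that `tracemalloc` is good enough here: it only tracks allocations made through Python's allocator, so memory used by native model code would need a different tool.

```python
import time
import tracemalloc


def profile_run(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_seconds, peak_bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = fn(*args, **kwargs)
    finally:
        # Record timing and peak memory even if fn raises.
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, elapsed, peak


def run_pipeline(records):
    # Hypothetical stand-in for the real pipeline (assumption, not the actual code).
    return [r * 2 for r in records]


if __name__ == "__main__":
    # Placeholder data; the real test dataset would be loaded here.
    test_dataset = list(range(10_000))
    result, elapsed, peak = profile_run(run_pipeline, test_dataset)
    print(f"{len(result)} records in {elapsed:.4f} s, peak {peak / 1e6:.2f} MB")
```

Running the same call before and after a change gives a simple baseline for "did runtime or peak memory get better".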