The way the dataset is currently loaded for training is very inefficient: the whole dataset is read into memory at once. I should therefore consider storing the dataset in something other than a single pickle file, and loading it in batches with `flow_from_directory` or a similar technique.
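To make the idea concrete, here is a minimal stdlib-only sketch of directory-based batch loading. It is not the Keras API. It only mimics the layout `flow_from_directory` expects, with one subdirectory per class and one file per sample. The names `list_samples`, `batch_generator` and `read_bytes` are hypothetical. Only the list of file paths is kept in memory, and each file is read when its batch is requested. `load_fn` is an assumed hook where real decoding (e.g. image loading and resizing) would go.

```python
import os
import random
from typing import Callable, Iterator, List, Tuple


def list_samples(root: str) -> Tuple[List[Tuple[str, int]], List[str]]:
    """Index (path, label) pairs, one subdirectory per class (hypothetical helper)."""
    class_names = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
    )
    samples = []
    for label, name in enumerate(class_names):
        class_dir = os.path.join(root, name)
        for fname in sorted(os.listdir(class_dir)):
            samples.append((os.path.join(class_dir, fname), label))
    return samples, class_names


def read_bytes(path: str) -> bytes:
    """Placeholder loader: returns raw file contents (swap in real decoding)."""
    with open(path, "rb") as f:
        return f.read()


def batch_generator(
    root: str,
    batch_size: int = 32,
    shuffle: bool = True,
    load_fn: Callable[[str], object] = read_bytes,
    seed=None,
) -> Iterator[Tuple[list, List[int]]]:
    """Yield (inputs, labels) batches for one epoch, reading files lazily."""
    samples, _ = list_samples(root)  # only paths and labels live in memory
    order = list(range(len(samples)))
    if shuffle:
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        batch = [samples[i] for i in order[start:start + batch_size]]
        xs = [load_fn(path) for path, _ in batch]  # files are read per batch
        ys = [label for _, label in batch]
        yield xs, ys
```

Peak memory then scales with `batch_size` rather than with the dataset size. If the project is on TensorFlow/Keras, note that recent versions mark `ImageDataGenerator` (which provides `flow_from_directory`) as deprecated in favour of `tf.keras.utils.image_dataset_from_directory`, which reads the same directory layout.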