Dataset handling is very inefficient

The current training pipeline handles the dataset very inefficiently: it loads the entire dataset into memory at once. I should consider storing the dataset in something other than a single pickle file, and look into whether `flow_from_directory` or a similar technique could stream batches from disk instead.
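A minimal sketch of the lazy-loading idea, assuming (hypothetically) that the dataset is split into one pickle file per sample, stored under one subdirectory per class. This mirrors the class-per-subdirectory layout that `flow_from_directory` expects for images. Only the file index is built up front; sample data is read from disk one batch at a time. The function names `list_samples` and `iter_batches` and the `.pkl`-per-sample layout are illustrative assumptions, not existing project code.

```python
import os
import pickle
import random
from typing import Iterator, List, Tuple


def list_samples(root: str) -> Tuple[List[Tuple[str, int]], List[str]]:
    """Index (path, label) pairs without loading any sample data.

    Assumes a hypothetical layout: root/<class_name>/<sample>.pkl
    Labels are assigned from the sorted class directory names.
    """
    class_names = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
    )
    index = []
    for label, name in enumerate(class_names):
        class_dir = os.path.join(root, name)
        for fname in sorted(os.listdir(class_dir)):
            if fname.endswith(".pkl"):
                index.append((os.path.join(class_dir, fname), label))
    return index, class_names


def iter_batches(
    root: str, batch_size: int, shuffle: bool = True, seed: int = 0
) -> Iterator[Tuple[list, List[int]]]:
    """Yield (samples, labels) batches, unpickling only one batch at a time."""
    index, _ = list_samples(root)
    if shuffle:
        # Shuffle the lightweight index, not the data itself.
        random.Random(seed).shuffle(index)
    for start in range(0, len(index), batch_size):
        chunk = index[start:start + batch_size]
        samples = []
        for path, _ in chunk:
            with open(path, "rb") as f:
                samples.append(pickle.load(f))
        labels = [label for _, label in chunk]
        yield samples, labels
```

Peak memory then scales with `batch_size` rather than with the dataset size. For image data, Keras's `flow_from_directory` (or the newer `tf.keras.utils.image_dataset_from_directory`) reads the same directory layout directly, so a hand-written generator like this would mainly be useful for non-image samples.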