Instead of computing all evaluation results at once, compute them batch by batch so that users can inspect early results before the full evaluation finishes.
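
One way the batch-by-batch idea might look is a generator that yields a running metric after each batch, so the caller can inspect partial results while the rest is still pending. This is only a sketch under assumptions: `evaluate_in_batches`, `predict_fn`, `batch_size`, and the choice of accuracy as the metric are hypothetical names and choices made for illustration, not part of any existing API.

```python
from typing import Callable, Iterator, Sequence, Tuple


def evaluate_in_batches(
    examples: Sequence[Tuple[object, object]],
    predict_fn: Callable[[object], object],
    batch_size: int = 2,
) -> Iterator[Tuple[int, float]]:
    """Yield (examples_seen, running_accuracy) after each batch.

    Hypothetical helper: `examples` holds (input, expected) pairs and
    `predict_fn` stands in for whatever is being evaluated.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    correct = 0
    seen = 0
    for start in range(0, len(examples), batch_size):
        batch = examples[start:start + batch_size]
        for inputs, expected in batch:
            if predict_fn(inputs) == expected:
                correct += 1
        seen += len(batch)
        # Hand the partial result back before touching the next batch.
        yield seen, correct / seen


if __name__ == "__main__":
    data = [(1, 1), (2, 0), (3, 1), (4, 0), (5, 0)]
    for seen, accuracy in evaluate_in_batches(data, lambda x: x % 2, batch_size=2):
        print(f"{seen} examples evaluated, running accuracy {accuracy:.2f}")
```

Because the function is a generator, no batch is evaluated until the caller asks for it, so a user can stop early if the partial results already look wrong.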