1. Add a new dropdown list for version_to_compare, next to version.
2. Visualize output differences and evaluator differences.
3. Conduct preference evaluation and visualize the results in the interface.
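
A minimal sketch of how the difference step could be computed before it is rendered, assuming each version's output is a plain string and each evaluator yields a numeric score. The function names `diff_outputs` and `evaluator_deltas`, and the score dictionaries, are hypothetical and not taken from the existing code; only the `version` and `version_to_compare` labels come from the list above.

```python
import difflib


def diff_outputs(base_output: str, compare_output: str,
                 base_label: str = "version",
                 compare_label: str = "version_to_compare") -> list[str]:
    """Return unified-diff lines between the outputs of two versions."""
    return list(difflib.unified_diff(
        base_output.splitlines(),
        compare_output.splitlines(),
        fromfile=base_label,
        tofile=compare_label,
        lineterm="",  # inputs have no trailing newlines, so add none to headers
    ))


def evaluator_deltas(base_scores: dict[str, float],
                     compare_scores: dict[str, float]) -> dict[str, float]:
    """Score change per evaluator (compare minus base).

    Evaluators missing from either version are skipped (an assumption;
    the interface might instead want to flag them).
    """
    return {
        name: compare_scores[name] - base_scores[name]
        for name in base_scores
        if name in compare_scores
    }
```

The diff lines can be shown as-is or colored by their leading `+` / `-` character, and the deltas can feed a small table or bar chart next to the two dropdowns.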