I want to reproduce the evaluation pipeline for APPS, but `../data/apps_metric`, which `test_apps.py` invokes, seems to have been removed. How am I supposed to run the evaluation for APPS and CodeContests?