Given a video and a natural language query, our task requires a system to retrieve the most relevant moments in the video, and to predict a highlight (saliency) score for each clip in the video.
At the project root, run

```bash
bash standalone_eval/eval_sample.sh
```

This command uses `eval.py` to evaluate the provided prediction file `sample_val_preds.jsonl`; the resulting metrics are written to `sample_val_preds_metrics.json`. The content of this generated file should be similar, if not identical, to `sample_val_preds_metrics_raw.json`.
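If you want to check this programmatically, a minimal sketch like the one below loads both metrics files and compares them with a small numeric tolerance (this assumes both files are plain JSON dictionaries; the tolerance value is an arbitrary choice):

```python
import json

# Load the freshly generated metrics and the provided reference metrics.
with open("sample_val_preds_metrics.json") as f:
    metrics = json.load(f)
with open("sample_val_preds_metrics_raw.json") as f:
    reference = json.load(f)

def nearly_equal(a, b, tol=1e-4):
    """Compare (possibly nested) metric structures with a small numeric tolerance."""
    if isinstance(a, dict) and isinstance(b, dict):
        return a.keys() == b.keys() and all(nearly_equal(a[k], b[k], tol) for k in a)
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return abs(a - b) <= tol
    return a == b

print("Metrics match reference:", nearly_equal(metrics, reference))
```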
The prediction file `sample_val_preds.jsonl` is in JSON Lines format; each row of the file can be loaded as a single `dict` in Python. Below is an example of a single line in the prediction file:
```
{
    "qid": 2579,
    "query": "A girl and her mother cooked while talking with each other on facetime.",
    "vid": "NUsG9BgSes0_210.0_360.0",
    "pred_relevant_windows": [
        [0, 70, 0.9986],
        [78, 146, 0.4138],
        [0, 146, 0.0444],
        ...
    ],
    "pred_saliency_scores": [-0.2452, -0.3779, -0.4746, ...]
}
```
| entry | description |
| --- | --- |
| `qid` | `int`, unique query id |
| `query` | `str`, natural language query, not used by the evaluation script |
| `vid` | `str`, unique video id |
| `pred_relevant_windows` | `list(list)`, moment retrieval predictions. Each sublist contains 3 elements, `[start (seconds), end (seconds), score]`. |
| `pred_saliency_scores` | `list(float)`, highlight prediction scores, the higher the better. This list should contain one score for each 2-second clip in the video, ordered by clip position. |
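As a concrete illustration of this format, the sketch below writes a list of per-query predictions to a `.jsonl` file, one JSON object per line (the `predictions` contents and the output file name are hypothetical placeholders):

```python
import json

# Hypothetical example predictions following the format described above.
predictions = [
    {
        "qid": 2579,
        "query": "A girl and her mother cooked while talking with each other on facetime.",
        "vid": "NUsG9BgSes0_210.0_360.0",
        "pred_relevant_windows": [[0, 70, 0.9986], [78, 146, 0.4138]],
        "pred_saliency_scores": [-0.2452, -0.3779, -0.4746],
    },
]

# Write one JSON object per line (JSON Lines format).
with open("my_val_preds.jsonl", "w") as f:
    for pred in predictions:
        f.write(json.dumps(pred) + "\n")

# Reading it back: each line loads as a single dict.
with open("my_val_preds.jsonl") as f:
    loaded = [json.loads(line) for line in f]
```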
To test your model's performance on the `test` split, please submit both `val` and `test` predictions to our Codalab evaluation server. The submission should be a single `.zip` file (no enclosing folder) that contains the two prediction files `hl_val_submission.jsonl` and `hl_test_submission.jsonl`, each formatted as instructed above.
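One way to build such an archive without an enclosing folder is sketched below using Python's `zipfile` module (the archive name `submission.zip` is a placeholder; the two `.jsonl` file names come from the instructions above):

```python
import zipfile

# Place both submission files at the top level of the archive (no enclosing folder).
with zipfile.ZipFile("submission.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    for fname in ["hl_val_submission.jsonl", "hl_test_submission.jsonl"]:
        # arcname=fname keeps the file at the archive root rather than under a directory.
        zf.write(fname, arcname=fname)
```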