Hello,
I am running some analyses on the human judgments that you've released, and I was wondering: is it possible to find out which models were used in each debate? Alternatively, is there a way to find out which Experiment (1-10) each transcript came from, so that I could at least exclude the low-Elo transcripts that don't use some version of GPT-4?
Thank you!