Replies: 1 comment
-
Dear krooonal, Thanks for pointing this out, I guess this may save other participants some thinking if they encounter such edge cases. Such non-transitive effects are known also for other rank-based systems and other benchmarking tools in optimization. One example are the very well-established performance profiles, see e.g. the paper at https://www.numerical.rl.ac.uk/people/j_scott/publications/2016/GouldScott.2016_TOMS.pdf . To explain: We opted for using ranks per instance in order to overcome the issue that the instances in one series may have a slightly different base difficulty, independently of using reoptimization. Hence comparing speedup directly from instance to instance would be a rather noisy measure. For the evaluation phase, should there be any doubts on deciding the best performing submission, we will not only look at the ranking of all submissions, but also inspect pairwise comparisons of the top submission in order to reach a fair final decision. Thanks again, |
Beta Was this translation helpful? Give feedback.
-
It turns out that the ranking system does not hold the transitivity property. In other words, If in one to one comparison, A > B , B> C then it is not implied that A > C in the way ranking is defined.
This is showing up in my benchmarks. The rank of a particular pair of runs switches depending on which other runs are present in the comparison. Which seems weird. (A is better than B, but if C is present, then B is better than A?). I can't decide whether my experiment for that run worked well or not!
Here is the smallest example I could find.
Instance(weight) | (rank of) A | B | C
1 (1) | 2 | 0 | 1
2 (1.1) | 1 | 2 | 0
3 (1.2) | 0 | 1 | 2
Total scores: A(3.1) B(3.4) C(3.4)
In this case, A is better than B, B is better than C, and C is better than A in one to one comparison (translate the ranks to 0,1). However, the last comparison switches in the presence of B (A becomes better than C) and the second last comparison becomes equality (B = C).
I believe that the runs need to be close for the parity to switch. So, the current ranking system will still produce fair results.
Beta Was this translation helpful? Give feedback.
All reactions