
Can queries really be permuted independently? #69

Open
danr opened this issue Oct 29, 2024 · 1 comment


danr commented Oct 29, 2024

Under the header "Assigning significance to mAP scores", you write:

To generate mAP distribution under the null hypothesis, we repeatedly reshuffle the rank list and recalculate mAP.

You refer to "the rank list", but there is one rank list for each query profile. Looking at your code, it seems you treat these rank lists independently. Would it not be more correct to shuffle the profile labels (query or reference) and then calculate mAP by recomputing each query profile's rank list? Why is your shortcut correct?

Thanks!

@alxndrkalinin
Contributor

Sorry for not responding earlier; I just saw your comment.

Because AP calculation relies solely on ranks and not on the distance metric values, both approaches are equivalent. Our null hypothesis is that the M profiles in the query group come from the same distribution as the N profiles in the reference group (i.e., they are produced by the same data-generating process). For each query profile, the calculation procedure produces a binary rank list of length N+(M-1): the other M-1 query profiles are the positives and the N reference profiles are the negatives. Since the sizes of both groups are fixed and the ranking is binary, the null distribution always includes all possible binary rankings, i.e., every placement of M-1 positives among N+(M-1) positions. Thus, it depends on only two parameters, M-1 and N+(M-1), and its exact size is the binomial coefficient "N+(M-1) choose M-1". If we instead reshuffle the labels for a given query and convert the results into binary rank lists, we recover the same null distribution.
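The equivalence argument above can be checked exhaustively for small group sizes. This is a minimal sketch, assuming the similarity order of the other N+(M-1) profiles around a fixed query stays the same and only the query/reference labels move. The helper names (`enumerate_rank_lists`, `shuffle_labels`, `as_distribution`) are hypothetical and are not part of the project's code.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, permutations
from math import comb, factorial


def enumerate_rank_lists(m: int, n: int) -> Counter:
    """Every way to place M-1 positives (1s) among N+(M-1) ranked positions."""
    size = n + (m - 1)
    counts = Counter()
    for ones in combinations(range(size), m - 1):
        counts[tuple(int(i in ones) for i in range(size))] += 1
    return counts


def shuffle_labels(m: int, n: int) -> Counter:
    """Fix one query and the similarity order of the other N+(M-1) profiles,
    then try every assignment of query/reference labels to those positions."""
    labels = ["query"] * (m - 1) + ["reference"] * n
    counts = Counter()
    for perm in permutations(labels):
        counts[tuple(int(lab == "query") for lab in perm)] += 1
    return counts


def as_distribution(counts: Counter) -> dict:
    """Normalize outcome counts into exact probabilities."""
    total = sum(counts.values())
    return {k: Fraction(v, total) for k, v in counts.items()}


M, N = 3, 4
by_rank_list = enumerate_rank_lists(M, N)
by_labels = shuffle_labels(M, N)

# Same set of binary rank lists, of size C(N+(M-1), M-1) ...
assert len(by_rank_list) == comb(N + M - 1, M - 1)
# ... each reached (M-1)! * N! times under label shuffling ...
assert all(c == factorial(M - 1) * factorial(N) for c in by_labels.values())
# ... so the two null distributions over rank lists are identical.
assert as_distribution(by_rank_list) == as_distribution(by_labels)
```

Label shuffling visits each distinct binary rank list the same number of times, so after normalization it yields the same uniform distribution as enumerating rank lists directly. Brute-force permutation grows factorially, so this is only practical as a check for small M and N.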

Hope this illustration for M=2 and N=3 helps:
[Figure: Null_shuffling]
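For the same M=2, N=3 case, the exact null distribution of AP for a single query can be enumerated directly. This is a sketch under stated assumptions: it uses the standard AP definition (mean of precision@k over the positions holding a positive), and it computes a one-sided p-value as the fraction of null APs at least as large as an observed AP. Both the helper names and the observed value `observed_ap` are hypothetical illustrations, not taken from the project.

```python
from fractions import Fraction
from itertools import combinations


def average_precision(ranks: tuple) -> Fraction:
    """Standard AP: mean of precision@k over the positions k that hold a positive.
    Requires at least one positive, i.e. M >= 2."""
    hits, precisions = 0, []
    for k, is_pos in enumerate(ranks, start=1):
        if is_pos:
            hits += 1
            precisions.append(Fraction(hits, k))
    return sum(precisions, Fraction(0)) / len(precisions)


def null_ap_distribution(m: int, n: int) -> list:
    """AP of every binary rank list with M-1 positives among N+(M-1) positions."""
    size = n + (m - 1)
    return [
        average_precision(tuple(int(i in ones) for i in range(size)))
        for ones in combinations(range(size), m - 1)
    ]


null_aps = null_ap_distribution(m=2, n=3)
print([str(ap) for ap in null_aps])  # ['1', '1/2', '1/3', '1/4']

# Hypothetical observed AP and an assumed one-sided p-value convention.
observed_ap = Fraction(1, 2)
p_value = Fraction(sum(ap >= observed_ap for ap in null_aps), len(null_aps))
print(p_value)  # 1/2
```

With a single positive among four positions there are exactly C(4, 1) = 4 rank lists, so the null AP values are 1, 1/2, 1/3 and 1/4, each with probability 1/4.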
