Add permutation method to calculate thresholds #74
In this PR, we use a permutation method to calculate thresholds: we randomly shuffle whole rows (single cells) and compute the EMD per feature, then take a single median value as the threshold that represents no change. Since EMD is unsigned, we will apply both signs to this value in the figures to create a range.
We could not use the traditional approach of shuffling each feature independently because it drastically changed the distributions per population, leading to large EMD values. Shuffling whole rows preserves those distributions, which is what EMD is most sensitive to. I can explain this in more detail in person if it isn't clear.
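For reviewers, here is a minimal sketch of the idea, not the code in this PR. It assumes two populations stored as NumPy arrays (rows are single cells, columns are features), that shuffling whole rows means permuting pooled rows before splitting them back into two pseudo-groups, and that the median is taken over all features and permutations. The function name `permutation_emd_threshold`, its parameters, and the toy data are hypothetical. EMD is computed with `scipy.stats.wasserstein_distance`.

```python
import numpy as np
from scipy.stats import wasserstein_distance


def permutation_emd_threshold(group_a, group_b, n_permutations=100, seed=0):
    """Median per-feature EMD between pseudo-groups made by shuffling whole rows.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    rng = np.random.default_rng(seed)
    pooled = np.vstack([group_a, group_b])  # rows = single cells, columns = features
    n_a = group_a.shape[0]
    emds = []
    for _ in range(n_permutations):
        # Permute whole rows so each cell keeps all of its feature values together.
        shuffled = pooled[rng.permutation(pooled.shape[0])]
        pseudo_a, pseudo_b = shuffled[:n_a], shuffled[n_a:]
        # One EMD per feature between the two pseudo-groups.
        emds.extend(
            wasserstein_distance(pseudo_a[:, j], pseudo_b[:, j])
            for j in range(pooled.shape[1])
        )
    return float(np.median(emds))


# Toy data (assumption): two populations drawn from the same distribution.
rng = np.random.default_rng(1)
control = rng.normal(0.0, 1.0, size=(200, 5))
treated = rng.normal(0.0, 1.0, size=(200, 5))

threshold = permutation_emd_threshold(control, treated)
# EMD is unsigned, so the "no change" band in a figure is symmetric around zero.
no_change_range = (-threshold, threshold)
```

In a figure, `no_change_range` would be drawn as a band around zero, and features whose signed EMD falls outside it would be read as changed.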