Issues with groupby columns #67

So,

For prec_recall and Hitk, we now have the case that the input groupby columns determine by which columns the similarity df is sorted. This has an important impact on the results. If you, for example, sort by something that is not unique, i.e. something that does not identify a single row in the input df, then you will get internal connections (pairs of profiles that both belong to the same group) in each sub-dataframe that you are grouping.
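To make "internal connections" concrete, here is a minimal sketch, assuming plain Python dicts as stand-in metadata rows (the sample names, doses and the `internal_pairs` helper are hypothetical and not part of the package). It counts, for each group key, how many profile pairs end up inside the same group.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy metadata: two samples, each profiled at three doses.
profiles = [
    {"Metadata_broad_sample": "cmpd_A", "Metadata_dose": 0.1},
    {"Metadata_broad_sample": "cmpd_A", "Metadata_dose": 1.0},
    {"Metadata_broad_sample": "cmpd_A", "Metadata_dose": 10.0},
    {"Metadata_broad_sample": "cmpd_B", "Metadata_dose": 0.1},
    {"Metadata_broad_sample": "cmpd_B", "Metadata_dose": 1.0},
    {"Metadata_broad_sample": "cmpd_B", "Metadata_dose": 10.0},
]


def internal_pairs(rows, groupby_columns):
    """Count, per group key, the profile pairs that fall inside the same group (hypothetical helper)."""
    groups = defaultdict(list)
    for idx, row in enumerate(rows):
        key = tuple(row[col] for col in groupby_columns)
        groups[key].append(idx)
    # A group with n members contains n * (n - 1) / 2 internal pairs.
    return {key: len(list(combinations(members, 2))) for key, members in groups.items()}


print(internal_pairs(profiles, ["Metadata_broad_sample"]))
# {('cmpd_A',): 3, ('cmpd_B',): 3}
print(internal_pairs(profiles, ["Metadata_broad_sample", "Metadata_dose"]))
# every key maps to 0, because sample + dose is unique per row
```

With a unique key every group holds a single profile, so no internal pairs exist; with the non-unique key each group holds three doses and three internal pairs.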

Let's say, for example, you have a df with samples at different dosages. If you then use groupby_columns = Metadata_broad_sample, you will sort into subgroups whose members have several connections with each other (all the different doses). Your precision will then show the weird effects that @FloHu described in #62, for example. Similarly, hitk will give weird results because you are now looking at internal connections and not only at the nearest neighbors of one sample.
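The dose example can be illustrated with a toy nearest-neighbor lookup. This is a sketch under stated assumptions: the four profiles, their similarity values and the helper names are invented for illustration (they are not real data and not the hitk implementation). It only shows that, when grouping by sample, the top neighbor of each profile is simply another dose of the same sample.

```python
# Hypothetical toy profiles: samples A and B, each at a low and a high dose.
labels = ["A_low", "A_high", "B_low", "B_high"]
sample_of = {"A_low": "A", "A_high": "A", "B_low": "B", "B_high": "B"}

# Invented pairwise similarities, for illustration only.
similarity = {
    ("A_low", "A_high"): 0.9,
    ("A_low", "B_low"): 0.4,
    ("A_low", "B_high"): 0.3,
    ("A_high", "B_low"): 0.2,
    ("A_high", "B_high"): 0.5,
    ("B_low", "B_high"): 0.8,
}


def sim(a, b):
    """Look up a symmetric similarity regardless of pair order."""
    return similarity.get((a, b), similarity.get((b, a)))


def nearest_neighbors(query, k):
    """Return the k most similar other profiles, highest similarity first."""
    others = [p for p in labels if p != query]
    return sorted(others, key=lambda p: sim(query, p), reverse=True)[:k]


def internal_hits(query, k):
    """Count how many of the k nearest neighbors share the query's groupby key (the sample)."""
    return sum(sample_of[n] == sample_of[query] for n in nearest_neighbors(query, k))


for query in labels:
    print(query, nearest_neighbors(query, 1), internal_hits(query, 1))
```

Every query's single nearest neighbor here is its other dose, so any hit or precision computed per sample group is driven by these within-group pairs rather than by neighbors from other samples.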


Should we keep it this way and make users aware of it, or should we find a workaround here? Maybe the solution is to allow only groupby_cols that are unique?
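If we go with the "only unique groupby_cols" option, the check could look roughly like the sketch below. It assumes rows are plain dicts, and `assert_unique_groupby` is a hypothetical helper name, not an existing function in the package.

```python
from collections import Counter


def assert_unique_groupby(rows, groupby_columns):
    """Raise ValueError unless groupby_columns identify every row uniquely.

    Hypothetical helper: a sketch of the proposed check, not an existing function.
    """
    counts = Counter(tuple(row[col] for col in groupby_columns) for row in rows)
    duplicated = sorted(key for key, n in counts.items() if n > 1)
    if duplicated:
        raise ValueError(
            f"groupby_columns {groupby_columns} are not unique; duplicated keys: {duplicated}"
        )


# Hypothetical toy metadata: one sample at two doses.
rows = [
    {"Metadata_broad_sample": "cmpd_A", "Metadata_dose": 0.1},
    {"Metadata_broad_sample": "cmpd_A", "Metadata_dose": 1.0},
]

assert_unique_groupby(rows, ["Metadata_broad_sample", "Metadata_dose"])  # passes silently
try:
    assert_unique_groupby(rows, ["Metadata_broad_sample"])
except ValueError as err:
    print(err)  # reports the duplicated key ('cmpd_A',)
```

On a pandas DataFrame the same yes/no answer comes from `df.duplicated(subset=groupby_columns).any()`; raising early with the offending keys would tell users exactly which groups contain internal connections.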
