Description
Problem
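When testing compound × dose combinations, the current BH FDR correction treats all tests as independent, unrelated hypotheses in one flat family. For example, with 1000 compounds × 5 doses = 5000 tests, every p-value is corrected against all 5000 tests, which makes the BH correction very stringent.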
But doses of the same compound are not independent - they test the same underlying biological hypothesis ("does compound X have a phenotype?"). This leads to over-correction.
Proposed Solution
Implement hierarchical FDR as an option:
Stage 1: Test at group level (e.g., compound)
- Use minimum p-value within group as the group p-value
- Apply BH at level q
- A compound passes if ANY dose is significant
Stage 2: Test within groups (only for groups that passed Stage 1)
- For each significant group, apply BH at level q to its members
- Report member-level results
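The two stages above can be sketched without any dependencies. This is a minimal illustration, not the library's implementation: the names `bh_reject` and `hierarchical_bh` are hypothetical, and the raw minimum p-value is used as the group p-value exactly as proposed, with no adjustment for group size.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def bh_reject(pvals: Sequence[float], q: float) -> List[bool]:
    """Benjamini-Hochberg step-up: True where the hypothesis is rejected at level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest 1-based rank k with p_(k) <= k * q / m; reject ranks 1..k.
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k = rank
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject


def hierarchical_bh(
    groups: Sequence[Hashable], pvals: Sequence[float], q: float = 0.05
) -> List[bool]:
    """Two-stage BH: groups[i] is the group (e.g. compound) of test i."""
    members: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, g in enumerate(groups):
        members[g].append(idx)
    keys = list(members)

    # Stage 1: group p-value = unadjusted minimum member p-value, then BH across groups.
    group_p = [min(pvals[i] for i in members[g]) for g in keys]
    group_pass = bh_reject(group_p, q)

    # Stage 2: BH within each group that passed Stage 1; other members stay unrejected.
    reject = [False] * len(pvals)
    for g, passed in zip(keys, group_pass):
        if not passed:
            continue
        idxs = members[g]
        for i, r in zip(idxs, bh_reject([pvals[i] for i in idxs], q)):
            reject[i] = r
    return reject


# Toy data (made up for illustration): 3 compounds x 3 doses.
groups = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
pvals = [0.001, 0.02, 0.6, 0.3, 0.5, 0.9, 0.04, 0.045, 0.8]
print(hierarchical_bh(groups, pvals, q=0.05))
print(bh_reject(pvals, q=0.05))  # flat BH on the same p-values, for comparison
```

On this toy input only compound A passes Stage 1, and two of its doses are then rejected in Stage 2, whereas flat BH over all nine tests rejects only the smallest p-value.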
Why min p-value instead of Simes?
For dose-response data, low doses are expected to be inactive. Simes' method penalizes compounds for having inactive low doses, which is biologically normal. Min p-value is more appropriate: a compound passes Stage 1 if ANY dose shows activity.
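To make the difference concrete, the sketch below computes both group-level statistics for a single compound. The helper names `simes_p` and `min_p` and the p-values are illustrative assumptions, not part of the library. Simes' combined p-value is the minimum over ranks i of m · p_(i) / i, so it can never be smaller than the minimum p-value.

```python
from typing import Sequence


def simes_p(pvals: Sequence[float]) -> float:
    """Simes' combined p-value: min over ranks i of m * p_(i) / i, capped at 1."""
    m = len(pvals)
    ranked = enumerate(sorted(pvals), start=1)
    return min(1.0, min(m * p / rank for rank, p in ranked))


def min_p(pvals: Sequence[float]) -> float:
    """Group statistic proposed in this issue: the unadjusted minimum p-value."""
    return min(pvals)


# Hypothetical compound that is active only at the highest of 5 doses.
one_active_dose = [0.8, 0.7, 0.6, 0.5, 0.001]
# Hypothetical compound that is equally active at all 5 doses.
all_doses_active = [0.001] * 5

for label, pv in [("one active dose", one_active_dose),
                  ("all doses active", all_doses_active)]:
    print(label, "min-p:", min_p(pv), "Simes:", simes_p(pv))
```

With a single active dose, Simes reduces to the Bonferroni value (5 × 0.001, roughly 0.005), five times the min p-value; when every dose is equally active, the two statistics coincide. That gap is the penalty for inactive low doses described above.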
Benefits
- Provides dose-level (or other sub-group) inference
- Much less harsh correction than treating all tests as independent
- Users specify grouping structure via metadata columns
Example
- 1000 compounds × 5 doses = 5000 raw tests
- Stage 1: 1000 compound tests → 50 pass (any dose active)
- Stage 2: 50 × 5 = 250 dose tests, corrected in groups of 5
- Result: dose-level significance with appropriate correction
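A rough way to see why the correction is less harsh is to compare the strictest BH threshold (the one applied to the smallest p-value, q / m) at each level. The snippet assumes q = 0.05, which the example above does not specify, and `bh_threshold` is a hypothetical helper. Actual BH cutoffs grow with the number of rejections, so these are lower bounds on the thresholds, not the full picture.

```python
def bh_threshold(rank: int, n_tests: int, q: float) -> float:
    """BH critical value for the p-value at a given 1-based rank."""
    return rank * q / n_tests


q = 0.05  # assumed FDR level; not stated in the example

flat = bh_threshold(1, 5000, q)    # one flat family of 5000 dose tests
stage1 = bh_threshold(1, 1000, q)  # Stage 1: 1000 compound-level tests
stage2 = bh_threshold(1, 5, q)     # Stage 2: 5 doses within one passing compound

print(f"flat BH, smallest threshold:   {flat:.0e}")
print(f"Stage 1, smallest threshold:   {stage1:.0e}")
print(f"Stage 2, smallest threshold:   {stage2:.0e}")
```

Under that assumption, the smallest p-value must fall below about 1e-05 in the flat family, versus about 5e-05 at the compound level and 0.01 within a passing compound.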
API Design
Add parameter to mean_average_precision():
def mean_average_precision(
    ap_scores: pd.DataFrame,
    sameby: List[str],  # e.g., ['compound', 'dose']
    hierarchical_by: Optional[List[str]] = None,  # NEW: e.g., ['compound']
    ...
)
When hierarchical_by is specified:
- sameby defines the granularity of mAP calculation (e.g., per compound×dose)
- hierarchical_by defines the grouping for Stage 1 correction (e.g., per compound)
- Stage 2 correction happens within each group
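One possible way to validate the new argument is sketched below. The helper `resolve_hierarchical_by` is hypothetical, and the rule that `hierarchical_by` must be a strict subset of `sameby` is an assumption inferred from the `['compound']` / `['compound', 'dose']` example, not something the proposal states.

```python
from typing import List, Optional


def resolve_hierarchical_by(
    sameby: List[str], hierarchical_by: Optional[List[str]]
) -> Optional[List[str]]:
    """Return the Stage 1 grouping columns, or None to keep flat BH."""
    if hierarchical_by is None:
        return None  # default: existing flat correction
    if not hierarchical_by:
        raise ValueError("hierarchical_by must name at least one column")
    # Assumption: groups are built from a coarser subset of the sameby columns.
    unknown = [c for c in hierarchical_by if c not in sameby]
    if unknown:
        raise ValueError(f"hierarchical_by columns not in sameby: {unknown}")
    if set(hierarchical_by) == set(sameby):
        raise ValueError("hierarchical_by must be coarser than sameby")
    return list(hierarchical_by)


print(resolve_hierarchical_by(["compound", "dose"], ["compound"]))
print(resolve_hierarchical_by(["compound", "dose"], None))
```

Rejecting the case where both lists name the same columns avoids groups of size one, where Stage 2 would add nothing beyond Stage 1.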
Benchmark (LINCS data)
On LINCS data (4 plates, 58 compounds × 6 doses):
- Flat BH: 26 significant doses
- Hierarchical with min-p: 49 significant doses (88% more discoveries than flat BH)
Additional Context
Related bug to fix: silent_thread_map in map.py doesn't accept the leave kwarg, causing a TypeError when progress_bar=False.
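One way the fix could look is sketched below. The real signature of silent_thread_map in map.py is not shown in this issue, so this is a hypothetical stand-in built on the standard library: it accepts progress-bar options such as `leave` and ignores them instead of raising.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable, List, Optional


def silent_thread_map(
    fn: Callable[..., Any],
    *iterables: Iterable[Any],
    max_workers: Optional[int] = None,
    **tqdm_kwargs: Any,
) -> List[Any]:
    """Map fn over iterables in threads, without showing a progress bar."""
    # Progress-bar-only options (e.g. leave, desc) are accepted and discarded,
    # so callers can pass the same kwargs whether or not a bar is displayed.
    del tqdm_kwargs
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order in its results.
        return list(pool.map(fn, *iterables))


print(silent_thread_map(lambda x: x * x, [1, 2, 3], leave=False))
```

Swallowing unknown keyword arguments keeps the silent and progress-bar code paths call-compatible, at the cost of not flagging misspelled options.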