Add hierarchical FDR correction for grouped hypotheses #115

@shntnu

Problem

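When testing compound × dose combinations, the current Benjamini-Hochberg (BH) FDR correction treats all tests as one flat family of unrelated hypotheses. For example, with 1000 compounds × 5 doses = 5000 tests, the smallest p-value must fall below q/5000 to be significant, which is very stringent.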

But doses of the same compound are not independent - they test the same underlying biological hypothesis ("does compound X have a phenotype?"). This leads to over-correction.

Proposed Solution

Implement hierarchical FDR as an option:

Stage 1: Test at group level (e.g., compound)
  - Use minimum p-value within group as the group p-value
  - Apply BH at level q
  - A compound passes if ANY dose is significant

Stage 2: Test within groups (only for groups that passed Stage 1)
  - For each significant group, apply BH at level q to its members
  - Report member-level results
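
The two stages above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: the names `bh_reject` and `hierarchical_fdr` are hypothetical, tests are passed as (group, member, p-value) triples, and the group p-value is the unadjusted minimum exactly as described in Stage 1.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def bh_reject(pvalues: Sequence[float], q: float) -> List[bool]:
    """Benjamini-Hochberg step-up procedure; returns reject flags in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k (1-based) with p_(k) <= k * q / m; reject ranks 1..k.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            cutoff = rank
    reject = [False] * m
    for idx in order[:cutoff]:
        reject[idx] = True
    return reject


def hierarchical_fdr(
    tests: Sequence[Tuple[Hashable, Hashable, float]], q: float = 0.05
) -> Dict[Tuple[Hashable, Hashable], bool]:
    """tests holds (group, member, p_value) triples, e.g. (compound, dose, p)."""
    by_group = defaultdict(list)
    for group, member, p in tests:
        by_group[group].append((member, p))

    # Stage 1: group p-value = minimum member p-value, then BH across groups.
    groups = list(by_group)
    group_p = [min(p for _, p in by_group[g]) for g in groups]
    group_pass = dict(zip(groups, bh_reject(group_p, q)))

    # Stage 2: BH at level q within each group that passed Stage 1.
    result = {}
    for g in groups:
        members = by_group[g]
        if group_pass[g]:
            flags = bh_reject([p for _, p in members], q)
        else:
            flags = [False] * len(members)
        for (member, _), flag in zip(members, flags):
            result[(g, member)] = flag
    return result


# Hypothetical data: compound A is active at two doses, compound B at none.
example = [
    ("A", 1, 0.001), ("A", 2, 0.02), ("A", 3, 0.6),
    ("B", 1, 0.3), ("B", 2, 0.4), ("B", 3, 0.9),
]
significant = hierarchical_fdr(example, q=0.05)
```

With these made-up p-values, compound A passes Stage 1 and its first two doses are reported as significant, while every dose of compound B is dropped before Stage 2.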

Why min p-value instead of Simes?

For dose-response data, low doses are expected to be inactive. Simes' method penalizes compounds for having inactive low doses, which is biologically normal. Min p-value is more appropriate: a compound passes Stage 1 if ANY dose shows activity.
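
To make the difference concrete, the sketch below computes both group p-values for one made-up compound that is inactive at four low doses and active at the highest dose. The function names and the p-values are illustrative assumptions; the Simes formula used is the standard one, the minimum over ranks i of n · p_(i) / i.

```python
from typing import Sequence


def simes_p(pvalues: Sequence[float]) -> float:
    """Simes combined p-value: min over ranks i of n * p_(i) / i."""
    n = len(pvalues)
    ranked = sorted(pvalues)
    return min(n * p / i for i, p in enumerate(ranked, start=1))


def min_p(pvalues: Sequence[float]) -> float:
    """Unadjusted minimum p-value, as proposed for Stage 1."""
    return min(pvalues)


# Hypothetical compound: four inactive low doses, one active high dose.
dose_pvalues = [0.80, 0.70, 0.60, 0.50, 0.001]
print("Simes:", simes_p(dose_pvalues))
print("min-p:", min_p(dose_pvalues))
```

Here the Simes value is driven by the single small p-value multiplied by the number of doses (roughly 0.005 versus 0.001), so each additional inactive dose makes the compound harder to pass at Stage 1.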

Benefits

  • Provides dose-level (or other sub-group) inference
  • Much less harsh correction than treating all tests as independent
  • Users specify grouping structure via metadata columns

Example

  • 1000 compounds × 5 doses = 5000 raw tests
  • Stage 1: 1000 compound tests → 50 pass (any dose active)
  • Stage 2: 50 × 5 = 250 dose tests, corrected in groups of 5
  • Result: dose-level significance with appropriate correction
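
The arithmetic behind this example can be checked with a few lines of Python. The sketch assumes q = 0.05, which the example does not specify, and compares only the strictest (rank-1) BH threshold in each family; `bh_threshold` is a hypothetical helper.

```python
def bh_threshold(rank: int, m: int, q: float) -> float:
    """BH compares the rank-th smallest of m p-values against rank * q / m."""
    return rank * q / m


q = 0.05  # assumed FDR level; the example above does not fix one
n_compounds, n_doses, n_passing = 1000, 5, 50

flat_tests = n_compounds * n_doses    # 5000 tests in one flat family
stage1_tests = n_compounds            # one min-p test per compound
stage2_tests = n_passing * n_doses    # 250 dose tests, in families of 5

# Strictest (rank-1) threshold a p-value must meet in each family.
flat_first = bh_threshold(1, flat_tests, q)
stage1_first = bh_threshold(1, stage1_tests, q)
stage2_first = bh_threshold(1, n_doses, q)
print(flat_tests, stage1_tests, stage2_tests)
print(flat_first, stage1_first, stage2_first)
```

Under that assumed q, the rank-1 threshold loosens from about 1e-5 for flat BH to about 5e-5 at Stage 1 and 0.01 within each group of 5 at Stage 2.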

API Design

Add parameter to mean_average_precision():

def mean_average_precision(
    ap_scores: pd.DataFrame,
    sameby: List[str],           # e.g., ['compound', 'dose']
    hierarchical_by: Optional[List[str]] = None,  # NEW: e.g., ['compound']
    ...
)

When hierarchical_by is specified:

  1. sameby defines the granularity of mAP calculation (e.g., per compound×dose)
  2. hierarchical_by defines the grouping for Stage 1 correction (e.g., per compound)
  3. Stage 2 correction happens within each group
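
One way the two parameters could relate is sketched below. This is an assumption-laden illustration rather than the proposed implementation: `map_members_to_groups` is a hypothetical helper, rows are plain dicts standing in for DataFrame rows, hierarchical_by is assumed to be a subset of sameby, and hierarchical_by=None is assumed to mean "no hierarchy".

```python
from typing import Dict, Hashable, List, Optional, Sequence, Tuple

Row = Dict[str, Hashable]


def map_members_to_groups(
    rows: Sequence[Row],
    sameby: List[str],
    hierarchical_by: Optional[List[str]] = None,
) -> Dict[Tuple, Tuple]:
    """Map each sameby key (member) to its hierarchical_by key (Stage 1 group)."""
    if hierarchical_by is None:
        # Assumption: None means one group per member, which reduces to flat BH.
        hierarchical_by = sameby
    missing = [c for c in hierarchical_by if c not in sameby]
    if missing:
        # Assumption: grouping columns must be a subset of the sameby columns.
        raise ValueError(f"hierarchical_by columns not in sameby: {missing}")
    mapping = {}
    for row in rows:
        member = tuple(row[c] for c in sameby)
        group = tuple(row[c] for c in hierarchical_by)
        mapping[member] = group
    return mapping


# Hypothetical metadata rows, one per compound x dose.
rows = [
    {"compound": "X", "dose": 1},
    {"compound": "X", "dose": 10},
    {"compound": "Y", "dose": 1},
]
groups = map_members_to_groups(rows, ["compound", "dose"], ["compound"])
```

In this sketch both doses of compound X map to the same Stage 1 group, so their p-values would be combined at Stage 1 and corrected together at Stage 2.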

Benchmark (LINCS data)

On LINCS data (4 plates, 58 compounds × 6 doses):

  • Flat BH: 26 significant doses
  • Hierarchical with min-p: 49 significant doses (23 more, an 88% increase in discoveries)

Additional Context

Related bug to fix: silent_thread_map in map.py does not handle the leave kwarg, which causes a TypeError when progress_bar=False.
