Commit 0abc3aa

Merge pull request #355 from monarch-initiative/release
Release
2 parents d5c3769 + c821079 commit 0abc3aa

30 files changed

+1311
-439
lines changed

docs/conf.py

Lines changed: 1 addition & 1 deletion
@@ -62,7 +62,7 @@
 # The short X.Y version.
 version = u'0.7'
 # The full version, including alpha/beta/rc tags.
-release = u'0.7.0'
+release = u'0.7.1'

 # The language for content autogenerated by Sphinx. Refer to documentation
 # for a list of supported languages.

docs/tutorial.rst

Lines changed: 8 additions & 27 deletions
@@ -237,39 +237,20 @@ Testing multiple hypothesis on the same dataset increases the chance of receivin
 However, GPSEA simplifies the application of an appropriate multiple testing correction.

 For general use, we recommend using a combination
-of a *Phenotype MTC filter* (:class:`~gpsea.analysis.mtc_filter.PhenotypeMtcFilter`) with a *multiple testing correction*.
-Phenotype MTC filter chooses the HPO terms to test according to several heuristics, which
+of a *phenotype MT filter* (:class:`~gpsea.analysis.mtc_filter.PhenotypeMtcFilter`) with a *multiple testing correction*.
+Phenotype MT filter chooses the HPO terms to test according to several heuristics, which
 reduce the multiple testing burden and focus the analysis
-on the most interesting terms (see :ref:`HPO MTC filter <hpo-mtc-filter-strategy>` for more info).
+on the most interesting terms (see :ref:`HPO MT filter <hpo-mtc-filter-strategy>` for more info).
 Then the multiple testing correction, such as Bonferroni or Benjamini-Hochberg,
 is used to control the family-wise error rate or the false discovery rate.
 See :ref:`mtc` for more information.

-In this example, we will use a combination of the HPO MTC filter (:class:`~gpsea.analysis.mtc_filter.HpoMtcFilter`)
-with Benjamini-Hochberg procedure (``mtc_correction='fdr_bh'``)
-with a false discovery control level at (``mtc_alpha=0.05``):
+>>> from gpsea.analysis.pcats import configure_hpo_term_analysis
+>>> analysis = configure_hpo_term_analysis(hpo)

->>> from gpsea.analysis.mtc_filter import HpoMtcFilter
->>> mtc_filter = HpoMtcFilter.default_filter(hpo)
->>> mtc_correction = 'fdr_bh'
->>> mtc_alpha = 0.05
-
-Choosing the statistical procedure for assessment of association between genotype and phenotype
-groups is the last missing piece of the analysis. We will use Fisher Exact Test:
-
->>> from gpsea.analysis.pcats.stats import FisherExactTest
->>> count_statistic = FisherExactTest()
-
-and we finalize the analysis setup by putting all components together
-into :class:`~gpsea.analysis.pcats.HpoTermAnalysis`:
-
->>> from gpsea.analysis.pcats import HpoTermAnalysis
->>> analysis = HpoTermAnalysis(
-...     count_statistic=count_statistic,
-...     mtc_filter=mtc_filter,
-...     mtc_correction=mtc_correction,
-...     mtc_alpha=mtc_alpha,
-... )
+:func:`~gpsea.analysis.pcats.configure_hpo_term_analysis` configures the analysis
+that uses HPO MTC filter (:class:`~gpsea.analysis.mtc_filter.HpoMtcFilter`) for selecting HPO terms of interest,
+Fisher Exact test for computing nominal p values, and Benjamini-Hochberg for multiple testing correction.

 Now we can perform the analysis and investigate the results.

docs/user-guide/analyses/phenotype-groups.rst

Lines changed: 53 additions & 28 deletions
@@ -167,59 +167,84 @@ The function finds 369 HPO terms that annotate at least one individual,
 including the *indirect* annotations whose presence is implied by the :ref:`true-path-rule`.


-Statistical test
-----------------
+Statistical analysis
+--------------------

 We will use :ref:`fisher-exact-test` to test the association
 between genotype and phenotype groups, as described previously.

->>> from gpsea.analysis.pcats.stats import FisherExactTest
->>> count_statistic = FisherExactTest()
+In the case of this cohort, we can test association between having a frameshift variant and one of 369 HPO terms.
+However, testing multiple hypotheses on the same dataset increases the risk of finding
+a significant association solely by chance.
+GPSEA uses a two-pronged strategy to reduce the number of tests and, therefore, mitigate this risk:
+a phenotype multiple testing (MT) filter and multiple testing correction (MTC).

-FET will compute a p value for each genotype phenotype group.
+Phenotype MT filter selects a (sub)set of HPO terms for testing,
+for instance only the user-selected terms (see :class:`~gpsea.analysis.mtc_filter.SpecifyTermsStrategy`)
+or the terms selected by :class:`~gpsea.analysis.mtc_filter.HpoMtcFilter`.

+Multiple testing correction then adjusts the nominal p values for the increased risk
+of false positive G/P associations.
+The available MTC procedures are listed in the :ref:`mtc-correction-procedures` section.

-Multiple testing correction
----------------------------
+We must pick one of these to perform genotype-phenotype analysis.

-In the case of this cohort, we could test association between having a frameshift variant and one of 369 HPO terms.
-However, testing multiple hypotheses on the same dataset increases the risk of finding a significant association
-by chance.
-GPSEA uses a two-pronged strategy to mitigate this risk - a phenotype MTC filter and multiple testing correction.

-.. note::
+Default analysis
+^^^^^^^^^^^^^^^^
+
+We recommend using HPO MT filter (:class:`~gpsea.analysis.mtc_filter.HpoMtcFilter`) as a phenotype MT filter
+and Benjamini-Hochberg for multiple testing correction.
+The default analysis can be configured with :func:`~gpsea.analysis.pcats.configure_hpo_term_analysis` convenience method.
+
+>>> from gpsea.analysis.pcats import configure_hpo_term_analysis
+>>> analysis = configure_hpo_term_analysis(hpo)

-   See the :ref:`mtc` section for more info on multiple testing procedures.

-Here we will use a combination of the HPO MTC filter (:class:`~gpsea.analysis.mtc_filter.HpoMtcFilter`)
-with Benjamini-Hochberg procedure (``mtc_correction='fdr_bh'``)
-with a false discovery control level set to `0.05` (``mtc_alpha=0.05``):
+Custom analysis
+^^^^^^^^^^^^^^^
+
+If the defaults do not work, we can configure the analysis manually.
+First, we choose a phenotype MT filter (e.g. :class:`~gpsea.analysis.mtc_filter.HpoMtcFilter`):

 >>> from gpsea.analysis.mtc_filter import HpoMtcFilter
->>> mtc_filter = HpoMtcFilter.default_filter(hpo, term_frequency_threshold=0.2)
->>> mtc_correction = 'fdr_bh'
->>> mtc_alpha = 0.05
+>>> mtc_filter = HpoMtcFilter.default_filter(hpo, term_frequency_threshold=.2)
+
+.. note::

+   See the :ref:`mtc-filters` section for more info on the available MT filters.

-Final analysis
---------------
+then a statistical test (e.g. Fisher Exact test):
+
+>>> from gpsea.analysis.pcats.stats import FisherExactTest
+>>> count_statistic = FisherExactTest()
+
+.. note::
+
+   See the :mod:`gpsea.analysis.pcats.stats` module for the available multiple testing procedures
+   (TL;DR, just Fisher Exact test at this time).
+
+and we finalize the setup by choosing a MTC procedure
+(e.g. `fdr_bh` for Benjamini-Hochberg) along with the MTC alpha:
+
+>>> mtc_correction = 'fdr_bh'
+>>> mtc_alpha = 0.05

-We finalize the analysis setup by putting all components together
-into :class:`~gpsea.analysis.pcats.HpoTermAnalysis`:
+The final :class:`~gpsea.analysis.pcats.HpoTermAnalysis` is created as:

 >>> from gpsea.analysis.pcats import HpoTermAnalysis
 >>> analysis = HpoTermAnalysis(
 ...     count_statistic=count_statistic,
 ...     mtc_filter=mtc_filter,
-...     mtc_correction=mtc_correction,
-...     mtc_alpha=mtc_alpha,
+...     mtc_correction='fdr_bh',
+...     mtc_alpha=0.05,
 ... )


 Analysis
 ========

-We can now execute the analysis:
+We can now test associations between the genotype groups and the HPO terms:

 >>> result = analysis.compare_genotype_vs_phenotypes(
 ...     cohort=cohort,
@@ -232,8 +257,8 @@ We can now execute the analysis:
 24


-Thanks to phenotype MTC filter, we only tested 24 out of 369 terms.
-We can learn more by showing the MTC filter report:
+Thanks to phenotype MT filter, we only tested 24 out of 369 terms.
+We can learn more by showing the MT filter report:

 >>> from gpsea.view import MtcStatsViewer
 >>> mtc_viewer = MtcStatsViewer()

docs/user-guide/mtc.rst

Lines changed: 9 additions & 19 deletions
@@ -36,8 +36,9 @@ Therefore, unless we take into account the fact that multiple statistical tests
 it is likely that we will obtain one or more false-positive results.

 GPSEA offers two approaches to mitigate this problem: multiple-testing correction (MTC) procedures
-and MTC filters to choose the terms to be tested.
+and MT filters to choose the terms to be tested.

+.. _mtc-correction-procedures:

 Multiple-testing correction procedures
 ======================================
@@ -106,7 +107,7 @@ when creating an instance of :class:`~gpsea.analysis.pcats.HpoTermAnalysis`:

 .. _mtc-filters:

-MTC filters: Choosing which terms to test
+MT filters: Choosing which terms to test
 =========================================

 We can reduce the overall MTC burden by choosing which terms to test.
@@ -117,28 +118,17 @@ may "survive" the multiple-testing correction.

 In the context of GPSEA, we represent the concept of phenotype filtering
 by :class:`~gpsea.analysis.mtc_filter.PhenotypeMtcFilter`.
-The filter must be chosen before the :class:`~gpsea.analysis.pcats.MultiPhenotypeAnalysis`,
-such as :class:`~gpsea.analysis.pcats.HpoTermAnalysis`, is run:
-
->>> from gpsea.analysis.pcats import HpoTermAnalysis
->>> analysis = HpoTermAnalysis()  # doctest: +ELLIPSIS
-Traceback (most recent call last):
-  ...
-TypeError: HpoTermAnalysis.__init__() missing 2 required positional arguments: 'count_statistic' and 'mtc_filter'
-
-Note the missing `mtc_filter` option.
-
-We describe the three filtering strategies in the following sections.
+We provide three filtering strategies.


 .. _use-all-terms-strategy:

 Test all terms
 --------------

-The first MTC filtering strategy is the simplest - do not apply any filtering at all.
-This will result in testing all terms. We do not recommend this strategy,
-but it can be useful to disable MTC filtering.
+The first MT filtering strategy is the simplest - do not apply any filtering at all.
+This will result in testing all terms and we do not recommend this strategy,
+but it can be used to disable MT filtering.

 The strategy is implemented in :class:`~gpsea.analysis.mtc_filter.UseAllTermsMtcFilter`.

@@ -171,10 +161,10 @@ we pass an iterable (e.g. a tuple) with these two terms as an argument:

 .. _hpo-mtc-filter-strategy:

-HPO MTC filter strategy
+HPO MT filter strategy
 -----------------------

-Last, the HPO MTC strategy involves making several domain judgments to take advantage of the HPO structure.
+The HPO MT strategy involves making several domain judgments and takes advantage of the HPO structure.
 The strategy needs access to HPO:

 >>> import hpotk
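
Illustrative note, not part of the commit: the documentation changed above recommends the Benjamini-Hochberg procedure (`fdr_bh`) for multiple testing correction. The following is a minimal pure-Python sketch of how Benjamini-Hochberg adjusted p values are usually computed. The function name `benjamini_hochberg` is hypothetical and this is not GPSEA's implementation, which this diff does not show.

```python
from typing import List, Sequence


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p values in the input order.

    Hypothetical helper for illustration only; assumes no NaN values.
    """
    m = len(pvals)
    # Indices of the p values sorted from the smallest to the largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so that the adjusted values
    # stay monotone: adj_(i) = min over j >= i of p_(j) * m / j, capped at 1.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


nominal = [0.01, 0.04, 0.03, 0.005]
corrected = benjamini_hochberg(nominal)
# Terms whose adjusted p value is at or below alpha are reported as significant.
significant = [i for i, p in enumerate(corrected) if p <= 0.05]
print(corrected, significant)
```

Fewer tested terms means a smaller `m`, which is why filtering the HPO terms before testing lets more true associations survive the correction.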

docs/user-guide/predicates/devries.rst

Lines changed: 29 additions & 13 deletions
@@ -15,7 +15,7 @@ Statistical significance of a difference in the De Vries score between groups ca
 determined using the Mann-Whitney-U test.

 We refer to `Feenstra et al. (2011) <https://pubmed.ncbi.nlm.nih.gov/21712853/>`_ for
-the original description of the adjusted De Vries score. Here we offer a version of the
+the original description of the adjusted De Vries score. Here we offer an adapted version of the
 score that leverages the structure of the Human Phenotype Ontology to assess the phenotype.


@@ -113,38 +113,54 @@ is 2 because the same individual cannot have both tall and short stature or both
 Facial dysmorphic features
 ~~~~~~~~~~~~~~~~~~~~~~~~~~

-This section assigns two points if two or more anomalies are identified in the following
-categories: hypertelorism, nasal anomalies and ear anomalies. Our implementation of this feature counts the total
-number of terms or descendents of the following HPO terms.
+This section assigns two points if two or more facial dysmorphisms are identified. In contrast to the list of anomalies described
+in the original 2011 publication of the DeVries score, we leverage the structure of the HPO to include many more HPO terms that
+denote various kinds of facial dysmorphism (e.g., `Abnormality of globe location <https://hpo.jax.org/browse/term/HP:0100886>`_ instead of just
+`Hypertelorism (HP:0000316) <https://hpo.jax.org/browse/term/HP:0000316>`_).
+
+Our implementation of this feature counts the total number of terms or descendents of the following HPO terms. Up to one point is given
+for each of the categories.

 +----------------------------------------------------------------------------------------------------------+-----------+
 | HPO term | Score |
 +==========================================================================================================+===========+
-| `Hypertelorism (HP:0000316) <https://hpo.jax.org/browse/term/HP:0000316>`_ | 1 |
+| `Abnormality of globe location (HP:0000316) <https://hpo.jax.org/browse/term/HP:0100886>`_ | 0 or 1 |
++----------------------------------------------------------------------------------------------------------+-----------+
+| `Abnormal lip morphology (HP:0000159) <https://hpo.jax.org/browse/term/HP:0000159>`_ | 0 or 1 |
++----------------------------------------------------------------------------------------------------------+-----------+
+| `Abnormal facial shape (HP:0001999) <https://hpo.jax.org/browse/term/HP:0001999>`_ | 0 or 1 |
++----------------------------------------------------------------------------------------------------------+-----------+
+| `Abnormal midface morphology (HP:0000309) <https://hpo.jax.org/browse/term/HP:0000309>`_ | 0 or 1 |
 +----------------------------------------------------------------------------------------------------------+-----------+
-| `Abnormal external nose morphology (HP:0010938) <https://hpo.jax.org/browse/term/HP:0010938>`_ | 1 each |
+| `Abnormal forehead morphology (HP:0000290) <https://hpo.jax.org/browse/term/HP:0000290>`_ | 0 or 1 |
 +----------------------------------------------------------------------------------------------------------+-----------+
-| `Abnormal pinna morphology (HP:0000377) <https://hpo.jax.org/browse/term/HP:0000377>`_ | 1 each |
+| `Abnormal chin morphology (HP:0000306) <https://hpo.jax.org/browse/term/HP:0000306>`_ | 0 or 1 |
++----------------------------------------------------------------------------------------------------------+-----------+
+| `Abnormal external nose morphology (HP:0010938) <https://hpo.jax.org/browse/term/HP:0010938>`_ | 0 or 1 |
++----------------------------------------------------------------------------------------------------------+-----------+
+| `Abnormal pinna morphology (HP:0000377) <https://hpo.jax.org/browse/term/HP:0000377>`_ | 0 or 1 |
 +----------------------------------------------------------------------------------------------------------+-----------+

-If two or more terms are found, the score is 2, otherwise a score of zero is assigned.
+If items from two or more categories are found, the score is 2, otherwise a score of zero is assigned.


 Non-facial dysmorphism and congenital abnormalities
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-One point is assigned for either the
-corresponding HPO terms or any of their descendents up to a maximum of two points.
+One point is assigned for either the corresponding HPO terms or any of their descendents up to a maximum of two points.
+A maximum of one point is assigned for each of the following categories.

 +----------------------------------------------------------------------------------------------------------+-----------+
 | HPO term | Score |
 +==========================================================================================================+===========+
-| `Abnormal hand morphology (HP:0005922) <https://hpo.jax.org/browse/term/HP:0005922>`_ | 1 each |
+| `Abnormal hand morphology (HP:0005922) <https://hpo.jax.org/browse/term/HP:0005922>`_ | 0 or 1 |
 +----------------------------------------------------------------------------------------------------------+-----------+
-| `Abnormal heart morphology (HP:0001627) <https://hpo.jax.org/browse/term/HP:0001627>`_ | 1 each |
+| `Abnormal heart morphology (HP:0001627) <https://hpo.jax.org/browse/term/HP:0001627>`_ | 0 or 1 |
 +----------------------------------------------------------------------------------------------------------+-----------+
-| `Hypospadias (HP:0000047) <https://hpo.jax.org/browse/term/HP:0000047>`_ | 1 |
+| `Abnormal external genitalia morphology (HP:0000811) <https://hpo.jax.org/browse/term/HP:0000811>`_ | 0 or 1 |
 +----------------------------------------------------------------------------------------------------------+-----------+

+The score for this section can thus be 0, 1, or 2.
+

 Final score
 ~~~~~~~~~~~
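
Illustrative note, not part of the commit: the two scoring rules documented above can be sketched in a few lines of Python. The helper names are hypothetical and this is not GPSEA's implementation. The sketch assumes the caller has already mapped each observed HPO term to the category (table row) it equals or descends from, so no ontology lookup is shown. The globe location category uses `HP:0100886`, the identifier in the link target of the table row.

```python
from typing import Iterable

# Category roots taken from the two tables in the documentation diff above.
FACIAL_CATEGORIES = frozenset({
    "HP:0100886",  # Abnormality of globe location
    "HP:0000159",  # Abnormal lip morphology
    "HP:0001999",  # Abnormal facial shape
    "HP:0000309",  # Abnormal midface morphology
    "HP:0000290",  # Abnormal forehead morphology
    "HP:0000306",  # Abnormal chin morphology
    "HP:0010938",  # Abnormal external nose morphology
    "HP:0000377",  # Abnormal pinna morphology
})
NON_FACIAL_CATEGORIES = frozenset({
    "HP:0005922",  # Abnormal hand morphology
    "HP:0001627",  # Abnormal heart morphology
    "HP:0000811",  # Abnormal external genitalia morphology
})


def facial_dysmorphism_score(matched_categories: Iterable[str]) -> int:
    """Two points if two or more facial categories are hit, otherwise zero."""
    # Each category contributes at most one hit, however many terms fall in it.
    n_hit = len(FACIAL_CATEGORIES.intersection(matched_categories))
    return 2 if n_hit >= 2 else 0


def non_facial_score(matched_categories: Iterable[str]) -> int:
    """One point per non-facial category that is hit, capped at two points."""
    return min(len(NON_FACIAL_CATEGORIES.intersection(matched_categories)), 2)


print(facial_dysmorphism_score(["HP:0100886", "HP:0000377"]))
print(non_facial_score(["HP:0005922", "HP:0001627", "HP:0000811"]))
```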

src/gpsea/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -2,4 +2,4 @@
 GPSEA is a library for analyzing genotype-phenotype correlations in cohorts of rare disease patients.
 """

-__version__ = "0.7.0"
+__version__ = "0.7.1"

src/gpsea/analysis/_base.py

Lines changed: 43 additions & 0 deletions
@@ -3,6 +3,7 @@
 import os
 import typing

+import numpy as np
 import pandas as pd

 from .predicate.phenotype import PhenotypePolyPredicate, P
@@ -205,6 +206,48 @@ def corrected_pvals(self) -> typing.Optional[typing.Sequence[float]]:
         The sequence includes a `NaN` value for each phenotype that was *not* tested.
         """
         return self._corrected_pvals
+
+    def n_significant_for_alpha(
+        self,
+        alpha: float = .05,
+    ) -> typing.Optional[int]:
+        """
+        Get the count of the corrected p values with the value being less than or equal to `alpha`.
+
+        :param alpha: a `float` with significance level.
+        """
+        if self.corrected_pvals is None:
+            return None
+        else:
+            return sum(p_val <= alpha for p_val in self.corrected_pvals)
+
+    def significant_phenotype_indices(
+        self,
+        alpha: float = .05,
+        pval_kind: typing.Literal["corrected", "nominal"] = "corrected",
+    ) -> typing.Optional[typing.Sequence[int]]:
+        """
+        Get the indices of phenotypes that attain significance for provided `alpha`.
+        """
+        if pval_kind == "corrected":
+            if self.corrected_pvals is None:
+                vals = None
+            else:
+                vals = np.array(self.corrected_pvals)
+        elif pval_kind == "nominal":
+            vals = np.array(self.pvals)
+        else:
+            raise ValueError(f"Unsupported `pval_kind` value {pval_kind}")
+
+        if vals is None:
+            return None
+
+        not_na = ~np.isnan(vals)
+        significant = vals <= alpha
+        selected = not_na & significant
+
+        return tuple(int(idx) for idx in np.argsort(vals) if selected[idx])
+

     @property
     def total_tests(self) -> int:
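
Illustrative note, not part of the commit: the two methods added above skip the `NaN` placeholders of untested phenotypes and, for the indices, order the hits by ascending p value. The standalone sketch below mirrors that behaviour with the standard library only, so it can be run without GPSEA or NumPy. The function names are hypothetical, and tied p values may be ordered differently than by `np.argsort`.

```python
import math
from typing import Optional, Sequence, Tuple


def count_significant(
    pvals: Optional[Sequence[float]],
    alpha: float = 0.05,
) -> Optional[int]:
    """Count the p values at or below `alpha`; NaN never compares as <= alpha."""
    if pvals is None:
        return None
    return sum(p <= alpha for p in pvals)


def significant_indices(
    pvals: Optional[Sequence[float]],
    alpha: float = 0.05,
) -> Optional[Tuple[int, ...]]:
    """Indices of the significant p values, most significant first."""
    if pvals is None:
        return None
    # Drop the untested phenotypes (NaN) and the non-significant ones.
    keep = [i for i, p in enumerate(pvals) if not math.isnan(p) and p <= alpha]
    # Order the remaining indices by ascending p value.
    return tuple(sorted(keep, key=lambda i: pvals[i]))


# NaN marks the phenotypes that the MT filter excluded from testing.
corrected = [0.20, float("nan"), 0.01, 0.04, float("nan"), 0.03]
print(count_significant(corrected))    # 3
print(significant_indices(corrected))  # (2, 5, 3)
```

Returning indices rather than p values lets a caller look up the matching phenotypes in whatever sequence the result object keeps them in.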
