Commit 74d5c14

Merge pull request #62 from sbslee/0.16.0-dev
0.16.0 dev
2 parents 0c8c33a + 01a5d71 commit 74d5c14

19 files changed: +938 -382 lines changed

CHANGELOG.rst (+35 -5)
@@ -1,6 +1,26 @@
 Changelog
 *********
 
+0.16.0 (2022-06-08)
+-------------------
+
+* Add new optional argument ``--comparison-table`` to :command:`train-cnv-caller` and :command:`test-cnv-caller` commands.
+* Update :meth:`sdk.utils.add_cn_samples` and :meth:`sdk.utils.simulate_copy_number` methods to check input files more rigorously.
+* Update :meth:`api.utils.test_cnv_caller` and :meth:`api.utils.train_cnv_caller` methods to accept the latest format of SampleTable[CNVCalls] as input.
+* Update plotting methods to optionally return a list of :class:`matplotlib.figure.Figure` objects for API users (e.g. Jupyter Notebook): :meth:`api.plot.plot_bam_copy_number`, :meth:`api.plot.plot_bam_read_depth`, :meth:`api.plot.plot_cn_af`, :meth:`api.plot.plot_vcf_allele_fraction`, :meth:`api.plot.plot_vcf_read_depth`.
+* :issue:`61`: Fix bug in commands :command:`compute-control-statistics`, :command:`compute-target-depth`, and :command:`prepare-depth-of-coverage` when a BED file is provided by user.
+* Improve CNV caller for CYP2A6, CYP2B6, CYP2D6, CYP2E1, GSTM1, SLC22A2, SULT1A1, UGT1A4, UGT2B15, UGT2B17.
+* Add new CNV calls for CYP2A6: ``Unknown1``, ``Hybrid7``, ``Tandem2``.
+* Add new CNV calls for CYP2B6: ``Tandem1``, ``PartialDup1``, ``PartialDup2``, ``ParalogWholeDel1``.
+* Add new CNV call for CYP2D6: ``WholeDel1+Tandem3``. Also, remove ``PseudogeneDownstreamDel``.
+* Add new CNV calls for CYP2E1: ``WholeDel1`` and ``WholeDup1+PartialDup1``.
+* Add new CNV call for SLC22A2: ``NoncodingDel1Hom``.
+* Add new CNV calls for SULT1A1: ``Unknown2``, ``Unknown3``, ``Unknown4``.
+* Add new CNV call for UGT1A4: ``NoncodingDel1Hom``.
+* Add new CNV call for UGT2B15: ``PartialDup2``.
+* Add new CNV call for UGT2B17: ``PartialDel2``. Also, define a new star allele ``*S3`` for ``PartialDel3``.
+* :issue:`59`: Update CNV labels.
+
 0.15.0 (2022-05-03)
 -------------------
 
@@ -63,23 +83,28 @@ Changelog
 0.12.0 (2022-01-29)
 -------------------
 
-* Add CNV caller for G6PD (mostly for sex determination since it's located on X chromosome).
-* Improve CNV caller for CYP2A6, CYP2B6, CYP2D6, CYP2E1, GSTM1, SULT1A1, UGT2B15, and UGT2B17.
 * Update :command:`run-ngs-pipeline` command to allow users to provide a custom CNV caller.
 * Update :meth:`api.core.predict_phenotype` method to not raise an error when a given star allele does not exist in the allele table. From now on, the method will output a warning about it but still produce an ``Indeterminate`` call.
 * Fix minor bug with ``--samples`` argument in commands :command:`plot-bam-copy-number`, :command:`plot-bam-read-depth`, :command:`plot-vcf-allele-fraction`, and :command:`plot-vcf-read-depth`.
 * Update :meth:`sdk.utils.add_cn_samples` method to accept a list of samples in addition to a file.
 * Add new argument ``--fontsize`` to :command:`plot-bam-read-depth` command.
 * Fix minor bug in :command:`plot-bam-read-depth` command.
 * Moved 1KGP reference haplotype panels and CNV callers to the ``pypgx-bundle`` `repository <https://github.com/sbslee/pypgx-bundle>`__ (only those files were moved; other files such as ``allele-table.csv`` and ``variant-table.csv`` are intact). From now on, the user needs to clone the ``pypgx-bundle`` repository with matching PyPGx version to their home directory in order for PyPGx to correctly access the moved files. This is undoubtedly annoying, but absolutely necessary for portability reasons because PyPGx has been growing exponentially in file size due to the increasing number of genes supported and their CNV complexity, to the point where it now exceeds upload size limit for PyPI (100 Mb). After removal of those files, the size of PyPGx has reduced from >100 Mb to <1 Mb.
+* Add CNV caller for G6PD (mostly for sex determination since it's located on X chromosome).
+* Improve CNV caller for CYP2A6, CYP2B6, CYP2D6, CYP2E1, GSTM1, SULT1A1, UGT2B15, and UGT2B17.
+* Add new CNV calls for CYP2A6: ``Duplication2``, ``Duplication3``, ``Deletion2Het``, ``Deletion3Het``, ``PseudogeneDuplication``, ``Hybrid2``, ``Hybrid3``. Additionally, some CNV calls have been renamed: ``Hybrid`` → ``Hybrid1``; ``Duplication`` → ``Duplication1``; ``DeletionHet`` → ``Deletion1Het``; ``DeletionHom`` → ``Deletion1Hom``.
+* Add a new CNV call for CYP2B6: ``Duplication``.
+* Add new CNV calls for CYP2D6: ``Unknown1``, ``Tandem1B``, ``Multiplication``. Additionally, some CNV calls have been renamed: ``Tandem1`` → ``Tandem1A``; ``DeletionHet,Tandem1`` → ``DeletionHet,Tandem1A``; ``Duplication,Tandem1`` → ``Duplication,Tandem1A``.
+* Add a new CNV call for CYP2E1: ``Duplication2``. Additionally, a CNV call has been renamed: ``Duplication`` → ``Duplication1``.
+* Add new CNV calls for GSTM1: ``UpstreamDeletionHet`` and ``DeletionHet,UpstreamDeletionHet``.
+* Add a new CNV call for UGT2B15: ``PartialDeletion2``. Additionally, a CNV call has been renamed: ``PartialDeletion`` → ``PartialDeletion1``.
+* Add a new CNV call for UGT2B17: ``PartialDeletionHet``.
 
 0.11.0 (2022-01-01)
 -------------------
 
-* Add CNV caller for CYP4F2 and SULT1A1.
 * Fix minor bug in :command:`compute-copy-number` command.
 * Update :command:`plot-cn-af` command to check input files more rigorously.
-* Improve CNV caller for CYP2A6, CYP2D6, and SLC22A2.
 * Add new method :meth:`sdk.utils.add_cn_samples`.
 * Update :command:`compare-genotypes` command to output CNV comparison results as well.
 * Update :command:`estimate-phase-beagle` command. From now on, the 'chr' prefix in contig names (e.g. 'chr1' vs. '1') will be automatically added or removed as necessary to match the reference VCF’s contig names.
@@ -89,6 +114,9 @@ Changelog
 * Change 1KGP reference haplotype panels for GRCh38. Previously, PyPGx was using the panels from `Lowy-Gallego et al., 2019 <https://wellcomeopenresearch.org/articles/4-50>`__ where the authors had aligned sequence reads against the full GRCh38 reference, including ALT contigs, decoy, and EBV/IMGT/HLA sequences. This resulted in poor phasing/imputation performance for highly polymorphic PGx genes (e.g. CYP2D6) presumably because the panels were missing haplotype information for lots of SNVs/indels as sequence reads with those variants were mapped to ALT contigs; however, the panels were still the best option at the time (definitely better than lifting over GRCh37 panels). Fortunately, `Byrska-Bishop et al., 2021 <https://www.biorxiv.org/content/10.1101/2021.02.06.430068v2>`__ from New York Genome Center has recently published a new set of GRCh38 panels which apparently has less of this problem despite still having sequence reads aligned in the presence of ALT contigs, etc. When empirically tested, these panels showed a significant increase in phasing/imputation performance. Therefore, from now on, PyPGx will use these panels for GRCh38 data.
 * Update GRCh38 variant information for following alleles: CYP2D6\*35, CYP2D6\*45, CYP2D6\*46.
 * Update gene region for SLC22A2 to match GRCh37 and GRCh38.
+* Add CNV caller for CYP4F2 and SULT1A1.
+* Improve CNV caller for CYP2A6, CYP2D6, and SLC22A2.
+* Add a new CNV call for CYP2D6: ``Tandem3``.
 
 0.10.1 (2021-12-20)
 -------------------
@@ -112,8 +140,10 @@ Changelog
 * Rename ``--samples`` argument from :command:`run-ngs-pipeline` command to ``--samples-without-sv``.
 * Update :command:`run-ngs-pipeline` and :command:`run-chip-pipeline` commands to be able to subset/exclude specified samples.
 * Remove ``--fn`` argument from :command:`filter-samples` command.
-* Update CNV caller for CYP2D6, GSTM1, and UGT1A4.
 * Update :meth:`api.plot.plot_cn_af` method to accept both VcfFrame[Imported] and VcfFrame[Consolidated].
+* Improve CNV caller for CYP2D6, GSTM1, and UGT1A4.
+* Add new CNV calls for CYP2D6: ``Tandem2C``, ``DeletionHom``.
+* Add a new CNV call for UGT1A4: ``Intron1DeletionB``. Additionally, a CNV call has been renamed: ``Intron1Deletion`` → ``Intron1DeletionA``.
 
 0.9.0 (2021-12-05)
 ------------------
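
The 0.11.0 entry for :command:`estimate-phase-beagle` describes adding or removing the 'chr' prefix so that contig names match the reference VCF. A minimal sketch of that idea follows. The function name ``match_contig_style`` and its signature are hypothetical and are not taken from the PyPGx code base.

```python
def match_contig_style(contig, reference_contigs):
    """Add or remove the 'chr' prefix so `contig` follows the reference style.

    Hypothetical helper for illustration only; not PyPGx's implementation.
    """
    # Decide which naming style the reference uses ('chr1' vs. '1').
    reference_has_prefix = any(c.startswith("chr") for c in reference_contigs)
    has_prefix = contig.startswith("chr")
    if reference_has_prefix and not has_prefix:
        return "chr" + contig
    if not reference_has_prefix and has_prefix:
        return contig[len("chr"):]
    # Styles already agree, so leave the name untouched.
    return contig


renamed = match_contig_style("22", ["chr1", "chr22"])
stripped = match_contig_style("chr22", ["1", "22"])
unchanged = match_contig_style("chr22", ["chr1", "chr22"])
```

The sketch infers the reference style from whether any reference contig starts with 'chr'; a real implementation may need to handle mixed or unusual contig names differently.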

README.rst (+31 -8)
@@ -26,18 +26,34 @@ README
 Introduction
 ============
 
-The main purpose of the PyPGx package, which is completely free and open
-source, is to provide a unified platform for pharmacogenomics (PGx) research.
+The main purpose of the PyPGx package is to provide a unified platform for
+pharmacogenomics (PGx) research. PyPGx is and always will be completely free
+and open source.
 
 The package is written in Python, and supports both command line interface
 (CLI) and application programming interface (API), whose documentation is
 available at `Read the Docs <https://pypgx.readthedocs.io/en/latest/>`_.
 
-PyPGx can be used to predict PGx genotypes and phenotypes using various
-genomic data, including data from next-generation sequencing (NGS), single
-nucleotide polymorphism (SNP) array, and long-read sequencing. Importantly,
-PyPGx is compatible with both of the Genome Reference Consortium Human (GRCh)
-builds, GRCh37 (hg19) and GRCh38 (hg38).
+Quick links:
+
+- `README <https://pypgx.readthedocs.io/en/latest/readme.html>`__
+- `Genes <https://pypgx.readthedocs.io/en/latest/genes.html>`__
+- `Glossary <https://pypgx.readthedocs.io/en/latest/glossary.html>`__
+- `Tutorials <https://pypgx.readthedocs.io/en/latest/tutorials.html>`__
+- `CLI <https://pypgx.readthedocs.io/en/latest/cli.html>`__
+- `API <https://pypgx.readthedocs.io/en/latest/api.html>`__
+- `SDK <https://pypgx.readthedocs.io/en/latest/sdk.html>`__
+- `FAQ <https://pypgx.readthedocs.io/en/latest/faq.html>`__
+- `Changelog <https://pypgx.readthedocs.io/en/latest/changelog.html>`__
+
+PyPGx can predict PGx genotypes (e.g. ``*4/*5``) and phenotypes (e.g.
+``Poor Metabolizer``) using various genomic data, including data from
+next-generation sequencing (NGS), single nucleotide polymorphism (SNP) array,
+and long-read sequencing. Importantly, for NGS data the package can detect
+`structural variation (SV) <https://pypgx.readthedocs.io/en/latest/
+glossary.html#structural-variation-sv>`__ using a machine learning-based
+approach. Finally, note that PyPGx is compatible with both of the Genome
+Reference Consortium Human (GRCh) builds, GRCh37 (hg19) and GRCh38 (hg38).
 
 There are currently 59 pharmacogenes in PyPGx:
 
@@ -161,6 +177,10 @@ you can access a development branch with the ``git checkout`` command. When
 you do this, please make sure your environment already has all the
 dependencies installed.
 
+.. warning::
+    You're not done yet! Keep scrolling down to obtain the resource bundle
+    for PyPGx, which is essential for running the package.
+
 Resource bundle
 ===============
 
@@ -200,7 +220,10 @@ learn.org/stable/modules/generated/sklearn.svm.SVC.html>`__-based multiclass
 classifier using the `one-vs-rest strategy <https://scikit-learn.org/stable
 /modules/generated/sklearn.multiclass.OneVsRestClassifier.html>`__ for each
 gene for each GRCh build. Each classifier is trained using copy number
-profiles of real NGS samples as well as simulated ones.
+profiles of real NGS samples as well as simulated ones, including those from
+`1KGP <https://pypgx.readthedocs.io/en/latest/glossary.html#genomes-project-
+1kgp>`__ and `GeT-RM <https://pypgx.readthedocs.io/en/latest/
+glossary.html#genetic-testing-reference-materials-coordination-program-get-rm>`__.
 
 You can plot copy number profile and allele fraction profile with PyPGx to
 visually inspect SV calls. Below are CYP2D6 examples:
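
The README text in this hunk describes an SVC-based multiclass classifier trained with the one-vs-rest strategy on copy number profiles. The sketch below shows that general technique with scikit-learn on toy data. It is not PyPGx's training code: the flat simulated profiles, the bin count, the ``Normal`` label, and the kernel settings are all assumptions made for illustration.

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Toy stand-in for copy number profiles: each row is one sample and each
# column is one genomic bin. Labels and mean copy numbers are illustrative.
rng = np.random.default_rng(0)
CLASS_MEANS = {"Normal": 2.0, "WholeDel1": 1.0, "WholeDup1": 3.0}
N_BINS = 20


def simulate_profiles(n_per_class=30, noise=0.1):
    """Return (X, y) with noisy flat profiles around each class mean."""
    profiles, labels = [], []
    for label, mean in CLASS_MEANS.items():
        profiles.append(mean + noise * rng.standard_normal((n_per_class, N_BINS)))
        labels.extend([label] * n_per_class)
    return np.vstack(profiles), np.array(labels)


X_train, y_train = simulate_profiles()

# One binary SVC is fitted per CNV class; at prediction time the class whose
# classifier gives the highest decision score is reported.
caller = OneVsRestClassifier(SVC(kernel="rbf", gamma="scale"))
caller.fit(X_train, y_train)

# Noise-free profiles, one per class, in the same order as CLASS_MEANS.
clean = np.array([[mean] * N_BINS for mean in CLASS_MEANS.values()])
predicted = list(caller.predict(clean))
```

Real copy number profiles are not flat, and real training sets mix sequenced and simulated samples as the README notes, so this only illustrates how the one-vs-rest wrapper and the SVC fit together.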

docs/cli.rst (+18 -5)
@@ -527,7 +527,8 @@ plot-bam-copy-number
 Optional arguments:
   -h, --help            Show this help message and exit.
   --fitted              Show the fitted line as well.
-  --path PATH           Create plots in this directory.
+  --path PATH           Create plots in this directory (default: current
+                        directory).
   --samples TEXT [TEXT ...]
                         Specify which samples should be included for analysis
                         by providing a text file (.txt, .tsv, .csv, or .list)
@@ -556,7 +557,8 @@ plot-bam-read-depth
 
 Optional arguments:
   -h, --help            Show this help message and exit.
-  --path PATH           Create plots in this directory.
+  --path PATH           Create plots in this directory (default: current
+                        directory).
   --samples TEXT [TEXT ...]
                         Specify which samples should be included for analysis
                         by providing a text file (.txt, .tsv, .csv, or .list)
@@ -586,7 +588,8 @@ plot-cn-af
 
 Optional arguments:
   -h, --help            Show this help message and exit.
-  --path PATH           Create plots in this directory.
+  --path PATH           Create plots in this directory (default: current
+                        directory).
   --samples TEXT [TEXT ...]
                         Specify which samples should be included for analysis
                         by providing a text file (.txt, .tsv, .csv, or .list)
@@ -615,7 +618,8 @@ plot-vcf-allele-fraction
 
 Optional arguments:
   -h, --help            Show this help message and exit.
-  --path PATH           Create plots in this directory.
+  --path PATH           Create plots in this directory (default: current
+                        directory).
   --samples TEXT [TEXT ...]
                         Specify which samples should be included for analysis
                         by providing a text file (.txt, .tsv, .csv, or .list)
@@ -644,7 +648,8 @@ plot-vcf-read-depth
   -h, --help            Show this help message and exit.
   --assembly TEXT       Reference genome assembly (default: 'GRCh37')
                         (choices: 'GRCh37', 'GRCh38').
-  --path PATH           Create plots in this directory.
+  --path PATH           Create plots in this directory (default: current
+                        directory).
   --samples TEXT [TEXT ...]
                         Specify which samples should be included for analysis
                         by providing a text file (.txt, .tsv, .csv, or .list)
@@ -980,6 +985,7 @@ test-cnv-caller
 
 $ pypgx test-cnv-caller -h
 usage: pypgx test-cnv-caller [-h] [--confusion-matrix PATH]
+                             [--comparison-table PATH]
                              cnv-caller copy-number cnv-calls
 
 Test CNV caller for target gene.
@@ -997,6 +1003,9 @@ test-cnv-caller
                         Write the confusion matrix as a CSV file where rows
                         indicate actual class and columns indicate prediction
                         class.
+  --comparison-table PATH
+                        Write a CSV file comparing actual vs. predicted CNV
+                        calls for each sample.
 
 train-cnv-caller
 ================
@@ -1005,6 +1014,7 @@ train-cnv-caller
 
 $ pypgx train-cnv-caller -h
 usage: pypgx train-cnv-caller [-h] [--confusion-matrix PATH]
+                              [--comparison-table PATH]
                               copy-number cnv-calls cnv-caller
 
 Train CNV caller for target gene.
@@ -1025,4 +1035,7 @@ train-cnv-caller
                         Write the confusion matrix as a CSV file where rows
                         indicate actual class and columns indicate prediction
                         class.
+  --comparison-table PATH
+                        Write a CSV file comparing actual vs. predicted CNV
+                        calls for each sample.
 
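
The new ``--comparison-table`` option is described as writing a CSV file that compares actual and predicted CNV calls for each sample. The sketch below shows what such a table could look like, using only the Python standard library. The column names (``Sample``, ``Actual``, ``Predicted``, ``Match``), the sample IDs, and the function ``write_comparison_table`` are assumptions made for illustration; the actual layout written by PyPGx is not shown in this diff.

```python
import csv
import io


def write_comparison_table(actual, predicted, handle):
    """Write one CSV row per sample with actual vs. predicted CNV calls.

    Hypothetical layout for illustration; not PyPGx's output format.
    """
    writer = csv.writer(handle)
    writer.writerow(["Sample", "Actual", "Predicted", "Match"])
    for sample in sorted(actual):
        actual_call = actual[sample]
        # A sample missing from the predictions gets an empty call.
        predicted_call = predicted.get(sample, "")
        writer.writerow([sample, actual_call, predicted_call,
                         actual_call == predicted_call])


# Made-up calls; the CNV names are borrowed from the changelog above.
actual = {"S1": "Normal", "S2": "WholeDel1", "S3": "Tandem2"}
predicted = {"S1": "Normal", "S2": "WholeDel1", "S3": "Hybrid7"}

buffer = io.StringIO()
write_comparison_table(actual, predicted, buffer)
rows = list(csv.reader(io.StringIO(buffer.getvalue())))
```

A per-sample table like this complements the confusion matrix: the matrix summarizes counts per class, while the table shows which specific samples were misclassified.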

docs/create.py (+31 -8)
@@ -53,18 +53,34 @@
 Introduction
 ============
 
-The main purpose of the PyPGx package, which is completely free and open
-source, is to provide a unified platform for pharmacogenomics (PGx) research.
+The main purpose of the PyPGx package is to provide a unified platform for
+pharmacogenomics (PGx) research. PyPGx is and always will be completely free
+and open source.
 
 The package is written in Python, and supports both command line interface
 (CLI) and application programming interface (API), whose documentation is
 available at `Read the Docs <https://pypgx.readthedocs.io/en/latest/>`_.
 
-PyPGx can be used to predict PGx genotypes and phenotypes using various
-genomic data, including data from next-generation sequencing (NGS), single
-nucleotide polymorphism (SNP) array, and long-read sequencing. Importantly,
-PyPGx is compatible with both of the Genome Reference Consortium Human (GRCh)
-builds, GRCh37 (hg19) and GRCh38 (hg38).
+Quick links:
+
+- `README <https://pypgx.readthedocs.io/en/latest/readme.html>`__
+- `Genes <https://pypgx.readthedocs.io/en/latest/genes.html>`__
+- `Glossary <https://pypgx.readthedocs.io/en/latest/glossary.html>`__
+- `Tutorials <https://pypgx.readthedocs.io/en/latest/tutorials.html>`__
+- `CLI <https://pypgx.readthedocs.io/en/latest/cli.html>`__
+- `API <https://pypgx.readthedocs.io/en/latest/api.html>`__
+- `SDK <https://pypgx.readthedocs.io/en/latest/sdk.html>`__
+- `FAQ <https://pypgx.readthedocs.io/en/latest/faq.html>`__
+- `Changelog <https://pypgx.readthedocs.io/en/latest/changelog.html>`__
+
+PyPGx can predict PGx genotypes (e.g. ``*4/*5``) and phenotypes (e.g.
+``Poor Metabolizer``) using various genomic data, including data from
+next-generation sequencing (NGS), single nucleotide polymorphism (SNP) array,
+and long-read sequencing. Importantly, for NGS data the package can detect
+`structural variation (SV) <https://pypgx.readthedocs.io/en/latest/
+glossary.html#structural-variation-sv>`__ using a machine learning-based
+approach. Finally, note that PyPGx is compatible with both of the Genome
+Reference Consortium Human (GRCh) builds, GRCh37 (hg19) and GRCh38 (hg38).
 
 There are currently 59 pharmacogenes in PyPGx:
 
@@ -188,6 +204,10 @@
 you do this, please make sure your environment already has all the
 dependencies installed.
 
+.. warning::
+    You're not done yet! Keep scrolling down to obtain the resource bundle
+    for PyPGx, which is essential for running the package.
+
 Resource bundle
 ===============
 
@@ -227,7 +247,10 @@
 classifier using the `one-vs-rest strategy <https://scikit-learn.org/stable
 /modules/generated/sklearn.multiclass.OneVsRestClassifier.html>`__ for each
 gene for each GRCh build. Each classifier is trained using copy number
-profiles of real NGS samples as well as simulated ones.
+profiles of real NGS samples as well as simulated ones, including those from
+`1KGP <https://pypgx.readthedocs.io/en/latest/glossary.html#genomes-project-
+1kgp>`__ and `GeT-RM <https://pypgx.readthedocs.io/en/latest/
+glossary.html#genetic-testing-reference-materials-coordination-program-get-rm>`__.
 
 You can plot copy number profile and allele fraction profile with PyPGx to
 visually inspect SV calls. Below are CYP2D6 examples:
