Commit 79ec441

Merge pull request #67 from sbslee/0.17.0-dev
0.17.0 dev
2 parents 74d5c14 + 05a8998 commit 79ec441

10 files changed

+259
-34
lines changed

CHANGELOG.rst

+9-1
@@ -1,6 +1,15 @@
 Changelog
 *********

+0.17.0 (2022-07-12)
+-------------------
+
+* :issue:`63`: Fix bug in :meth:`api.utils.estimate_phase_beagle` where Beagle throws an error when the input VCF contains only one variant.
+* Update :command:`compare-genotypes` command to print all discordant calls when ``--verbose`` is used.
+* Update :command:`compute-copy-number` command to ensure that the samples in CovFrame[ReadDepth] and SampleTable[Statistics] are in the same order.
+* :issue:`64`: Update :meth:`api.utils.import_variants` method to 'diploidize' the input VCF when the target gene is G6PD. This is because some variant callers output haploid genotypes for males on the X chromosome, which interferes with downstream analyses.
+* Remove unnecessary optional argument ``assembly`` from :meth:`api.core.get_ref_allele`.
+
 0.16.0 (2022-06-08)
 -------------------

@@ -71,7 +80,6 @@ Changelog
 * Deprecate :meth:`sdk.utils.parse_input_bams` method.
 * Update :meth:`api.utils.predict_alleles` method to match ``0.31.0`` version of ``fuc`` package.
 * Fix bug in :command:`filter-samples` command when ``--exclude`` argument is used for archive files with SampleTable type.
-* Remove unnecessary optional argument ``assembly`` from :meth:`api.core.get_ref_allele`.
 * Improve CNV caller for CYP2A6, CYP2B6, CYP2D6, CYP2E1, CYP4F2, GSTM1, SLC22A2, SULT1A1, UGT1A4, UGT2B15, and UGT2B17.
 * Add a new CNV call for CYP2D6: ``PseudogeneDeletion``.
 * In CYP2E1 CNV nomenclature, ``PartialDuplication`` has been renamed to ``PartialDuplicationHet`` and a new CNV call ``PartialDuplicationHom`` has been added. Furthermore, calling algorithm for CYP2E1\*S1 allele has been updated. When partial duplication is present, from now on the algorithm requires only \*7 to call \*S1 instead of both \*7 and \*4.

README.rst

+17-3
@@ -177,6 +177,15 @@ you can access a development branch with the ``git checkout`` command. When
 you do this, please make sure your environment already has all the
 dependencies installed.

+.. note::
+   `Beagle <https://faculty.washington.edu/browning/beagle/beagle.html>`__
+   is one of the default software tools used by PyPGx for haplotype phasing
+   SNVs and indels. The program is freely available and published under the
+   `GNU General Public License <https://faculty.washington.edu/browning/
+   beagle/gpl_license>`__. Users do not need to download Beagle separately
+   because a copy of the software (``beagle.28Jun21.220.jar``) is already
+   included in PyPGx.
+
 .. warning::
    You're not done yet! Keep scrolling down to obtain the resource bundle
    for PyPGx, which is essential for running the package.
@@ -238,13 +247,13 @@ visually inspect SV calls. Below are CYP2D6 examples:
    * - Normal
      - .. image:: https://raw.githubusercontent.com/sbslee/pypgx-data/main/dpsv/gene-model-CYP2D6-1.png
      - .. image:: https://raw.githubusercontent.com/sbslee/pypgx-data/main/dpsv/GRCh37-CYP2D6-8.png
-   * - DeletionHet
+   * - WholeDel1
      - .. image:: https://raw.githubusercontent.com/sbslee/pypgx-data/main/dpsv/gene-model-CYP2D6-2.png
      - .. image:: https://raw.githubusercontent.com/sbslee/pypgx-data/main/dpsv/GRCh37-CYP2D6-1.png
-   * - DeletionHom
+   * - WholeDel1Hom
      - .. image:: https://raw.githubusercontent.com/sbslee/pypgx-data/main/dpsv/gene-model-CYP2D6-3.png
      - .. image:: https://raw.githubusercontent.com/sbslee/pypgx-data/main/dpsv/GRCh37-CYP2D6-6.png
-   * - Duplication
+   * - WholeDup1
      - .. image:: https://raw.githubusercontent.com/sbslee/pypgx-data/main/dpsv/gene-model-CYP2D6-4.png
      - .. image:: https://raw.githubusercontent.com/sbslee/pypgx-data/main/dpsv/GRCh37-CYP2D6-2.png
    * - Tandem3
@@ -254,6 +263,11 @@ visually inspect SV calls. Below are CYP2D6 examples:
      - .. image:: https://raw.githubusercontent.com/sbslee/pypgx-data/main/dpsv/gene-model-CYP2D6-10.png
      - .. image:: https://raw.githubusercontent.com/sbslee/pypgx-data/main/dpsv/GRCh37-CYP2D6-7.png

+PyPGx was recently applied to the entire high-coverage WGS dataset from 1KGP
+(N=2,504). Click `here <https://github.com/sbslee/1kgp-pgx-paper/tree/main/
+sv-tables>`__ to see individual SV calls, and corresponding copy number
+profiles and allele fraction profiles.
+
 GRCh37 vs. GRCh38
 =================

docs/create.py

+17-3
@@ -204,6 +204,15 @@
 you do this, please make sure your environment already has all the
 dependencies installed.

+.. note::
+   `Beagle <https://faculty.washington.edu/browning/beagle/beagle.html>`__
+   is one of the default software tools used by PyPGx for haplotype phasing
+   SNVs and indels. The program is freely available and published under the
+   `GNU General Public License <https://faculty.washington.edu/browning/
+   beagle/gpl_license>`__. Users do not need to download Beagle separately
+   because a copy of the software (``beagle.28Jun21.220.jar``) is already
+   included in PyPGx.
+
 .. warning::
    You're not done yet! Keep scrolling down to obtain the resource bundle
    for PyPGx, which is essential for running the package.
@@ -265,13 +274,13 @@
    * - Normal
      - .. image:: https://raw.githubusercontent.com/sbslee/pypgx-data/main/dpsv/gene-model-CYP2D6-1.png
      - .. image:: https://raw.githubusercontent.com/sbslee/pypgx-data/main/dpsv/GRCh37-CYP2D6-8.png
-   * - DeletionHet
+   * - WholeDel1
      - .. image:: https://raw.githubusercontent.com/sbslee/pypgx-data/main/dpsv/gene-model-CYP2D6-2.png
      - .. image:: https://raw.githubusercontent.com/sbslee/pypgx-data/main/dpsv/GRCh37-CYP2D6-1.png
-   * - DeletionHom
+   * - WholeDel1Hom
      - .. image:: https://raw.githubusercontent.com/sbslee/pypgx-data/main/dpsv/gene-model-CYP2D6-3.png
      - .. image:: https://raw.githubusercontent.com/sbslee/pypgx-data/main/dpsv/GRCh37-CYP2D6-6.png
-   * - Duplication
+   * - WholeDup1
      - .. image:: https://raw.githubusercontent.com/sbslee/pypgx-data/main/dpsv/gene-model-CYP2D6-4.png
      - .. image:: https://raw.githubusercontent.com/sbslee/pypgx-data/main/dpsv/GRCh37-CYP2D6-2.png
    * - Tandem3
@@ -281,6 +290,11 @@
      - .. image:: https://raw.githubusercontent.com/sbslee/pypgx-data/main/dpsv/gene-model-CYP2D6-10.png
      - .. image:: https://raw.githubusercontent.com/sbslee/pypgx-data/main/dpsv/GRCh37-CYP2D6-7.png

+PyPGx was recently applied to the entire high-coverage WGS dataset from 1KGP
+(N=2,504). Click `here <https://github.com/sbslee/1kgp-pgx-paper/tree/main/
+sv-tables>`__ to see individual SV calls, and corresponding copy number
+profiles and allele fraction profiles.
+
 GRCh37 vs. GRCh38
 =================

docs/faq.rst

+46
@@ -77,3 +77,49 @@ consistent with the other variant-level analyses you may also just use the
 same VCF for PyPGx. The bottom line is, if you are going to create your own
 input VCF, then you need to know what you are doing. Otherwise, it's probably
 safer to use :command:`create-input-vcf`.
+
+``chr22_KI270879v1_alt`` in GRCh38
+==================================
+
+Users may encounter an error like the one below when working with GRCh38 data:
+
+.. code-block:: text
+
+   $ pypgx prepare-depth-of-coverage \
+     depth-of-coverage.zip \
+     in.bam \
+     --assembly GRCh38
+   Traceback (most recent call last):
+     File "/Users/sbslee/opt/anaconda3/envs/fuc/bin/pypgx", line 33, in <module>
+       sys.exit(load_entry_point('pypgx', 'console_scripts', 'pypgx')())
+     File "/Users/sbslee/Desktop/pypgx/pypgx/__main__.py", line 33, in main
+       commands[args.command].main(args)
+     File "/Users/sbslee/Desktop/pypgx/pypgx/cli/prepare_depth_of_coverage.py", line 90, in main
+       archive = utils.prepare_depth_of_coverage(
+     File "/Users/sbslee/Desktop/pypgx/pypgx/api/utils.py", line 1247, in prepare_depth_of_coverage
+       cf = pycov.CovFrame.from_bam(bams, regions=regions, zero=True)
+     File "/Users/sbslee/Desktop/fuc/fuc/api/pycov.py", line 345, in from_bam
+       results += pysam.depth(*(bams + args + ['-r', region]))
+     File "/Users/sbslee/opt/anaconda3/envs/fuc/lib/python3.9/site-packages/pysam/utils.py", line 69, in __call__
+       raise SamtoolsError(
+   pysam.utils.SamtoolsError: 'samtools returned with error 1: stdout=, stderr=samtools depth: cannot parse region "chr22_KI270879v1_alt:267307-281486"\n'
+
+This is a GRCh38-specific issue. GSTT1, one of the genes with SV, is located
+in the contig ``chr22_KI270879v1_alt``, which is missing from the input BAM
+file. That's why the :command:`prepare-depth-of-coverage` command fails. To
+solve this issue, you can either realign the sequence reads to a FASTA
+reference genome that includes the contig, or work around it by excluding
+GSTT1 from your analysis:
+
+.. code-block:: text
+
+   $ pypgx prepare-depth-of-coverage \
+     depth-of-coverage.zip \
+     in.bam \
+     --assembly GRCh38 \
+     --genes GSTT1 \
+     --exclude
+
+For more details, please see the following articles:
+:ref:`readme:GRCh37 vs. GRCh38` and :ref:`genes:GRCh38 data for GSTT1`.
+Related GitHub issues: :issue:`65`.
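
The FAQ entry added above attributes the error to a contig that is absent from the input BAM. As a minimal sketch of how one might check for this before running PyPGx, the snippet below scans SAM header text (for example, the output of ``samtools view -H in.bam``) for ``@SQ`` records. The helper names ``contigs_in_sam_header`` and ``has_contig`` are hypothetical and are not part of PyPGx or fuc; the example header is made up for illustration.

```python
def contigs_in_sam_header(header_text):
    """Return contig names declared by @SQ lines in SAM header text."""
    contigs = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs; SN holds the name.
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                contigs.append(field[3:])
    return contigs


def has_contig(header_text, contig="chr22_KI270879v1_alt"):
    """Report whether the given contig appears in the header (hypothetical helper)."""
    return contig in contigs_in_sam_header(header_text)


# Illustrative header only; a real one would come from `samtools view -H in.bam`.
example_header = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr22\tLN:50818468\n"
print(has_contig(example_header))
```

If the check reports that the contig is missing, the two options described in the FAQ entry apply: realign with a reference that includes the contig, or exclude GSTT1.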
