
Commit 40f33d0

Merge pull request #103 from apriha/develop

v4.2.0

2 parents 3be7a99 + 3afe893 commit 40f33d0

9 files changed (+98 / -115 lines)

.github/workflows/ci.yml

Lines changed: 1 addition & 0 deletions
@@ -66,6 +66,7 @@ jobs:
     steps:
       - uses: actions/checkout@v2
         with:
+          fetch-depth: 0
           persist-credentials: false
       - name: Setup Python ${{ matrix.python-version }}
         uses: actions/setup-python@v2

README.rst

Lines changed: 9 additions & 9 deletions
@@ -152,8 +152,8 @@ calculating the centiMorgans of shared DNA and plotting the results:
 >>> results = l.find_shared_dna([user662, user663], cM_threshold=0.75, snp_threshold=1100)
 Downloading resources/genetic_map_HapMapII_GRCh37.tar.gz
 Downloading resources/cytoBand_hg19.txt.gz
-Saving output/shared_dna_User662_User663_HapMap2.png
-Saving output/shared_dna_one_chrom_User662_User663_GRCh37_HapMap2.csv
+Saving output/shared_dna_User662_User663_0p75cM_1100snps_GRCh37_HapMap2.png
+Saving output/shared_dna_one_chrom_User662_User663_0p75cM_1100snps_GRCh37_HapMap2.csv

 Notice that the centiMorgan and SNP thresholds for each DNA segment can be tuned. Additionally,
 notice that two files were downloaded to facilitate the analysis and plotting - future analyses
@@ -178,7 +178,7 @@ created; these files are detailed in the documentation and their generation can
 ``save_output=False`` argument. In this example, the output files consist of a CSV file that
 details the shared segments of DNA on one chromosome and a plot that illustrates the shared DNA:

-.. image:: https://raw.githubusercontent.com/apriha/lineage/master/docs/images/shared_dna_User662_User663_HapMap2.png
+.. image:: https://raw.githubusercontent.com/apriha/lineage/master/docs/images/shared_dna_User662_User663_0p75cM_1100snps_GRCh37_HapMap2.png

 `Find Shared Genes <https://lineage.readthedocs.io/en/stable/lineage.html#lineage.Lineage.find_shared_dna>`_
 ''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
@@ -214,11 +214,11 @@ Now let's find the shared genes, specifying a
 Downloading resources/CEU_omni_recombination_20130507.tar
 Downloading resources/knownGene_hg19.txt.gz
 Downloading resources/kgXref_hg19.txt.gz
-Saving output/shared_dna_User4583_User4584_CEU.png
-Saving output/shared_dna_one_chrom_User4583_User4584_GRCh37_CEU.csv
-Saving output/shared_dna_two_chroms_User4583_User4584_GRCh37_CEU.csv
-Saving output/shared_genes_one_chrom_User4583_User4584_GRCh37_CEU.csv
-Saving output/shared_genes_two_chroms_User4583_User4584_GRCh37_CEU.csv
+Saving output/shared_dna_User4583_User4584_0p75cM_1100snps_GRCh37_CEU.png
+Saving output/shared_dna_one_chrom_User4583_User4584_0p75cM_1100snps_GRCh37_CEU.csv
+Saving output/shared_dna_two_chroms_User4583_User4584_0p75cM_1100snps_GRCh37_CEU.csv
+Saving output/shared_genes_one_chrom_User4583_User4584_0p75cM_1100snps_GRCh37_CEU.csv
+Saving output/shared_genes_two_chroms_User4583_User4584_0p75cM_1100snps_GRCh37_CEU.csv

 The plot that illustrates the shared DNA is shown below. Note that in addition to outputting the
 shared DNA segments on either one or both chromosomes, the shared genes on either one or both
@@ -235,7 +235,7 @@ of shared DNA:
 >>> len(results['two_chrom_shared_dna'])
 36

-.. image:: https://raw.githubusercontent.com/apriha/lineage/master/docs/images/shared_dna_User4583_User4584_CEU.png
+.. image:: https://raw.githubusercontent.com/apriha/lineage/master/docs/images/shared_dna_User4583_User4584_0p75cM_1100snps_GRCh37_CEU.png

 Documentation
 -------------

docs/images/lineage_banner.png

42.7 KB

docs/output_files.rst

Lines changed: 28 additions & 21 deletions
@@ -53,30 +53,37 @@ Shared DNA between two or more individuals can be identified with
 :meth:`~lineage.Lineage.find_shared_dna`. One PNG file and up to two CSV files are output when
 ``save_output=True``.

-In the filenames below, ``name1`` is the name of the first
-:class:`~lineage.individual.Individual` and ``name2`` is the name of the second
-:class:`~lineage.individual.Individual`. (If more individuals are compared, all
-:class:`~lineage.individual.Individual` names will be included in the filenames and plot titles
-using the same conventions.) Additionally, ``genetic_map`` corresponds to the genetic map used
-in the calculations of shared DNA, specified as a parameter to :meth:`~lineage.Lineage.find_shared_dna`.
+In the filenames below,
+
+- ``name1`` is the name of the first :class:`~lineage.individual.Individual`
+- ``name2`` is the name of the second :class:`~lineage.individual.Individual`
+- ``cM_threshold`` corresponds to the same named parameter of
+  :meth:`~lineage.Lineage.find_shared_dna`; "." is replaced by "p" with precision of 2, e.g., "0p75"
+- ``snp_threshold`` corresponds to the same named parameter of
+  :meth:`~lineage.Lineage.find_shared_dna`
+- ``genetic_map`` corresponds to the same named parameter of
+  :meth:`~lineage.Lineage.find_shared_dna`.
+
+.. note:: If more than two individuals are compared, all :class:`~lineage.individual.Individual`
+          names will be included in the filenames and plot titles using the same conventions.

 .. note:: Genetic maps do not have recombination rates for the Y chromosome since the Y
           chromosome does not recombine. Therefore, shared DNA will not be shown on the Y
           chromosome.

-shared_dna_<name1>_<name2>_<genetic_map>.png
-````````````````````````````````````````````
+shared_dna_<name1>_<name2>_<cM_threshold>cM_<snp_threshold>snps_GRCh37_<genetic_map>.png
+````````````````````````````````````````````````````````````````````````````````````````
 This plot illustrates shared DNA (i.e., no shared DNA, shared DNA on one chromosome, and shared
 DNA on both chromosomes). The centromere for each chromosome is also detailed. Two examples of
 this plot are shown below.

-.. image:: https://raw.githubusercontent.com/apriha/lineage/master/docs/images/shared_dna_User662_User663_HapMap2.png
+.. image:: https://raw.githubusercontent.com/apriha/lineage/master/docs/images/shared_dna_User662_User663_0p75cM_1100snps_GRCh37_HapMap2.png

 In the above plot, note that the two individuals only share DNA on one chromosome. In this plot,
 the larger regions where "No shared DNA" is indicated are due to SNPs not being available in
 those regions (i.e., SNPs were not tested in those regions).

-.. image:: https://raw.githubusercontent.com/apriha/lineage/master/docs/images/shared_dna_User4583_User4584_CEU.png
+.. image:: https://raw.githubusercontent.com/apriha/lineage/master/docs/images/shared_dna_User4583_User4584_0p75cM_1100snps_GRCh37_CEU.png

 In the above plot, the areas where "No shared DNA" is indicated are the regions where SNPs were
 not tested or where DNA is not shared. The areas where "One chromosome shared" is indicated are
@@ -86,8 +93,8 @@ shared" is indicated are regions where the individuals share DNA on both chromos
 Note that the regions where DNA is shared on both chromosomes is a subset of the regions where
 one chromosome is shared.

-shared_dna_one_chrom_<name1>_<name2>_GRCh37_<genetic_map>.csv
-`````````````````````````````````````````````````````````````
+shared_dna_one_chrom_<name1>_<name2>_<cM_threshold>cM_<snp_threshold>snps_GRCh37_<genetic_map>.csv
+``````````````````````````````````````````````````````````````````````````````````````````````````
 If DNA is shared on one chromosome, a CSV file details the shared segments of DNA.

 ======= ===========
@@ -101,8 +108,8 @@ cMs     CentiMorgans of matching DNA segment
 snps    Number of SNPs in matching DNA segment
 ======= ===========

-shared_dna_two_chroms_<name1>_<name2>_GRCh37_<genetic_map>.csv
-``````````````````````````````````````````````````````````````
+shared_dna_two_chroms_<name1>_<name2>_<cM_threshold>cM_<snp_threshold>snps_GRCh37_<genetic_map>.csv
+```````````````````````````````````````````````````````````````````````````````````````````````````
 If DNA is shared on two chromosomes, a CSV file details the shared segments of DNA.

 ======= ===========
@@ -129,11 +136,11 @@ In the filenames below, ``name1`` is the name of the first
 :class:`~lineage.individual.Individual` names will be included in the filenames using the same
 convention.)

-shared_genes_one_chrom_<name1>_<name2>_GRCh37_<genetic_map>.csv
-```````````````````````````````````````````````````````````````
+shared_genes_one_chrom_<name1>_<name2>_<cM_threshold>cM_<snp_threshold>snps_GRCh37_<genetic_map>.csv
+````````````````````````````````````````````````````````````````````````````````````````````````````
 If DNA is shared on one chromosome, this file details the genes shared between the individuals
 on at least one chromosome; these genes are located in the shared DNA segments specified in
-`shared_dna_one_chrom_<name1>_<name2>_GRCh37_<genetic_map>.csv`_.
+`shared_dna_one_chrom_<name1>_<name2>_<cM_threshold>cM_<snp_threshold>snps_GRCh37_<genetic_map>.csv`_.

 =========== ============
 Column*     Description*
@@ -152,10 +159,10 @@ description Description
 \* `UCSC Genome Browser <http://genome.ucsc.edu>`_ /
 `UCSC Table Browser <http://genome.ucsc.edu/cgi-bin/hgTables>`_

-shared_genes_two_chroms_<name1>_<name2>_GRCh37_<genetic_map>.csv
-````````````````````````````````````````````````````````````````
+shared_genes_two_chroms_<name1>_<name2>_<cM_threshold>cM_<snp_threshold>snps_GRCh37_<genetic_map>.csv
+`````````````````````````````````````````````````````````````````````````````````````````````````````
 If DNA is shared on both chromosomes in a pair, this file details the genes shared between the
 individuals on both chromosomes; these genes are located in the shared DNA segments specified in
-`shared_dna_two_chroms_<name1>_<name2>_GRCh37_<genetic_map>.csv`_.
+`shared_dna_two_chroms_<name1>_<name2>_<cM_threshold>cM_<snp_threshold>snps_GRCh37_<genetic_map>.csv`_.

-The file has the same columns as `shared_genes_one_chrom_<name1>_<name2>_GRCh37_<genetic_map>.csv`_.
+The file has the same columns as `shared_genes_one_chrom_<name1>_<name2>_<cM_threshold>cM_<snp_threshold>snps_GRCh37_<genetic_map>.csv`_.
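The filename convention documented above can be expressed as a short Python sketch. The helper names `format_cM`, `filename_details` and `output_filenames` are hypothetical and are not part of lineage's API. The sketch also assumes the individual names are already filename-safe and are joined with underscores, as in the README examples such as `User662_User663`.

```python
from typing import List, Sequence


def format_cM(cM_threshold: float) -> str:
    """Format a cM threshold for a filename: two decimals, "." replaced by "p"."""
    return "{:.2f}".format(cM_threshold).replace(".", "p")


def filename_details(
    names: Sequence[str], cM_threshold: float, snp_threshold: int, genetic_map: str
) -> str:
    """Build the shared "<names>_<cM>cM_<snps>snps_GRCh37_<map>" filename stem."""
    # Assumption: names are already filename-safe; they are joined with "_".
    individuals = "_".join(names)
    return (
        f"{individuals}_{format_cM(cM_threshold)}cM_"
        f"{snp_threshold}snps_GRCh37_{genetic_map}"
    )


def output_filenames(details: str) -> List[str]:
    """List every file name the convention can produce for one comparison."""
    # One PNG plot plus four CSVs; per the diff, each CSV is only written
    # when the corresponding result is non-empty.
    return [
        f"shared_dna_{details}.png",
        f"shared_dna_one_chrom_{details}.csv",
        f"shared_dna_two_chroms_{details}.csv",
        f"shared_genes_one_chrom_{details}.csv",
        f"shared_genes_two_chroms_{details}.csv",
    ]


details = filename_details(["User662", "User663"], 0.75, 1100, "HapMap2")
print(details)  # User662_User663_0p75cM_1100snps_GRCh37_HapMap2
for name in output_filenames(details):
    print(name)
```

Because the threshold is rounded to two decimal places, `cM_threshold=1` becomes `1p00`. Thresholds that differ only beyond the second decimal place (for example, 0.75 and 0.751) therefore map to the same filename.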

src/lineage/__init__.py

Lines changed: 33 additions & 36 deletions
@@ -500,6 +500,8 @@ def find_shared_dna(
         if save_output:
             self._find_shared_dna_output_helper(
                 individuals,
+                cM_threshold,
+                snp_threshold,
                 one_chrom_shared_dna,
                 two_chrom_shared_dna,
                 one_chrom_shared_genes,
@@ -547,12 +549,24 @@ def _find_shared_dna_helper(self, df, cM_threshold, snp_threshold, one_x_chrom):
     def _find_shared_dna_output_helper(
         self,
         individuals,
+        cM_threshold,
+        snp_threshold,
         one_chrom_shared_dna,
         two_chrom_shared_dna,
         one_chrom_shared_genes,
         two_chrom_shared_genes,
         genetic_map,
     ):
+        def output_csv(df, file, float_format="%.2f"):
+            save_df_as_csv(
+                df,
+                self._output_dir,
+                file,
+                comment=self._get_csv_header(),
+                prepend_info=False,
+                float_format=float_format,
+            )
+
         cytobands = self._resources.get_cytoBand_hg19()

         individuals_filename = ""
@@ -565,63 +579,48 @@ def _find_shared_dna_output_helper(
         individuals_filename = individuals_filename[:-1]
         individuals_plot_title = individuals_plot_title[:-3]

+        cM = "{:.2f}".format(cM_threshold).replace(".", "p")
+        filename_details = (
+            f"{individuals_filename}_{cM}cM_{snp_threshold}snps_GRCh37_{genetic_map}"
+        )
+
         if create_dir(self._output_dir):
             plot_chromosomes(
                 one_chrom_shared_dna,
                 two_chrom_shared_dna,
                 cytobands,
                 os.path.join(
                     self._output_dir,
-                    f"shared_dna_{individuals_filename}_{genetic_map}.png",
+                    f"shared_dna_{filename_details}.png",
                 ),
                 f"{individuals_plot_title} shared DNA",
                 37,
             )

         if len(one_chrom_shared_dna) > 0:
-            file = (
-                f"shared_dna_one_chrom_{individuals_filename}_GRCh37_{genetic_map}.csv"
-            )
-            save_df_as_csv(
+            output_csv(
                 one_chrom_shared_dna,
-                self._output_dir,
-                file,
-                comment=self._get_csv_header(),
-                prepend_info=False,
-                float_format="%.2f",
+                f"shared_dna_one_chrom_{filename_details}.csv",
             )

         if len(two_chrom_shared_dna) > 0:
-            file = (
-                f"shared_dna_two_chroms_{individuals_filename}_GRCh37_{genetic_map}.csv"
-            )
-            save_df_as_csv(
+            output_csv(
                 two_chrom_shared_dna,
-                self._output_dir,
-                file,
-                comment=self._get_csv_header(),
-                prepend_info=False,
-                float_format="%.2f",
+                f"shared_dna_two_chroms_{filename_details}.csv",
             )

         if len(one_chrom_shared_genes) > 0:
-            file = f"shared_genes_one_chrom_{individuals_filename}_GRCh37_{genetic_map}.csv"
-            save_df_as_csv(
+            output_csv(
                 one_chrom_shared_genes,
-                self._output_dir,
-                file,
-                comment=self._get_csv_header(),
-                prepend_info=False,
+                f"shared_genes_one_chrom_{filename_details}.csv",
+                None,
             )

         if len(two_chrom_shared_genes) > 0:
-            file = f"shared_genes_two_chroms_{individuals_filename}_GRCh37_{genetic_map}.csv"
-            save_df_as_csv(
+            output_csv(
                 two_chrom_shared_genes,
-                self._output_dir,
-                file,
-                comment=self._get_csv_header(),
-                prepend_info=False,
+                f"shared_genes_two_chroms_{filename_details}.csv",
+                None,
             )

     def _find_shared_dna_return_helper(
@@ -712,7 +711,7 @@ def _compute_snp_distances(self, task):
         temp = task["snps"]

         # merge genetic map for this chrom
-        temp = temp.append(genetic_map, ignore_index=False, sort=True)
+        temp = pd.concat([temp, genetic_map], ignore_index=False, sort=True)

         # sort based on pos
         temp = temp.sort_values("pos")
@@ -880,8 +879,6 @@ def _remap_snps_to_GRCh37(self, individuals):

     def _get_csv_header(self):
         return (
-            "# Generated by lineage v{}, https://pypi.org/project/lineage/\n"
-            "# Generated at {} UTC\n".format(
-                __version__, datetime.datetime.utcnow().strftime("%Y-%m-%d %H:%M:%S")
-            )
+            f"# Generated by lineage v{__version__}; https://pypi.org/project/lineage/{os.linesep}"
+            f"# Generated at {datetime.datetime.utcnow().strftime('%Y-%m-%d %H:%M:%S')} UTC{os.linesep}"
         )
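The refactored `output_csv` helper and the new `_get_csv_header` can be approximated with plain pandas. This is a sketch under stated assumptions. `save_df_as_csv` (from the `snps` package) is not reproduced. Instead, the hypothetical `write_csv` writes the comment header and then calls `DataFrame.to_csv` on the same buffer. The hypothetical `csv_header` takes the version and timestamp as arguments so its output is deterministic. The sketch contrasts the default `float_format="%.2f"` used for the shared DNA CSVs with `float_format=None` (pandas' default formatting), which the diff passes for the shared genes CSVs.

```python
import datetime
import io
import os
from typing import Optional, TextIO

import pandas as pd


def csv_header(version: str, generated_at: datetime.datetime) -> str:
    """Two '#'-prefixed provenance lines, mirroring _get_csv_header in the diff."""
    return (
        f"# Generated by lineage v{version}; https://pypi.org/project/lineage/{os.linesep}"
        f"# Generated at {generated_at.strftime('%Y-%m-%d %H:%M:%S')} UTC{os.linesep}"
    )


def write_csv(
    df: pd.DataFrame, buffer: TextIO, header: str, float_format: Optional[str] = "%.2f"
) -> None:
    """Hypothetical stand-in for save_df_as_csv: header lines first, then the data."""
    buffer.write(header)
    # float_format="%.2f" rounds floats to two decimals; None keeps pandas' default.
    df.to_csv(buffer, index=False, float_format=float_format)


# Illustrative data only: one shared segment with an unrounded cM value.
segments = pd.DataFrame({"chrom": ["1"], "cMs": [12.3456]})
header = csv_header("4.2.0", datetime.datetime.now(datetime.timezone.utc))

buf = io.StringIO()
write_csv(segments, buf, header)  # cMs is written as 12.35
print(buf.getvalue())

# Lines starting with "#" can be skipped when reading the file back.
print(pd.read_csv(io.StringIO(buf.getvalue()), comment="#"))
```

The sketch takes an aware UTC timestamp as an argument. `datetime.datetime.utcnow()`, which the diff uses, is deprecated as of Python 3.12, and `datetime.datetime.now(datetime.timezone.utc)` is the usual replacement.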

src/lineage/visualization.py

Lines changed: 21 additions & 36 deletions
@@ -235,6 +235,23 @@ def _patch_chromosomal_features(cytobands, one_chrom_match, two_chrom_match):
         the start and stop positions of particular features on each
         chromosome
     """
+
+    def concat(df, chrom, start, end, gie_stain):
+        return pd.concat(
+            [
+                df,
+                pd.DataFrame(
+                    {
+                        "chrom": [chrom],
+                        "start": [start],
+                        "end": [end],
+                        "gie_stain": [gie_stain],
+                    }
+                ),
+            ],
+            ignore_index=True,
+        )
+
     chromosomes = cytobands["chrom"].unique()

     df = pd.DataFrame()
@@ -253,52 +270,20 @@ def _patch_chromosomal_features(cytobands, one_chrom_match, two_chrom_match):
         ]

         # background of chromosome
-        df = df.append(
-            {
-                "chrom": chromosome,
-                "start": 0,
-                "end": chromosome_length,
-                "gie_stain": "gneg",
-            },
-            ignore_index=True,
-        )
+        df = concat(df, chromosome, 0, chromosome_length, "gneg")

         # add markers for shared DNA on one chromosome
         for marker in one_chrom_match_markers.itertuples():
-            df = df.append(
-                {
-                    "chrom": chromosome,
-                    "start": marker.start,
-                    "end": marker.end,
-                    "gie_stain": "one_chrom",
-                },
-                ignore_index=True,
-            )
+            df = concat(df, chromosome, marker.start, marker.end, "one_chrom")

         # add markers for shared DNA on both chromosomes
         for marker in two_chrom_match_markers.itertuples():
-            df = df.append(
-                {
-                    "chrom": chromosome,
-                    "start": marker.start,
-                    "end": marker.end,
-                    "gie_stain": "two_chrom",
-                },
-                ignore_index=True,
-            )
+            df = concat(df, chromosome, marker.start, marker.end, "two_chrom")

         # add centromeres
         for item in cytobands.loc[
             (cytobands["chrom"] == chromosome) & (cytobands["gie_stain"] == "acen")
         ].itertuples():
-            df = df.append(
-                {
-                    "chrom": chromosome,
-                    "start": item.start,
-                    "end": item.end,
-                    "gie_stain": "centromere",
-                },
-                ignore_index=True,
-            )
+            df = concat(df, chromosome, item.start, item.end, "centromere")

     return df
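The `concat` helper introduced in this file replaces `DataFrame.append`, which was deprecated in pandas 1.4 and removed in pandas 2.0. The sketch below contrasts a standalone version of that row-by-row pattern with a common alternative: collecting rows in a list and constructing the DataFrame once. This avoids re-copying the growing frame on every call. The function names `concat_row` and `build_features` and the `(start, end)` tuple inputs are hypothetical simplifications of `_patch_chromosomal_features`, not lineage's code.

```python
import pandas as pd

COLUMNS = ["chrom", "start", "end", "gie_stain"]


def concat_row(df: pd.DataFrame, chrom, start, end, gie_stain) -> pd.DataFrame:
    """Append one feature row via pd.concat, like the nested concat helper in the diff."""
    row = pd.DataFrame(
        {"chrom": [chrom], "start": [start], "end": [end], "gie_stain": [gie_stain]}
    )
    # Each call copies all existing rows, so n calls cost O(n^2) overall.
    return pd.concat([df, row], ignore_index=True)


def build_features(chrom, length, one_chrom, two_chrom, centromeres) -> pd.DataFrame:
    """Collect rows as tuples, then build the DataFrame once.

    one_chrom, two_chrom and centromeres are hypothetical iterables of
    (start, end) pairs standing in for the itertuples() loops in the diff.
    """
    rows = [(chrom, 0, length, "gneg")]  # chromosome background
    rows += [(chrom, s, e, "one_chrom") for s, e in one_chrom]
    rows += [(chrom, s, e, "two_chrom") for s, e in two_chrom]
    rows += [(chrom, s, e, "centromere") for s, e in centromeres]
    return pd.DataFrame(rows, columns=COLUMNS)


# Illustrative coordinates only.
features = build_features("1", 100, [(10, 20)], [(12, 15)], [(40, 50)])
print(features)
```

Both approaches produce the same rows in the same order with a fresh 0-based index. The list-based version does a single allocation, which matters when a chromosome has many shared-DNA markers.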

tests/test_lineage.py

Lines changed: 6 additions & 13 deletions
@@ -195,20 +195,13 @@ def _assert_does_not_exist(self, files, idx):
     def _make_file_exist_assertions(
         self, inds, exist="all", genetic_map="HapMap2", output_dir="output"
     ):
+        filename_details = f"{inds}_0p75cM_1100snps_GRCh37_{genetic_map}"
         files = [
-            os.path.join(
-                output_dir, f"shared_dna_one_chrom_{inds}_GRCh37_{genetic_map}.csv"
-            ),
-            os.path.join(
-                output_dir, f"shared_dna_two_chroms_{inds}_GRCh37_{genetic_map}.csv"
-            ),
-            os.path.join(
-                output_dir, f"shared_genes_one_chrom_{inds}_GRCh37_{genetic_map}.csv"
-            ),
-            os.path.join(
-                output_dir, f"shared_genes_two_chroms_{inds}_GRCh37_{genetic_map}.csv"
-            ),
-            os.path.join(output_dir, f"shared_dna_{inds}_{genetic_map}.png"),
+            os.path.join(output_dir, f"shared_dna_one_chrom_{filename_details}.csv"),
+            os.path.join(output_dir, f"shared_dna_two_chroms_{filename_details}.csv"),
+            os.path.join(output_dir, f"shared_genes_one_chrom_{filename_details}.csv"),
+            os.path.join(output_dir, f"shared_genes_two_chroms_{filename_details}.csv"),
+            os.path.join(output_dir, f"shared_dna_{filename_details}.png"),
         ]

         if exist == "all":
