Commit 1922324

Merge pull request #73 from uclahs-cds/nzeltser-add-CRAN
add cran to readme
2 parents bd8b1a4 + 6f44505 commit 1922324

4 files changed, with 29 additions and 13 deletions.

DESCRIPTION

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 Package: ApplyPolygenicScore
 Type: Package
 Title: Utilities for the Application of a Polygenic Score to a VCF
-Version: 3.0.1
+Version: 3.0.2
 Authors@R: c(
 person('Paul', 'Boutros', role = 'cre', email = '[email protected]'),
 person('Nicole', 'Zeltser', role = 'aut', comment = c(ORCID = '0000-0001-7246-2771')),

NEWS.md

Lines changed: 5 additions & 0 deletions
@@ -1,5 +1,10 @@
 # Unreleased

+# ApplyPolygenicScore 3.0.2
+
+## Changed
+* ApplyPolygenicScore released on CRAN! Updated README with CRAN links.
+
 # ApplyPolygenicScore 3.0.1

 ## Added

README.md

Lines changed: 17 additions & 6 deletions
@@ -9,12 +9,21 @@


 ## Description
-This R package provides a set of utilities to simply and transparently parse genotype/dosage data from an input VCF, match genotype coordinates to the component SNPs of an existing polygenic score, and apply SNP weights to dosages to calculate a polygenic score for each individual in accordance with the additive weighted sum of dosages model.
+This R package provides a set of utilities to simply and transparently parse genotype/dosage data from an input VCF, match genotype coordinates to the component SNPs of an existing polygenic score model, and apply SNP weights to dosages to calculate a polygenic score for each individual in accordance with the additive weighted sum of dosages method.

 ## Installation
+
+To install the last release on CRAN:
+
+```
+# In an R session
+install.packages("ApplyPolygenicScore")
+```
+
 To install the latest development version from GitHub:

 ```
+# In an R session
 # install.packages("devtools")

 devtools::install_github("uclahs-cds/package-ApplyPolygenicScore")
@@ -40,7 +49,7 @@ You will need only two pieces of data to get started:
 - Others have done a great job of describing Variant Call Format. For those with a basic understanding of genetic nomenclature, we recommend the GATK [resource](https://gatk.broadinstitute.org/hc/en-us/articles/360035531692-VCF-Variant-Call-Format).
 - For those who need a refresher on genomics and genomic data, we recommend starting with the [fact sheets](https://www.genome.gov/about-genomics/fact-sheets) curated by the National Human Genome Research Institute (NHGRI).

-If you wish to apply a PGS to a cohort, we recommend that genotypes for the whole cohort be aggregated in one VCF file, either through a regenotyping process, or through VCF merging with an external tool designed for manipulating VCF files. VCF files can be very large, causing memory-related complications in the R environment. To reduce memory usage and improve speed of PGS application, we recommend pre-filtering the input VCF for only the coordinates that compose the PGS you wish to apply. This action can be performed using a coordinate BED file and tools such as bcftools or bedtools. To facilitate this process, ApplyPolygenicScore provides a function that outputs a BED file containing coordinates for any number of PGS weight files provided as input.
+If you wish to apply a PGS to a cohort, we recommend that genotypes for the whole cohort be aggregated in one VCF file, either through a regenotyping process, or through VCF merging with an external tool designed for manipulating VCF files. VCF files can be very large, causing memory-related complications in the R environment. To reduce memory usage and improve speed of PGS application, we recommend pre-filtering the input VCF for only the coordinates that compose the PGS you wish to apply. This action can be performed using a genomic coordinate file in BED format and tools such as bcftools or bedtools. To facilitate this process, ApplyPolygenicScore provides a function that outputs a BED-formatted file containing genomic coordinates for any number of PGS weight files provided as input.

 #### PGS weight file
 - The PGS weight file describes a PGS by providing a list of component SNPs, their genomic coordinates, and their respective weights.
@@ -61,9 +70,9 @@ If you wish to apply a PGS to a cohort, we recommend that genotypes for the whol
 ### Recommended Workflow


-1. Convert PGS weight files to BED coordinate files.
+1. Convert PGS weight files to BED-formatted coordinate files.

-We recommend starting by filtering your input VCF for just the variants in your PGS weight files. Several software tools are available to do this, and most all require a coordinate BED file. A description of BED format can be found [here](https://bedtools.readthedocs.io/en/latest/content/general-usage.html).
+We recommend starting by filtering your input VCF for just the variants in your PGS weight files. Several software tools are available to do this, and most all require a coordinate file in BED format. A description of BED format can be found [here](https://bedtools.readthedocs.io/en/latest/content/general-usage.html).

 The function `import.pgs.weight.file` can be used to import your PGS weight files into R.
 The functions `convert.pgs.to.bed` and `combine.pgs.bed` can be used to make the conversion, and merge several BED dataframes into one, respectively.
@@ -83,10 +92,12 @@ If you wish to apply a PGS to a cohort, we recommend that genotypes for the whol

 ApplyPolygenicScore comes with several plotting functions designed to operate on the results of `apply.polygenic.score`. Display PGS density curves with `create.pgs.density.plot` and PGS percentile ranks with `create.pgs.rank.plot`. If you provided phenotype data in step 3, you can incorporate categorical data into the density plots and categorical and continuous phenotype data into the rank plots, and use `create.pgs.with.continuous.phenotype.plot` to make scatterplots of your PGS against any continuous phenotype data.

-For more step-by-step instructions, check out our vignettes.
+For more step-by-step instructions, check out our [vignettes](https://CRAN.R-project.org/package=ApplyPolygenicScore).

 ## Resources
-This package is in the process of being submitted to CRAN, where the manual and vignettes will be readily available. In the meantime, if you have installed the package from GitHub with `build_vignettes = TRUE`, you may view the vignette by running the following:
+This package is hosted on CRAN. The manual of all functions and the User Guide vignette can be accessed on the [ApplyPolygenicScore CRAN page](https://CRAN.R-project.org/package=ApplyPolygenicScore).
+
+If you have installed the package from GitHub with `build_vignettes = TRUE`, you may view the vignette by running the following:

 ```
 vignette('UserGuide', package = 'ApplyPolygenicScore')

vignettes/UserGuide.Rmd

Lines changed: 6 additions & 6 deletions
@@ -116,14 +116,14 @@ phenotype.data <- data.frame(
 head(phenotype.data)
 ```

-## Creating a BED coordinate file
+## Creating a BED-formatted coordinate file

 VCF files can be very large. Sometimes they are too large to be imported into R. In these cases, it is useful to first filter the VCF file to just the variants
 that are included in the PGS you wish to calculate and reduce file size. This is best done using command line tools designed for VCF file manipulation. For filtering, they typically
 require a BED file containing the coordinates of the variants you wish to keep. To simplify this process, ApplyPolygenicScore provides functions for converting PGS weight files
-to BED coordinate files.
+to BED-formatted coordinate files.

-### Conversion of PGS weight files to BED coordinate format
+### Conversion of PGS weight files to a coordinate file in BED format

 BED format requires the following first three columns: chromosome name, start position, and end position.
 PGS weight files only contain the chromosome name and end position of each variant, so must be reformatted
@@ -133,7 +133,7 @@ with an additional column for the start position, and with the correct column or
 Additionally, most tools do not accept BED files with column names. If you wish to maintain a header, you may need to add
 a comment character to the first line of the file: `# chr start end`
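
The reformatting described in this part of the vignette can be sketched in base R. This is an illustration only: it is not part of the commit and not the package's implementation. The data frame `pgs.weights`, its `CHROM` and `POS` columns, and the temporary output path are hypothetical. The sketch assumes the standard BED convention of 0-based, half-open intervals, so a variant at 1-based position `POS` gets `start = POS - 1` and `end = POS`.

```r
# Hypothetical PGS weight data; the column names are assumptions for illustration.
# Positions are stored as integers so they are never written in scientific notation.
pgs.weights <- data.frame(
    CHROM = c('1', '1', '2'),
    POS = c(20L, 150L, 3000L),
    stringsAsFactors = FALSE
    );

# BED needs chromosome, start, end (in that order).
# The weight file position serves as the end; the start is one base earlier.
bed <- data.frame(
    chr = paste0('chr', pgs.weights$CHROM),
    start = pgs.weights$POS - 1L,
    end = pgs.weights$POS,
    stringsAsFactors = FALSE
    );

# Write the header as a commented first line, then append the records
# tab-separated and without column names.
bed.path <- tempfile(fileext = '.bed');
writeLines('# chr\tstart\tend', con = bed.path);
write.table(
    bed,
    file = bed.path,
    append = TRUE,
    sep = '\t',
    quote = FALSE,
    row.names = FALSE,
    col.names = FALSE
    );
```

Keeping the header as a comment line lets the file stay self-describing while remaining acceptable to tools that reject column names.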

-Use the `convert.pgs.to.bed` function to convert a PGS weight file to a BED coordinate data frame.
+Use the `convert.pgs.to.bed` function to convert a PGS weight file to a BED-formatted coordinate data frame.

 ```{r convert-pgs-to-bed}

@@ -156,7 +156,7 @@ format the X and Y chromosomes as 'X' and 'Y' respectively, and `numeric.sex.chr

 The `slop` option imitates `bedtools` nomenclature for adding base pairs to the start and end of a set of coordinates. `slop = 10` adds 10 base pairs to the start and end of each variant coordinate.
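
The arithmetic behind slop can be sketched in base R for the variant used in this section (chromosome 1, 20th base pair). This is an illustration only, not part of the commit and not the package's code. It assumes 0-based BED start coordinates and follows `bedtools slop` in widening the interval on both sides; clamping the start at 0 is an assumption borrowed from `bedtools`, not something stated in the vignette.

```r
# Variant at the 20th base pair (1-based position) with 10 bp of slop.
pos <- 20L;
slop <- 10L;

# No slop: 0-based start, end equal to the variant position.
no.slop <- c(start = pos - 1L, end = pos);

# With slop: extend both ends by `slop` base pairs,
# keeping the start from dropping below 0 (assumed behaviour).
with.slop <- c(start = max(0L, pos - 1L - slop), end = pos + slop);

print(no.slop);   # start 19, end 20
print(with.slop); # start 9, end 30
```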

-Here is an example of BED coordinates for a variant on chromosome 1 at the 20th base pair.
+Here is an example of genomic coordinates in BED file format for a variant on chromosome 1 at the 20th base pair.

 No slop:

@@ -173,7 +173,7 @@ With slop of 10 base pairs:
 ### Merging coordinates from multiple polygenic scores

 What if you want to apply multiple polygenic scores to the same VCF file?
-Instead of filtering the VCF file multiple times, you can use the `combine.pgs.bed` function to merge multiple BED data frames
+Instead of filtering the VCF file multiple times, you can use the `combine.pgs.bed` function to merge multiple BED-formatted data frames
 into a single set of coordinates, and filter your VCF just once for the union of all variants in multiple PGSs.
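
The idea of taking the union of coordinates can be sketched in base R without the package. This is an illustration only, not part of the commit, and it does not reproduce what `combine.pgs.bed` does internally. The data frames `bed.a` and `bed.b` and their contents are hypothetical.

```r
# Two hypothetical BED-formatted data frames that share one variant (chr2:99-100).
bed.a <- data.frame(
    chr = c('chr1', 'chr2'),
    start = c(19L, 99L),
    end = c(20L, 100L),
    stringsAsFactors = FALSE
    );
bed.b <- data.frame(
    chr = c('chr2', 'chr3'),
    start = c(99L, 4L),
    end = c(100L, 5L),
    stringsAsFactors = FALSE
    );

# Stack the records and drop exact duplicates to get the union of variants.
combined <- unique(rbind(bed.a, bed.b));

# Sort by chromosome name and start position, then reset the row names.
# The chromosome sort is lexical, so 'chr10' would sort before 'chr2'.
combined <- combined[order(combined$chr, combined$start), ];
rownames(combined) <- NULL;
```

Filtering the VCF once against `combined` then covers every variant needed by either score.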

 ```{r merge-pgs-bed}
