@@ -32,6 +32,13 @@
         stop('pgs.weight.data must contain columns named CHROM, POS, effect_allele, and beta');
     }
 
+    # additional required columns if strand flip correction is enabled
+    if (correct.strand.flips || remove.ambiguous.allele.matches || remove.mismatched.indels) {
+        if (!('other_allele' %in% colnames(pgs.weight.data))) {
+            stop('pgs.weight.data must contain a column named other_allele if correct.strand.flips, remove.ambiguous.allele.matches, or remove.mismatched.indels is TRUE');
+        }
+    }
+
     if (use.external.effect.allele.frequency) {
         required.eaf.column <- 'allelefrequency_effect';
         if (!(required.eaf.column %in% colnames(pgs.weight.data))) {
@@ -89,6 +96,12 @@
 #' Phenotype variables are automatically classified as continuous, binary, or neither based on data type and number of unique values. The calculated PGS is associated
 #' with each phenotype variable using linear or logistic regression for continuous or binary phenotypes, respectively. See \code{run.pgs.regression} for more details.
 #' If no phenotype.analysis.columns are provided, no regression analysis is performed.
+#' @param correct.strand.flips A logical indicating whether to check PGS weight data/VCF genotype data matches for strand flips and correct them. Default is \code{TRUE}.
+#' The PGS Catalog standard column \code{other_allele} in \code{pgs.weight.data} is required for this check.
+#' @param remove.ambiguous.allele.matches A logical indicating whether to remove PGS variants with ambiguous allele matches between PGS weight data and VCF genotype data. Default is \code{FALSE}.
+#' The PGS Catalog standard column \code{other_allele} in \code{pgs.weight.data} is required for this check.
+#' @param remove.mismatched.indels A logical indicating whether to remove indel variants that are mismatched between PGS weight data and VCF genotype data. Default is \code{FALSE}.
+#' The PGS Catalog standard column \code{other_allele} in \code{pgs.weight.data} is required for this check.
 #' @param output.dir A character string indicating the directory to write output files. Separate files are written for per-sample PGS results and optional regression results.
 #' Files are tab-separated .txt files. Default is \code{NULL}, in which case no files are written.
 #' @param file.prefix A character string to prepend to the output file names. Default is \code{NULL}.
@@ -146,3 +160,3 @@
-#' VCF genotype data are matched to PGS data by chromosome, position, and effect allele. If a SNP cannot be matched by genomic coordinate,
+#' VCF genotype data are matched to PGS data by chromosome and position. If a SNP cannot be matched by genomic coordinate,
 #' an attempt is made to match by rsID (if available). If a SNP from the PGS weight data is not found in the VCF data after these two matching attempts,
 #' it is considered a cohort-wide missing variant.
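The two-stage matching described in this hunk (genomic coordinate first, rsID as a fallback) can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the toy data frames, the `ID` column, and the `vcf.row` and `cohort.wide.missing` names are assumptions made for the example.

```r
# Hypothetical toy inputs; only CHROM, POS, effect_allele, and beta are named in the diff
pgs.weights <- data.frame(
    CHROM = c('1', '1', '2'),
    POS = c(100, 200, 300),
    ID = c('rs1', 'rs2', 'rs3'),
    effect_allele = c('A', 'C', 'G'),
    beta = c(0.1, -0.2, 0.3),
    stringsAsFactors = FALSE
    );
vcf.variants <- data.frame(
    CHROM = c('1', '1'),
    POS = c(100, 250),
    ID = c('rs1', 'rs2'),
    REF = c('G', 'T'),
    ALT = c('A', 'C'),
    stringsAsFactors = FALSE
    );

# first attempt: match on chromosome and position
coordinate.key <- function(variants) paste(variants$CHROM, variants$POS, sep = ':');
vcf.row <- match(coordinate.key(pgs.weights), coordinate.key(vcf.variants));

# second attempt: match the still-unmatched variants on rsID
unmatched <- is.na(vcf.row);
vcf.row[unmatched] <- match(pgs.weights$ID[unmatched], vcf.variants$ID);

# variants unmatched after both attempts are cohort-wide missing
pgs.weights$vcf.row <- vcf.row;
pgs.weights$cohort.wide.missing <- is.na(vcf.row);
```

In this toy data, `rs1` matches by coordinate, `rs2` matches only by rsID because its position differs between the two inputs, and `rs3` is absent from the VCF data.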
@@ -171,3 +185,23 @@
 #' It is assumed that multiallelic variants are encoded in the same row in the VCF data. This is known as "merged" format. Split multiallelic sites are not accepted.
 #' VCF data can be formatted to merged format using external tools for VCF file manipulation.
 #'
+#' \strong{Allele Mismatch Handling}
+#' Variants from the PGS weight data are merged with records in the VCF data by genomic coordinate.
+#' After the merge is complete, there may be cases where the VCF reference (REF) and alternative (ALT) alleles do not match their conventional counterparts in the
+#' PGS weight data (other allele and effect allele, respectively).
+#' This is usually caused by a strand flip: the variant in question was called against opposite DNA reference strands in the PGS training data and the VCF data.
+#' Strand flips can be detected and corrected by flipping the affected allele to its reverse complement.
+#' \code{apply.polygenic.score} uses \code{assess.pgs.vcf.allele.match} to assess allele concordance, and this assessment is controlled through the following arguments:
+#'
+#' \itemize{
+#' \item \code{correct.strand.flips}: When \code{TRUE}, detected strand flips are corrected by flipping the affected value in the \code{effect_allele} column prior to dosage calling.
+#' \item \code{remove.ambiguous.allele.matches}: Corresponds to the \code{return.ambiguous.as.missing} argument in \code{assess.pgs.vcf.allele.match}. When \code{TRUE}, non-INDEL allele
+#' mismatches that cannot be resolved (due to palindromic alleles or causes other than strand flips) are removed by marking the affected value in the \code{effect_allele} column as missing
+#' prior to dosage calling and missing genotype handling. The corresponding dosage is set to NA and the variant is handled according to the chosen missing genotype method.
+#' \item \code{remove.mismatched.indels}: Corresponds to the \code{return.indels.as.missing} argument in \code{assess.pgs.vcf.allele.match}. When \code{TRUE}, INDEL allele mismatches
+#' (which cannot be assessed for strand flips) are removed by marking the affected value in the \code{effect_allele} column as missing prior to dosage calling and missing genotype handling.
+#' The corresponding dosage is set to NA and the variant is handled according to the chosen missing genotype method.
+#' }
+#'
+#' Note that an allele match assessment requires the presence of both the \code{other_allele} and \code{effect_allele} columns in the PGS weight data.
+#' The \code{other_allele} column is not required by the PGS Catalog, and so is not always available.
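The allele concordance logic documented under Allele Mismatch Handling can be illustrated with a small sketch. The helpers `reverse.complement` and `classify.allele.match` below are hypothetical and are not the package's `assess.pgs.vcf.allele.match`. The sketch assumes biallelic variants and uppercase A/C/G/T alleles, and it compares the REF/ALT and other/effect pairs as unordered sets, which may differ from how the package treats allele order.

```r
# Hypothetical helper: reverse complement of each allele in a character vector
reverse.complement <- function(alleles) {
    complemented <- chartr('ACGT', 'TGCA', alleles);
    vapply(
        strsplit(complemented, ''),
        function(bases) paste(rev(bases), collapse = ''),
        character(1)
        );
    }

# Hypothetical classifier for a single variant (assumption: alleles compared as unordered pairs)
classify.allele.match <- function(ref, alt, other.allele, effect.allele) {
    vcf.alleles <- sort(c(ref, alt));
    pgs.alleles <- sort(c(other.allele, effect.allele));

    # alleles agree as given
    if (identical(vcf.alleles, pgs.alleles)) {
        return('match');
        }

    # mismatched INDELs cannot be assessed for strand flips
    if (any(nchar(c(vcf.alleles, pgs.alleles)) != 1)) {
        return('indel.mismatch');
        }

    # alleles agree once the PGS alleles are moved to the opposite strand
    if (identical(vcf.alleles, sort(reverse.complement(pgs.alleles)))) {
        return('strand.flip');
        }

    # anything else, including palindromic pairs that do not match, stays unresolved
    return('ambiguous');
    }

classify.allele.match('A', 'G', 'A', 'G');
classify.allele.match('A', 'G', 'T', 'C');
classify.allele.match('A', 'T', 'C', 'G');
classify.allele.match('A', 'AT', 'A', 'AG');
```

Under the behaviour documented above, a variant classified as `strand.flip` would have its `effect_allele` replaced by its reverse complement when `correct.strand.flips` is `TRUE`, while `ambiguous` and `indel.mismatch` variants would be marked as missing only when `remove.ambiguous.allele.matches` or `remove.mismatched.indels`, respectively, is `TRUE`.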