Allele Mismatches & Strand Flips #68
alkaZeltser
started this conversation in
General
Allele Matching
The first step in applying a PGS to genetic data is to match up the variant records between the PGS weight data and the genetic data (such as a VCF file).
ApplyPolygenicScore does this by matching variant coordinates (chromosome and base pair position). A common subsequent quality check is to verify whether the alleles associated with each variant also match between the two datasets. By convention, the effect allele from the PGS weight data is expected to match the alternative (ALT) allele in the VCF record, and the other (non-effect) PGS weight allele is expected to match the reference (REF) allele in the VCF record. For example, in the ideal case, a PGS weight record with other allele A and effect allele G corresponds to a VCF record with REF A and ALT G.
Most of the time, these allele pairs will match up, but there are scenarios where they do not. A possible consequence of a mismatched set of alleles is an incorrect computation of the effect allele dosage, resulting in an incorrect final PGS sum.
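The two checks described above (coordinate match, then allele match) can be sketched in a few lines of Python. This is an illustration only, not the ApplyPolygenicScore implementation: the record classes, field names, and the functions same_site and alleles_match are hypothetical, and the coordinates and alleles are made-up example values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PgsRecord:
    """One row of PGS weight data (hypothetical field names)."""
    chrom: str
    pos: int
    effect_allele: str
    other_allele: str


@dataclass(frozen=True)
class VcfRecord:
    """One biallelic VCF record (hypothetical field names)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def same_site(pgs: PgsRecord, vcf: VcfRecord) -> bool:
    """Coordinate match: same chromosome and base pair position."""
    return pgs.chrom == vcf.chrom and pgs.pos == vcf.pos


def alleles_match(pgs: PgsRecord, vcf: VcfRecord) -> bool:
    """Conventional allele match: other allele == REF and effect allele == ALT."""
    return pgs.other_allele == vcf.ref and pgs.effect_allele == vcf.alt


# Made-up example values.
pgs = PgsRecord("1", 12345, effect_allele="G", other_allele="A")
vcf_ok = VcfRecord("1", 12345, ref="A", alt="G")
vcf_bad = VcfRecord("1", 12345, ref="A", alt="C")

print(same_site(pgs, vcf_ok), alleles_match(pgs, vcf_ok))    # True True
print(same_site(pgs, vcf_bad), alleles_match(pgs, vcf_bad))  # True False
```

Note that the second record still matches by coordinate, which is why the allele comparison is a separate quality check.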
Effect switch
The most common cause of an other/REF and effect/ALT allele mismatch is an effect allele label that breaks convention, with the effect weight assigned to the REF allele instead of the ALT allele.
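To make the consequence of an effect switch concrete, here is a small Python sketch comparing a dosage counted relative to the labeled effect allele with a dosage that naively counts ALT alleles. The function names, the genotype representation (a tuple of allele strings), and the weight value are all assumptions made for illustration; this is not the package's code.

```python
def effect_allele_dosage(genotype, effect_allele):
    """Count copies of the labeled effect allele in a genotype."""
    return sum(1 for allele in genotype if allele == effect_allele)


def alt_allele_dosage(genotype, alt):
    """Naive dosage: count copies of the ALT allele, ignoring the effect label."""
    return sum(1 for allele in genotype if allele == alt)


# Made-up site: REF = A, ALT = G, but the PGS assigns its weight to A (the REF allele).
ref, alt = "A", "G"
effect_allele = "A"
weight = 0.5              # illustrative effect weight
genotype = ("A", "A")     # homozygous reference sample (0/0 in VCF terms)

correct = weight * effect_allele_dosage(genotype, effect_allele)
naive = weight * alt_allele_dosage(genotype, alt)

print(correct)  # 1.0
print(naive)    # 0.0
```

For this sample the naive ALT count contributes nothing to the score, while counting the labeled effect allele contributes two copies. At a biallelic site with a fully called genotype, the two dosages always sum to 2.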
The apply.polygenic.score function in our package does not assume that only the ALT allele can be an effect allele. A "risk" dosage for each variant is computed relative to the effect_allele as labeled in the PGS weight data, and no special action is required to correct for a potential effect label switch.
Strand flip
Another scenario results from a strand flip. "Strand" refers to the fact that the DNA molecule is composed of two strands. The base sequences along the two strands are each other's reverse complement: each base on one strand is always paired with the same complementary base on the opposite strand. The pairings are A with T, and C with G.
When variant calling or genotyping is performed, variants are called relative to a reference DNA sequence, and a decision must be made as to which of the two strands should be used as the reference. By convention, most modern studies call variants relative to the forward strand. However, early genotyping arrays would sometimes include probes for the reverse strand, often because those probes performed better. If the PGS was developed on data that included a “flipped” variant site, but the VCF data is consistently called against the forward strand, the result is an allele mismatch. For example, alleles reported as T and C in the PGS weight data would appear as A and G in the VCF.
This scenario is simple to check for and correct. If an allele mismatch is detected, all that is necessary is to flip one set of alleles to their complementary bases and verify the match once more. If the flipped alleles now match between the two datasets, the strand flip has been resolved.
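The check-flip-recheck procedure can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the logic of the package: the function name check_strand_flip and the status strings are hypothetical, it handles single-base alleles only, and it compares alleles in the conventional orientation (other allele vs REF, effect allele vs ALT).

```python
# Watson-Crick base pairings.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def flip(allele):
    """Return the complementary base of a single-base allele."""
    return COMPLEMENT[allele]


def check_strand_flip(pgs_other, pgs_effect, vcf_ref, vcf_alt):
    """Compare PGS alleles to VCF alleles, flipping the PGS pair if needed.

    Returns (status, other_allele, effect_allele), where status is one of
    "match", "strand_flip", or "unresolved" (hypothetical labels).
    """
    if (pgs_other, pgs_effect) == (vcf_ref, vcf_alt):
        return ("match", pgs_other, pgs_effect)

    flipped_other, flipped_effect = flip(pgs_other), flip(pgs_effect)
    if (flipped_other, flipped_effect) == (vcf_ref, vcf_alt):
        # The mismatch disappears after complementing: a strand flip.
        return ("strand_flip", flipped_other, flipped_effect)

    # Flipping did not help; leave the alleles untouched.
    return ("unresolved", pgs_other, pgs_effect)


print(check_strand_flip("A", "G", "A", "G"))  # ('match', 'A', 'G')
print(check_strand_flip("T", "C", "A", "G"))  # ('strand_flip', 'A', 'G')
print(check_strand_flip("A", "C", "A", "G"))  # ('unresolved', 'A', 'C')
```

Returning the corrected alleles together with a status lets the caller decide what to do with variants that remain unresolved after flipping.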
ApplyPolygenicScore provides the assess.pgs.vcf.allele.match function, which will perform a strand check and correctional flip for any two pairs of alleles. This same function is used internally by apply.polygenic.score to optionally perform the same action after the PGS/VCF merge step and before the dosage computation step.
There are several caveats to this strategy.
- Both other_allele and effect_allele in the PGS weight data are required for a strand flip check. Unfortunately, the other allele is not always reported in PGS weight files, and is not a requirement of PGS Catalog submissions.
- Strand flips cannot be assessed for INDEL alleles (this includes assess.pgs.vcf.allele.match). This is because VCF convention reports INDEL alleles anchored to an adjacent base, namely the base preceding the event. Since the preceding direction will differ by DNA strand, it is impossible to recover some of the INDEL allele sequence for a proper comparison.
- Some allele matches remain ambiguous. assess.pgs.vcf.allele.match and apply.polygenic.score cannot resolve these scenarios, but they do provide options for what to do with an ambiguous allele match: leave as is, or remove the variant from PGS application.
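The caveats above can be folded into a single classification step followed by a keep-or-remove policy. The Python sketch below is an assumption-laden illustration, not the behavior or interface of assess.pgs.vcf.allele.match or apply.polygenic.score: the function names, the status labels, and the unresolved_action parameter are all hypothetical, and treating every uncheckable variant the same way is a simplification chosen for this example.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Hypothetical status labels for variants whose match cannot be confirmed.
UNCHECKABLE = {"missing_other_allele", "indel", "unresolved"}


def classify(pgs_other, pgs_effect, vcf_ref, vcf_alt):
    """Classify one variant's allele match (status labels are illustrative)."""
    if not pgs_other:
        # Without the other allele there is no pair to compare or flip.
        return "missing_other_allele"
    if any(len(a) != 1 for a in (pgs_other, pgs_effect, vcf_ref, vcf_alt)):
        # Multi-base alleles (INDELs) are not checked for strand flips here.
        return "indel"
    if (pgs_other, pgs_effect) == (vcf_ref, vcf_alt):
        return "match"
    flipped = (COMPLEMENT.get(pgs_other), COMPLEMENT.get(pgs_effect))
    if flipped == (vcf_ref, vcf_alt):
        return "strand_flip"
    return "unresolved"


def filter_variants(variants, unresolved_action="keep"):
    """Keep or drop uncheckable variants; returns (variant, status) pairs."""
    if unresolved_action not in ("keep", "remove"):
        raise ValueError("unresolved_action must be 'keep' or 'remove'")
    kept = []
    for variant in variants:
        status = classify(*variant)
        if status in UNCHECKABLE and unresolved_action == "remove":
            continue
        kept.append((variant, status))
    return kept


# Made-up variants as (other, effect, REF, ALT) tuples.
variants = [
    ("A", "G", "A", "G"),    # match
    ("T", "C", "A", "G"),    # strand flip
    (None, "G", "A", "G"),   # other allele not reported
    ("A", "AT", "A", "AT"),  # INDEL
    ("A", "C", "A", "G"),    # mismatch that flipping does not fix
]

print([status for _, status in filter_variants(variants, "keep")])
print([status for _, status in filter_variants(variants, "remove")])
```

With "keep", all five variants are retained alongside their status; with "remove", only the confirmed match and the resolved strand flip remain.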