
strand bias artifacts #132

@katestan

We generated targeted NanoSeq data using the Twist Human Core Exome panel (~1000X raw coverage, 200X duplex coverage). We use the same sample as the reference, with a VAF cutoff of 0.1 for somatic variant calling (var_v=0.1). The pipeline completed without errors using these parameters:

// dsa parameters
params.dsa_d = 2
params.dsa_q = 30
params.dsa_M = 0
// variantcaller parameters
params.var_a = 50
params.var_b = 0
params.var_c = 0.02
params.var_d = 2
params.var_f = 0.9
params.var_i = 1.0
params.var_m = 8
params.var_n = 3
params.var_p = 0
params.var_q = 45
params.var_r = 144
params.var_v = 0.1
params.var_x = 8
params.var_z = 25
// indel parameters
params.indel_rb = 2
params.indel_t3 = 136
params.indel_t5 = 8
params.indel_z = 25
params.indel_v = params.var_v
params.indel_a = params.var_a
params.indel_c = params.var_c

I am seeing some strange results in the final $sample.vcf.gz output by the NanoSeq pipeline.
Certain variants appear in all 8 of my samples. When I inspect these variants in IGV using $sample.opmk.bam as input, I see strand bias: all of the alternate alleles appear on the reverse strand. Multiple other alternate alleles are also visible in IGV at the same locus (again, all on the reverse strand). These do not appear in my $sample.vcf.gz, probably because they don't meet the 2x2 duplex criteria.

For one of my $sample.opkm.bam files, the IGV allele breakdown at this locus is:
Total count: 1247
A : 39 (3%, 0+, 39- )
T : 25 (2%, 0+, 25- )
C : 268 (21%, 0+, 268- )
G : 915 (73%, 610+, 305- )
N : 0
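
To make the imbalance explicit, here is a minimal Python sketch that recomputes the per-strand non-reference fraction from the IGV counts above. The `strand_summary` helper and the dictionary layout are made up for illustration and are not part of NanoSeq or IGV; G is taken as the reference base because it is the REF allele reported for this site in the VCF.

```python
# Per-allele (forward, reverse) read counts copied from the IGV breakdown.
counts = {
    "A": (0, 39),
    "T": (0, 25),
    "C": (0, 268),
    "G": (610, 305),
}
REF = "G"  # REF allele reported for this site in the VCF


def strand_summary(counts, ref):
    """Return total depth plus non-reference counts and fractions per strand."""
    fwd_total = sum(fwd for fwd, _ in counts.values())
    rev_total = sum(rev for _, rev in counts.values())
    fwd_alt = fwd_total - counts[ref][0]
    rev_alt = rev_total - counts[ref][1]
    return {
        "total": fwd_total + rev_total,
        "fwd_total": fwd_total,
        "rev_total": rev_total,
        "fwd_alt": fwd_alt,
        "rev_alt": rev_alt,
        # Guard against division by zero on a strand with no coverage.
        "fwd_alt_frac": fwd_alt / fwd_total if fwd_total else 0.0,
        "rev_alt_frac": rev_alt / rev_total if rev_total else 0.0,
    }


summary = strand_summary(counts, REF)
for key, value in summary.items():
    print(f"{key}: {value}")
```

With these counts, 332 of the 637 reverse-strand reads (about 52%) carry a non-reference base, while none of the 610 forward-strand reads do.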

And the $sample.vcf.gz from the NanoSeq pipeline looks like:
chr5 154834880 . G C . PASS TRI=ACT>G;TIMES_CALLED=6;TYPE=snv;DUPLEX_VAF=0.0535714;BAM_VAF=0.00502513;BAM_VAF_BQ10=0.0840108;DEPTH_NORM_FWD=591;DEPTH_NORM_REV=60;DEPTH_FWD=8;DEPTH_REV=6.5;DUPLEX_COV=112;BAM_MUT=3;BAM_COV=597;BAM_MUT_BQ10=62;BAM_COV_BQ10=738;RB=chr5:154834649:154834922:ACC:TGA,chr5:154834707:154834907:GTA:TCT,chr5:154834718:154834922:CTA:GGT,chr5:154834720:154834908:TTT:TAG,chr5:154834729:154834964:GCA:GCT,chr5:154834731:154834930:ATC:GTG;QPOS=42,27,42,28,84,50;DPLX_ASXS=117,118,116,116,118,116;DPLX_CLIP=0,0,0,0,0,0;DPLX_NM=1,1,1,1,1,1;BULK_ASXS=118,118,118,118,118,118;BULK_NM=0,0,0,0,0,0
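
As a sanity check on how I am reading the INFO column, here is a small Python sketch that parses a subset of the fields from the record above and recomputes three ratios. The `parse_info` helper is hypothetical and not part of NanoSeq, and the relationships it checks (for example DUPLEX_VAF = TIMES_CALLED / DUPLEX_COV) are my assumption based on the arithmetic for this one record, not something I have confirmed in the documentation.

```python
# Subset of the INFO column from the record above
# (RB, QPOS, DPLX_* and BULK_* fields omitted for brevity).
info_field = (
    "TRI=ACT>G;TIMES_CALLED=6;TYPE=snv;DUPLEX_VAF=0.0535714;"
    "BAM_VAF=0.00502513;BAM_VAF_BQ10=0.0840108;"
    "DEPTH_NORM_FWD=591;DEPTH_NORM_REV=60;DEPTH_FWD=8;DEPTH_REV=6.5;"
    "DUPLEX_COV=112;BAM_MUT=3;BAM_COV=597;BAM_MUT_BQ10=62;BAM_COV_BQ10=738"
)


def parse_info(field):
    """Split a VCF INFO string into a dict, converting numeric values."""
    parsed = {}
    for item in field.split(";"):
        key, _, value = item.partition("=")
        try:
            parsed[key] = int(value)
        except ValueError:
            try:
                parsed[key] = float(value)
            except ValueError:
                parsed[key] = value  # keep non-numeric values as strings
    return parsed


info = parse_info(info_field)

# Assumed relationships, checked only against this single record.
duplex_vaf = info["TIMES_CALLED"] / info["DUPLEX_COV"]
bam_vaf = info["BAM_MUT"] / info["BAM_COV"]
bam_vaf_bq10 = info["BAM_MUT_BQ10"] / info["BAM_COV_BQ10"]

print(f"DUPLEX_VAF   reported={info['DUPLEX_VAF']}  recomputed={duplex_vaf:.7f}")
print(f"BAM_VAF      reported={info['BAM_VAF']}  recomputed={bam_vaf:.8f}")
print(f"BAM_VAF_BQ10 reported={info['BAM_VAF_BQ10']}  recomputed={bam_vaf_bq10:.7f}")
print(f"DEPTH_NORM_FWD={info['DEPTH_NORM_FWD']}  DEPTH_NORM_REV={info['DEPTH_NORM_REV']}")
```

For this record the recomputed ratios agree with the reported values. BAM_VAF_BQ10 (about 0.084) is much higher than BAM_VAF (about 0.005), and DEPTH_NORM_FWD (591) is roughly ten times DEPTH_NORM_REV (60).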

Questions:

(1) Can a variant still pass the 2x2 duplex criteria if all of the supporting reads are in reverse orientation? Could that happen if both PCR duplicates from the positive strand are reverse reads AND both PCR duplicates from the negative strand are also reverse reads?

(2) Is there a filter in the NanoSeq pipeline that can help me remove this kind of strand bias artefact? I see that the VCF defines these two fields:
DEPTH_FWD="Read bundle forward reads depth"
DEPTH_REV="Read bundle reverse reads depth"
What exactly do these mean, and are they used for filtering?
