Skip to content

Variant not detected within 16bp from the start/end #257

Closed
@peterk87

Description

@peterk87

Hello,

I noticed that Clair3 is not calling a variant near the end of a short reference sequence (position 2265 of Influenza A virus PB2 sequence OP597571.1; sequence length=2280).

Bcftools calls at a G->A variant at position 2265. It's also clear from looking at the read alignment in IGV that there's a high AF variant at the position.

Clair3 merge_output.vcf.gz:

##fileformat=VCFv4.2
##source=Clair3
##clair3_version=1.0.4
##cmdline=../run_clair3.sh --bam_fn=minimap2.bam --ref_fn=OP597571.1.fasta --model_path=.../models/r941_prom_sup_g5014 --print_ref_calls --gvcf --threads=1 --platform=ont --output=clair3-out --haploid_sensitive --enable_long_indel --keep_iupac_bases --no_phasing_for_fa --include_all_ctgs
##reference=OP597571.1.fasta
##FILTER=<ID=PASS,Description="All filters passed">
##FILTER=<ID=LowQual,Description="Low quality variant">
##FILTER=<ID=RefCall,Description="Reference call">
##INFO=<ID=P,Number=0,Type=Flag,Description="Result from pileup calling">
##INFO=<ID=F,Number=0,Type=Flag,Description="Result from full-alignment calling">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ<20 or selected by 'samtools view -F 2316' are filtered)">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##FORMAT=<ID=AF,Number=1,Type=Float,Description="Observed allele frequency in reads, for each ALT allele, in the same order as listed, or the REF allele for a RefCall">
##contig=<ID=OP597571.1,length=2280>
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  SAMPLE
OP597571.1      62      .       C       .       17.62   RefCall P       GT:GQ:DP:AD:AF:PL       0:17:828:641:0.7742:990
OP597571.1      185     .       G       A       9.65    PASS    F       GT:GQ:DP:AD:AF:PL       1:9:143:2,107:0.7483:18,13,0
OP597571.1      232     .       T       .       21.36   RefCall P       GT:GQ:DP:AD:AF:PL       0:21:102:75:0.7353:990
OP597571.1      252     .       C       A       25.12   PASS    P       GT:GQ:DP:AD:AF:PL       1:25:90:1,84:0.9333:80,58,0
OP597571.1      495     .       G       .       19.81   RefCall P       GT:GQ:DP:AD:AF:PL       0:19:42:36:0.8571:990
OP597571.1      510     .       C       T       33.42   PASS    F       GT:GQ:DP:AD:AF:PL       1:33:42:0,41:0.9762:79,57,0
OP597571.1      528     .       A       C       35.38   PASS    F       GT:GQ:DP:AD:AF:PL       1:35:44:0,41:0.9318:80,63,0
OP597571.1      540     .       G       C       26.34   PASS    P       GT:GQ:DP:AD:AF:PL       1:26:45:0,44:0.9778:80,54,0
OP597571.1      639     .       C       .       22.44   RefCall P       GT:GQ:DP:AD:AF:PL       0:22:44:28:0.6364:990
OP597571.1      694     .       T       .       26.30   RefCall P       GT:GQ:DP:AD:AF:PL       0:26:43:37:0.8605:990
OP597571.1      708     .       A       .       23.57   RefCall P       GT:GQ:DP:AD:AF:PL       0:23:43:33:0.7674:990
OP597571.1      741     .       A       .       23.27   RefCall P       GT:GQ:DP:AD:AF:PL       0:23:49:30:0.6122:990
OP597571.1      747     .       A       G       24.68   PASS    F       GT:GQ:DP:AD:AF:PL       1:24:47:4,40:0.8511:74,37,0
OP597571.1      888     .       T       C       24.92   PASS    P       GT:GQ:DP:AD:AF:PL       1:24:58:1,56:0.9655:80,42,0
OP597571.1      895     .       A       .       23.47   RefCall P       GT:GQ:DP:AD:AF:PL       0:23:59:51:0.8644:990
OP597571.1      1113    .       A       .       21.30   RefCall P       GT:GQ:DP:AD:AF:PL       0:21:45:40:0.8889:990
OP597571.1      1170    .       C       T       17.59   PASS    F       GT:GQ:DP:AD:AF:PL       1:17:41:25,12:0.2927:22,0,63
OP597571.1      1335    .       G       A       28.63   PASS    F       GT:GQ:DP:AD:AF:PL       1:28:43:0,40:0.9302:77,45,0
OP597571.1      1345    .       T       .       25.67   RefCall P       GT:GQ:DP:AD:AF:PL       0:25:44:30:0.6818:990
OP597571.1      1432    .       G       .       24.87   RefCall P       GT:GQ:DP:AD:AF:PL       0:24:52:43:0.8269:990
OP597571.1      1469    .       A       G       24.83   PASS    P       GT:GQ:DP:AD:AF:PL       1:24:52:2,49:0.9423:80,37,0
OP597571.1      1504    .       T       .       22.74   RefCall P       GT:GQ:DP:AD:AF:PL       0:22:50:41:0.8200:990
OP597571.1      1522    .       A       .       26.06   RefCall P       GT:GQ:DP:AD:AF:PL       0:26:54:39:0.7222:990
OP597571.1      1566    .       G       A       32.47   PASS    F       GT:GQ:DP:AD:AF:PL       1:32:54:2,52:0.9630:80,53,0
OP597571.1      1677    .       T       C       31.74   PASS    F       GT:GQ:DP:AD:AF:PL       1:31:57:4,49:0.8596:80,52,0
OP597571.1      1736    .       C       .       23.60   RefCall P       GT:GQ:DP:AD:AF:PL       0:23:59:41:0.6949:990
OP597571.1      1737    .       T       .       25.37   RefCall P       GT:GQ:DP:AD:AF:PL       0:25:59:43:0.7288:990
OP597571.1      1760    .       C       .       21.45   RefCall P       GT:GQ:DP:AD:AF:PL       0:21:60:52:0.8667:990
OP597571.1      1761    .       T       .       18.29   RefCall P       GT:GQ:DP:AD:AF:PL       0:18:60:44:0.7333:990
OP597571.1      1806    .       G       A       27.10   PASS    P       GT:GQ:DP:AD:AF:PL       1:27:58:0,54:0.9310:80,47,0
OP597571.1      1942    .       C       .       21.32   RefCall P       GT:GQ:DP:AD:AF:PL       0:21:87:73:0.8391:990
OP597571.1      1957    .       T       .       24.29   RefCall P       GT:GQ:DP:AD:AF:PL       0:24:96:72:0.7500:990
OP597571.1      2051    .       T       C       26.04   PASS    P       GT:GQ:DP:AD:AF:PL       1:26:168:1,156:0.9286:80,41,0
OP597571.1      2154    .       A       .       32.13   RefCall F       GT:GQ:DP:AD:AF:PL       0:32:713:468:0.6564:990
OP597571.1      2216    .       G       .       21.01   RefCall P       GT:GQ:DP:AD:AF:PL       0:21:834:715:0.8573:990
OP597571.1      2253    .       C       .       18.05   RefCall P       GT:GQ:DP:AD:AF:PL       0:18:767:569:0.7419:990

Bcftools mpileup and call output:

##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##bcftoolsVersion=1.17+htslib-1.17
##bcftoolsCommand=bcftools mpileup -d 100000 -f OP597571.1.fasta minimap2.bam
##contig=<ID=OP597571.1,length=2280>
##ALT=<ID=*,Description="Represents allele(s) other than observed.">
##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
##INFO=<ID=IDV,Number=1,Type=Integer,Description="Maximum number of raw reads supporting an indel">
##INFO=<ID=IMF,Number=1,Type=Float,Description="Maximum fraction of raw reads supporting an indel">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias for filtering splice-site artefacts in RNA-seq data (bigger is better)",Version="3">
##INFO=<ID=RPBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Read Position Bias (closer to 0 is better)">
##INFO=<ID=MQBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Mapping Quality Bias (closer to 0 is better)">
##INFO=<ID=BQBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Base Quality Bias (closer to 0 is better)">
##INFO=<ID=MQSBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Mapping Quality vs Strand Bias (closer to 0 is better)">
##INFO=<ID=SCBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Soft-Clip Length Bias (closer to 0 is better)">
##INFO=<ID=SGB,Number=1,Type=Float,Description="Segregation based metric, http://samtools.github.io/bcftools/rd-SegBias.pdf">
##INFO=<ID=MQ0F,Number=1,Type=Float,Description="Fraction of MQ0 reads (smaller is better)">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes for each ALT allele, in the same order as listed">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=DP4,Number=4,Type=Integer,Description="Number of high-quality ref-forward , ref-reverse, alt-forward and alt-reverse bases">
##INFO=<ID=MQ,Number=1,Type=Integer,Description="Average mapping quality">
##bcftools_callVersion=1.17+htslib-1.17
##bcftools_callCommand=call --ploidy 1 -mv -Ob -o calls.vcf; Date=Mon Dec 18 20:32:30 2023
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  WIN-AH-2023-FAV-0375-3-1C2d.Segment_1_PB2.OP597571.1.bam
OP597571.1      185     .       G       A       228.391 .       DP=139;VDB=0.979122;SGB=-0.693147;RPBZ=-0.747496;MQBZ=-0.574548;MQSBZ=-0.192448;BQBZ=1.13137;SCBZ=-0.508121;MQ0F=0;AC=1;AN=1;DP4=1,1,29,40;MQ=56       GT:PL   1:255,0
OP597571.1      252     .       C       A       225.421 .       DP=102;VDB=0.747331;SGB=-0.693147;MQSBZ=-0.642482;MQ0F=0;AC=1;AN=1;DP4=0,0,27,46;MQ=55  GT:PL 1:255,0
OP597571.1      510     .       C       T       228.402 .       DP=42;VDB=0.0719096;SGB=-0.693145;RPBZ=-0.412677;MQBZ=0;MQSBZ=0;BQBZ=1.36377;SCBZ=0.323986;MQ0F=0;AC=1;AN=1;DP4=1,0,16,25;MQ=60        GT:PL   1:255,0
OP597571.1      528     .       A       C       211.417 .       DP=41;VDB=0.278679;SGB=-0.693145;MQSBZ=0;MQ0F=0;AC=1;AN=1;DP4=0,0,19,22;MQ=60   GT:PL   1:241,0
OP597571.1      540     .       G       C       225.417 .       DP=44;VDB=0.219325;SGB=-0.693146;MQSBZ=0;MQ0F=0;AC=1;AN=1;DP4=0,0,18,26;MQ=60   GT:PL   1:255,0
OP597571.1      741     .       AGGGG   AGG     17.0828 .       INDEL;IDV=18;IMF=0.367347;DP=49;VDB=0.73002;SGB=-0.683931;RPBZ=-0.778151;MQBZ=0.124761;MQSBZ=-0.650785;BQBZ=1.02751;SCBZ=0.358316;MQ0F=0;AC=1;AN=1;DP4=14,17,8,5;MQ=58 GT:PL   1:87,42
OP597571.1      747     .       A       G       149.305 .       DP=45;VDB=0.841016;SGB=-0.693145;RPBZ=1.48153;MQBZ=1.82034;MQSBZ=0;BQBZ=2.48105;SCBZ=0.918559;MQ0F=0;AC=1;AN=1;DP4=4,1,19,21;MQ=58     GT:PL   1:176,0
OP597571.1      888     .       T       C       228.415 .       DP=57;VDB=0.0352829;SGB=-0.693147;RPBZ=0.304058;MQBZ=-0.190662;MQSBZ=0.201778;BQBZ=1.7129;SCBZ=0.309711;MQ0F=0;AC=1;AN=1;DP4=1,0,24,32;MQ=58   GT:PL   1:255,0
OP597571.1      1335    .       G       A       228.424 .       DP=42;VDB=0.713753;SGB=-0.693145;RPBZ=0.0412661;MQBZ=-0.156174;MQSBZ=-0.866025;BQBZ=1.69535;SCBZ=0.481876;MQ0F=0;AC=1;AN=1;DP4=1,0,17,24;MQ=59 GT:PL   1:255,0
OP597571.1      1469    .       A       G       221.38  .       DP=52;VDB=0.389159;SGB=-0.693147;RPBZ=0.667379;MQBZ=-0.353341;MQSBZ=-1.08356;BQBZ=2.5377;SCBZ=-1.22412;MQ0F=0;AC=1;AN=1;DP4=2,1,17,32;MQ=59    GT:PL   1:248,0
OP597571.1      1566    .       G       A       228.393 .       DP=55;VDB=0.00601305;SGB=-0.693147;RPBZ=0.899577;MQBZ=-0.498471;MQSBZ=-1.09919;BQBZ=1.93931;SCBZ=-1.1883;MQ0F=0;AC=1;AN=1;DP4=1,1,19,34;MQ=56  GT:PL   1:255,0
OP597571.1      1677    .       T       C       203.339 .       DP=55;VDB=0.0325114;SGB=-0.693147;RPBZ=-0.145875;MQBZ=-0.280056;MQSBZ=-0.785905;BQBZ=1.93259;SCBZ=-0.687735;MQ0F=0;AC=1;AN=1;DP4=2,2,19,32;MQ=58       GT:PL   1:230,0
OP597571.1      1806    .       G       A       228.42  .       DP=55;VDB=0.0163287;SGB=-0.693147;RPBZ=0.346711;MQBZ=-0.136083;MQSBZ=-0.847791;BQBZ=1.67811;SCBZ=-1.38535;MQ0F=0;AC=1;AN=1;DP4=1,0,22,32;MQ=59 GT:PL   1:255,0
OP597571.1      2051    .       T       C       228.414 .       DP=185;VDB=0.258521;SGB=-0.693147;RPBZ=-0.659562;MQBZ=-0.811578;MQSBZ=-6.21909;BQBZ=2.58014;SCBZ=-0.128242;MQ0F=0;AC=1;AN=1;DP4=1,2,100,80;MQ=55       GT:PL   1:255,0
OP597571.1      2265    .       G       A       228.4   .       DP=802;VDB=0;SGB=-0.693147;RPBZ=-0.477099;MQBZ=2.06864;MQSBZ=3.44984;BQBZ=7.12028;SCBZ=0.475257;MQ0F=0;AC=1;AN=1;DP4=23,0,710,60;MQ=54 GT:PL   1:255,0

Clair3 command:

CLAIR_BIN_DIR=$(dirname $(which run_clair3.sh))
MODEL_PATH="$CLAIR_BIN_DIR/models/r941_prom_sup_g5014"

samtools faidx ref.fasta

run_clair3.sh \
    --bam_fn=minimap2.bam \
    --ref_fn=OP597571.1.fasta \
    --model_path="$MODEL_PATH"\
    --threads=1 \
    --platform="ont" \
    --output=clair3-out \
    --haploid_sensitive \
    --enable_long_indel \
    --keep_iupac_bases \
    --no_phasing_for_fa \
    --include_all_ctgs

Other details:

  • Clair3 version: 1.0.4 (installed from Bioconda)
  • read mapper: Minimap2 v2.24
  • instrument: GridION
  • instrument software: MinKNOW 23.07.12 (Bream 7.7.6, Core 5.7.5, Dorado 7.1.4+d7df870c0)
  • basecalling model: Super-accurate basecalling, 450 bps
  • flowcell: FLO-MIN106
  • kit: SQK-RBK110-96

Are there any parameters that need to be adjusted for variant calling of RNA virus sequence data?

Any help would be much appreciated!

Thanks!

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions