Closed
Description
Hello,
I noticed that Clair3 is not calling a variant near the end of a short reference sequence (position 2265 of Influenza A virus PB2 sequence OP597571.1; sequence length=2280).
Bcftools calls at a G->A variant at position 2265. It's also clear from looking at the read alignment in IGV that there's a high AF variant at the position.
Clair3 merge_output.vcf.gz
:
##fileformat=VCFv4.2
##source=Clair3
##clair3_version=1.0.4
##cmdline=../run_clair3.sh --bam_fn=minimap2.bam --ref_fn=OP597571.1.fasta --model_path=.../models/r941_prom_sup_g5014 --print_ref_calls --gvcf --threads=1 --platform=ont --output=clair3-out --haploid_sensitive --enable_long_indel --keep_iupac_bases --no_phasing_for_fa --include_all_ctgs
##reference=OP597571.1.fasta
##FILTER=<ID=PASS,Description="All filters passed">
##FILTER=<ID=LowQual,Description="Low quality variant">
##FILTER=<ID=RefCall,Description="Reference call">
##INFO=<ID=P,Number=0,Type=Flag,Description="Result from pileup calling">
##INFO=<ID=F,Number=0,Type=Flag,Description="Result from full-alignment calling">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ<20 or selected by 'samtools view -F 2316' are filtered)">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##FORMAT=<ID=AF,Number=1,Type=Float,Description="Observed allele frequency in reads, for each ALT allele, in the same order as listed, or the REF allele for a RefCall">
##contig=<ID=OP597571.1,length=2280>
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
OP597571.1 62 . C . 17.62 RefCall P GT:GQ:DP:AD:AF:PL 0:17:828:641:0.7742:990
OP597571.1 185 . G A 9.65 PASS F GT:GQ:DP:AD:AF:PL 1:9:143:2,107:0.7483:18,13,0
OP597571.1 232 . T . 21.36 RefCall P GT:GQ:DP:AD:AF:PL 0:21:102:75:0.7353:990
OP597571.1 252 . C A 25.12 PASS P GT:GQ:DP:AD:AF:PL 1:25:90:1,84:0.9333:80,58,0
OP597571.1 495 . G . 19.81 RefCall P GT:GQ:DP:AD:AF:PL 0:19:42:36:0.8571:990
OP597571.1 510 . C T 33.42 PASS F GT:GQ:DP:AD:AF:PL 1:33:42:0,41:0.9762:79,57,0
OP597571.1 528 . A C 35.38 PASS F GT:GQ:DP:AD:AF:PL 1:35:44:0,41:0.9318:80,63,0
OP597571.1 540 . G C 26.34 PASS P GT:GQ:DP:AD:AF:PL 1:26:45:0,44:0.9778:80,54,0
OP597571.1 639 . C . 22.44 RefCall P GT:GQ:DP:AD:AF:PL 0:22:44:28:0.6364:990
OP597571.1 694 . T . 26.30 RefCall P GT:GQ:DP:AD:AF:PL 0:26:43:37:0.8605:990
OP597571.1 708 . A . 23.57 RefCall P GT:GQ:DP:AD:AF:PL 0:23:43:33:0.7674:990
OP597571.1 741 . A . 23.27 RefCall P GT:GQ:DP:AD:AF:PL 0:23:49:30:0.6122:990
OP597571.1 747 . A G 24.68 PASS F GT:GQ:DP:AD:AF:PL 1:24:47:4,40:0.8511:74,37,0
OP597571.1 888 . T C 24.92 PASS P GT:GQ:DP:AD:AF:PL 1:24:58:1,56:0.9655:80,42,0
OP597571.1 895 . A . 23.47 RefCall P GT:GQ:DP:AD:AF:PL 0:23:59:51:0.8644:990
OP597571.1 1113 . A . 21.30 RefCall P GT:GQ:DP:AD:AF:PL 0:21:45:40:0.8889:990
OP597571.1 1170 . C T 17.59 PASS F GT:GQ:DP:AD:AF:PL 1:17:41:25,12:0.2927:22,0,63
OP597571.1 1335 . G A 28.63 PASS F GT:GQ:DP:AD:AF:PL 1:28:43:0,40:0.9302:77,45,0
OP597571.1 1345 . T . 25.67 RefCall P GT:GQ:DP:AD:AF:PL 0:25:44:30:0.6818:990
OP597571.1 1432 . G . 24.87 RefCall P GT:GQ:DP:AD:AF:PL 0:24:52:43:0.8269:990
OP597571.1 1469 . A G 24.83 PASS P GT:GQ:DP:AD:AF:PL 1:24:52:2,49:0.9423:80,37,0
OP597571.1 1504 . T . 22.74 RefCall P GT:GQ:DP:AD:AF:PL 0:22:50:41:0.8200:990
OP597571.1 1522 . A . 26.06 RefCall P GT:GQ:DP:AD:AF:PL 0:26:54:39:0.7222:990
OP597571.1 1566 . G A 32.47 PASS F GT:GQ:DP:AD:AF:PL 1:32:54:2,52:0.9630:80,53,0
OP597571.1 1677 . T C 31.74 PASS F GT:GQ:DP:AD:AF:PL 1:31:57:4,49:0.8596:80,52,0
OP597571.1 1736 . C . 23.60 RefCall P GT:GQ:DP:AD:AF:PL 0:23:59:41:0.6949:990
OP597571.1 1737 . T . 25.37 RefCall P GT:GQ:DP:AD:AF:PL 0:25:59:43:0.7288:990
OP597571.1 1760 . C . 21.45 RefCall P GT:GQ:DP:AD:AF:PL 0:21:60:52:0.8667:990
OP597571.1 1761 . T . 18.29 RefCall P GT:GQ:DP:AD:AF:PL 0:18:60:44:0.7333:990
OP597571.1 1806 . G A 27.10 PASS P GT:GQ:DP:AD:AF:PL 1:27:58:0,54:0.9310:80,47,0
OP597571.1 1942 . C . 21.32 RefCall P GT:GQ:DP:AD:AF:PL 0:21:87:73:0.8391:990
OP597571.1 1957 . T . 24.29 RefCall P GT:GQ:DP:AD:AF:PL 0:24:96:72:0.7500:990
OP597571.1 2051 . T C 26.04 PASS P GT:GQ:DP:AD:AF:PL 1:26:168:1,156:0.9286:80,41,0
OP597571.1 2154 . A . 32.13 RefCall F GT:GQ:DP:AD:AF:PL 0:32:713:468:0.6564:990
OP597571.1 2216 . G . 21.01 RefCall P GT:GQ:DP:AD:AF:PL 0:21:834:715:0.8573:990
OP597571.1 2253 . C . 18.05 RefCall P GT:GQ:DP:AD:AF:PL 0:18:767:569:0.7419:990
Bcftools mpileup and call output:
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##bcftoolsVersion=1.17+htslib-1.17
##bcftoolsCommand=bcftools mpileup -d 100000 -f OP597571.1.fasta minimap2.bam
##contig=<ID=OP597571.1,length=2280>
##ALT=<ID=*,Description="Represents allele(s) other than observed.">
##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
##INFO=<ID=IDV,Number=1,Type=Integer,Description="Maximum number of raw reads supporting an indel">
##INFO=<ID=IMF,Number=1,Type=Float,Description="Maximum fraction of raw reads supporting an indel">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias for filtering splice-site artefacts in RNA-seq data (bigger is better)",Version="3">
##INFO=<ID=RPBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Read Position Bias (closer to 0 is better)">
##INFO=<ID=MQBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Mapping Quality Bias (closer to 0 is better)">
##INFO=<ID=BQBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Base Quality Bias (closer to 0 is better)">
##INFO=<ID=MQSBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Mapping Quality vs Strand Bias (closer to 0 is better)">
##INFO=<ID=SCBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Soft-Clip Length Bias (closer to 0 is better)">
##INFO=<ID=SGB,Number=1,Type=Float,Description="Segregation based metric, http://samtools.github.io/bcftools/rd-SegBias.pdf">
##INFO=<ID=MQ0F,Number=1,Type=Float,Description="Fraction of MQ0 reads (smaller is better)">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes for each ALT allele, in the same order as listed">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=DP4,Number=4,Type=Integer,Description="Number of high-quality ref-forward , ref-reverse, alt-forward and alt-reverse bases">
##INFO=<ID=MQ,Number=1,Type=Integer,Description="Average mapping quality">
##bcftools_callVersion=1.17+htslib-1.17
##bcftools_callCommand=call --ploidy 1 -mv -Ob -o calls.vcf; Date=Mon Dec 18 20:32:30 2023
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT WIN-AH-2023-FAV-0375-3-1C2d.Segment_1_PB2.OP597571.1.bam
OP597571.1 185 . G A 228.391 . DP=139;VDB=0.979122;SGB=-0.693147;RPBZ=-0.747496;MQBZ=-0.574548;MQSBZ=-0.192448;BQBZ=1.13137;SCBZ=-0.508121;MQ0F=0;AC=1;AN=1;DP4=1,1,29,40;MQ=56 GT:PL 1:255,0
OP597571.1 252 . C A 225.421 . DP=102;VDB=0.747331;SGB=-0.693147;MQSBZ=-0.642482;MQ0F=0;AC=1;AN=1;DP4=0,0,27,46;MQ=55 GT:PL 1:255,0
OP597571.1 510 . C T 228.402 . DP=42;VDB=0.0719096;SGB=-0.693145;RPBZ=-0.412677;MQBZ=0;MQSBZ=0;BQBZ=1.36377;SCBZ=0.323986;MQ0F=0;AC=1;AN=1;DP4=1,0,16,25;MQ=60 GT:PL 1:255,0
OP597571.1 528 . A C 211.417 . DP=41;VDB=0.278679;SGB=-0.693145;MQSBZ=0;MQ0F=0;AC=1;AN=1;DP4=0,0,19,22;MQ=60 GT:PL 1:241,0
OP597571.1 540 . G C 225.417 . DP=44;VDB=0.219325;SGB=-0.693146;MQSBZ=0;MQ0F=0;AC=1;AN=1;DP4=0,0,18,26;MQ=60 GT:PL 1:255,0
OP597571.1 741 . AGGGG AGG 17.0828 . INDEL;IDV=18;IMF=0.367347;DP=49;VDB=0.73002;SGB=-0.683931;RPBZ=-0.778151;MQBZ=0.124761;MQSBZ=-0.650785;BQBZ=1.02751;SCBZ=0.358316;MQ0F=0;AC=1;AN=1;DP4=14,17,8,5;MQ=58 GT:PL 1:87,42
OP597571.1 747 . A G 149.305 . DP=45;VDB=0.841016;SGB=-0.693145;RPBZ=1.48153;MQBZ=1.82034;MQSBZ=0;BQBZ=2.48105;SCBZ=0.918559;MQ0F=0;AC=1;AN=1;DP4=4,1,19,21;MQ=58 GT:PL 1:176,0
OP597571.1 888 . T C 228.415 . DP=57;VDB=0.0352829;SGB=-0.693147;RPBZ=0.304058;MQBZ=-0.190662;MQSBZ=0.201778;BQBZ=1.7129;SCBZ=0.309711;MQ0F=0;AC=1;AN=1;DP4=1,0,24,32;MQ=58 GT:PL 1:255,0
OP597571.1 1335 . G A 228.424 . DP=42;VDB=0.713753;SGB=-0.693145;RPBZ=0.0412661;MQBZ=-0.156174;MQSBZ=-0.866025;BQBZ=1.69535;SCBZ=0.481876;MQ0F=0;AC=1;AN=1;DP4=1,0,17,24;MQ=59 GT:PL 1:255,0
OP597571.1 1469 . A G 221.38 . DP=52;VDB=0.389159;SGB=-0.693147;RPBZ=0.667379;MQBZ=-0.353341;MQSBZ=-1.08356;BQBZ=2.5377;SCBZ=-1.22412;MQ0F=0;AC=1;AN=1;DP4=2,1,17,32;MQ=59 GT:PL 1:248,0
OP597571.1 1566 . G A 228.393 . DP=55;VDB=0.00601305;SGB=-0.693147;RPBZ=0.899577;MQBZ=-0.498471;MQSBZ=-1.09919;BQBZ=1.93931;SCBZ=-1.1883;MQ0F=0;AC=1;AN=1;DP4=1,1,19,34;MQ=56 GT:PL 1:255,0
OP597571.1 1677 . T C 203.339 . DP=55;VDB=0.0325114;SGB=-0.693147;RPBZ=-0.145875;MQBZ=-0.280056;MQSBZ=-0.785905;BQBZ=1.93259;SCBZ=-0.687735;MQ0F=0;AC=1;AN=1;DP4=2,2,19,32;MQ=58 GT:PL 1:230,0
OP597571.1 1806 . G A 228.42 . DP=55;VDB=0.0163287;SGB=-0.693147;RPBZ=0.346711;MQBZ=-0.136083;MQSBZ=-0.847791;BQBZ=1.67811;SCBZ=-1.38535;MQ0F=0;AC=1;AN=1;DP4=1,0,22,32;MQ=59 GT:PL 1:255,0
OP597571.1 2051 . T C 228.414 . DP=185;VDB=0.258521;SGB=-0.693147;RPBZ=-0.659562;MQBZ=-0.811578;MQSBZ=-6.21909;BQBZ=2.58014;SCBZ=-0.128242;MQ0F=0;AC=1;AN=1;DP4=1,2,100,80;MQ=55 GT:PL 1:255,0
OP597571.1 2265 . G A 228.4 . DP=802;VDB=0;SGB=-0.693147;RPBZ=-0.477099;MQBZ=2.06864;MQSBZ=3.44984;BQBZ=7.12028;SCBZ=0.475257;MQ0F=0;AC=1;AN=1;DP4=23,0,710,60;MQ=54 GT:PL 1:255,0
Clair3 command:
CLAIR_BIN_DIR=$(dirname $(which run_clair3.sh))
MODEL_PATH="$CLAIR_BIN_DIR/models/r941_prom_sup_g5014"
samtools faidx ref.fasta
run_clair3.sh \
--bam_fn=minimap2.bam \
--ref_fn=OP597571.1.fasta \
--model_path="$MODEL_PATH"\
--threads=1 \
--platform="ont" \
--output=clair3-out \
--haploid_sensitive \
--enable_long_indel \
--keep_iupac_bases \
--no_phasing_for_fa \
--include_all_ctgs
Other details:
- Clair3 version: 1.0.4 (installed from Bioconda)
- read mapper: Minimap2 v2.24
- instrument: GridION
- instrument software: MinKNOW 23.07.12 (Bream 7.7.6, Core 5.7.5, Dorado 7.1.4+d7df870c0)
- basecalling model: Super-accurate basecalling, 450 bps
- flowcell: FLO-MIN106
- kit: SQK-RBK110-96
Are there any parameters that need to be adjusted for variant calling of RNA virus sequence data?
Any help would be much appreciated!
Thanks!