No DPLX_ASXS fields in muts.vcf.gz or indel.vcf.gz #122

@katestan

When running the pipeline on 10 paired samples (nanoseq on tissue + standard WGS on blood) simultaneously, I am getting this error:

[f0/58e78d] process > MAP_FASTQ:ADD_NANOSEQ_FASTQ... [100%] 18 of 18, cached:...
[64/708770] process > MAP_FASTQ:BWAMEM2_MAP (P10_... [100%] 18 of 18, cached:...
[49/711355] process > MARKDUP (P19_normal)           [100%] 18 of 18, cached:...
[39/dd6358] process > NANOSEQ_ADD_RB (P16_normal)    [100%] 18 of 18 ✔
[c0/7980bc] process > NANOSEQ_DEDUP (P13_normal)     [100%] 18 of 18 ✔
[f7/6be082] process > VERIFY_BAMID (P13_normal)      [100%] 18 of 18 ✔
[45/8b0453] process > NANOSEQ_EFFI (P13_normal)      [100%] 18 of 18 ✔
[d5/0339b8] process > NANOSEQ:COV (P13)              [100%] 9 of 9 ✔
[1f/8de229] process > NANOSEQ:PART (P13)             [100%] 9 of 9 ✔
[71/816575] process > NANOSEQ:DSA (P13_66)           [100%] 808 of 808
[2a/8830bb] process > NANOSEQ:VAR (P12_88)           [100%] 782 of 782
[ee/92dec7] process > NANOSEQ:INDEL (P12_78)         [100%] 794 of 794
[91/3e48be] process > NANOSEQ:POST (P19)             [100%] 5 of 5
[0b/900b79] process > NANOSEQ_VAF (P10_pair)         [100%] 3 of 3, failed: 3...
[-        ] process > FINALIZE                       -
ERROR ~ Error executing process > 'NANOSEQ_VAF (P38_pair)'

Caused by:
  Process `NANOSEQ_VAF (P38_pair)` terminated with an error exit status (1)

Command executed:

  touch NANOSEQ_VAF_P38_pair
  mkdir -p out
  export REF_PATH='/nemo/lab/rouhanif/home/users/stanlek/nf-ns-crlm/NanoSeq/test/GRCh38/genome.fa'
  snv_merge_and_vaf_calc.R P38.muts.vcf.gz P38.indel.vcf.gz P38_duplex.neat.cram P38.cov.bed.gz out/P38.vcf
  bcftools sort -Oz out/P38.vcf -o P38.vcf.gz
  bcftools index -t P38.vcf.gz
  rm out/P38.vcf
  cat <<-END_VERSIONS > versions.yml
  "NANOSEQ_VAF":
      snv_merge_and_vaf_calc.R : $(runNanoSeq.py -v)
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  Error in mean.default(indels_tmp[, "qual"]) : 
    (converted from warning) argument is not numeric or logical: returning NA
  Calls: mean -> mean -> mean.default
  Execution halted

Work dir:
  /nemo/lab/rouhanif/home/users/stanlek/nf-ns-crlm/NanoSeq/Nextflow/work/64/79a1c53254d66e7774259ea8f0e1ab

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

 -- Check '.nextflow.log' file for details

When I open R and run snv_merge_and_vaf_calc.R interactively, I notice that the INFO fields of my muts.vcf.gz and indel.vcf.gz don't include "DPLX_ASXS", "DPLX_CLIP", "DPLX_NM", "BULK_ASXS", or "BULK_NM". snv_merge_and_vaf_calc.R expects these fields, and I think this is why it is failing.

My indel.vcf.gz for this paired sample looks like this:

##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
##INFO=<ID=IDV,Number=1,Type=Integer,Description="Maximum number of raw reads supporting an indel">
##INFO=<ID=IMF,Number=1,Type=Float,Description="Maximum fraction of raw reads supporting an indel">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias for filtering splice-site artefacts in RNA-seq data (bigger is better)",Version="3">
##INFO=<ID=RPBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Read Position Bias (closer to 0 is better)">
##INFO=<ID=MQBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Mapping Quality Bias (closer to 0 is better)">
##INFO=<ID=BQBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Base Quality Bias (closer to 0 is better)">
##INFO=<ID=MQSBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Mapping Quality vs Strand Bias (closer to 0 is better)">
##INFO=<ID=SCBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Soft-Clip Length Bias (closer to 0 is better)">
##INFO=<ID=FS,Number=1,Type=Float,Description="Phred-scaled p-value using Fisher's exact test to detect strand bias">
##INFO=<ID=SGB,Number=1,Type=Float,Description="Segregation based metric.">
##INFO=<ID=MQ0F,Number=1,Type=Float,Description="Fraction of MQ0 reads (smaller is better)">
##INFO=<ID=RB,Number=1,Type=String,Description="Readbundle ID">
##INFO=<ID=SEQ,Number=1,Type=String,Description="Sequence of indel plus flanking sequences">
##INFO=<ID=NN,Number=1,Type=String,Description="n indels / n bases">
##FILTER=<ID=NEI_IND,Description="Site was found in an indel rich region of the matched normal">
##FILTER=<ID=MISSINGBULK,Description="Site was not found in the matched normal">
##FILTER=<ID=MASKED,Description="Site overlaps with SW or SNP site">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Number of high-quality bases">
##FORMAT=<ID=DV,Number=1,Type=Integer,Description="Number of high-quality non-reference bases">
##FORMAT=<ID=DP4,Number=4,Type=Integer,Description="Number of high-quality ref-fwd, ref-reverse, alt-fwd and alt-reverse bases">
##FORMAT=<ID=SP,Number=1,Type=Integer,Description="Phred-scaled strand bias P-value">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes for each ALT allele, in the same order as listed">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=DP4,Number=4,Type=Integer,Description="Number of high-quality ref-forward , ref-reverse, alt-forward and alt-reverse bases">
##INFO=<ID=MQ,Number=1,Type=Integer,Description="Average mapping quality">
##bcftools_callVersion=1.14+htslib-1.14
##bcftools_callCommand=call --ploidy 1 --skip-variants snps --multiallelic-caller --variants-only -O v
##bcftools_normVersion=1.14+htslib-1.14
##bcftools_normCommand=norm
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  sample_1
chr1    19635672        .       TTTG    T       106.415 PASS    INDEL;IDV=5;IMF=1;DP=5;VDB=0.00187095;SGB=-0.590765;FS=0;MQ0F=0;AC=1;AN=1;DP4=0,0,5,0;MQ=60;RB=chr1,19635543,19635967,AAT,ATT;SEQ=TATTATTATTTGTATTTT    GT:PL:DP:DV:SP:DP4      1:136,0:5:5:0:0,0,5,0
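
The header check described above can be scripted rather than done by eye. This is a minimal sketch, assuming the five field names quoted in this report are the ones snv_merge_and_vaf_calc.R needs; the function names are hypothetical and not part of the NanoSeq pipeline. It reads only the `##` meta lines of a gzip-compressed VCF and reports which expected INFO IDs are not declared.

```python
import gzip
import re
import sys

# Field names quoted in this report; assumed (not confirmed) to be the full
# set that snv_merge_and_vaf_calc.R expects.
EXPECTED_INFO = {"DPLX_ASXS", "DPLX_CLIP", "DPLX_NM", "BULK_ASXS", "BULK_NM"}

# Matches header lines such as: ##INFO=<ID=DP,Number=1,...>
INFO_ID = re.compile(r"^##INFO=<ID=([^,>]+)")


def declared_info_ids(vcf_gz_path):
    """Return the set of INFO IDs declared in the header of a gzipped VCF."""
    ids = set()
    with gzip.open(vcf_gz_path, "rt") as handle:
        for line in handle:
            if not line.startswith("##"):
                break  # meta lines end at the #CHROM column header
            match = INFO_ID.match(line)
            if match:
                ids.add(match.group(1))
    return ids


def missing_info_ids(vcf_gz_path, expected=EXPECTED_INFO):
    """Return the expected INFO IDs that the header does not declare, sorted."""
    return sorted(set(expected) - declared_info_ids(vcf_gz_path))


if __name__ == "__main__":
    # Hypothetical usage: python check_info.py P38.muts.vcf.gz P38.indel.vcf.gz
    for path in sys.argv[1:]:
        print(path, "missing:", missing_info_ids(path))
```

A header can declare a field that no record actually carries, so an empty result here does not by itself show that the records contain the values.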

I am unsure why this information isn't in my VCFs because it is present in my post/variants.csv:

chrom,chromStart,context,commonSNP,shearwater,bulkASXS,bulkNM,bulkForwardA,bulkForwardC,bulkForwardG,bulkForwardT,bulkForwardIndel,bulkReverseA,bulkReverseC,bulkReverseG,bulkReverseT,bulkReverseIndel,dplxBreakpointBeg,dplxBreakpointEnd,bundleType,dplxASXS,dplxCLIP,dplxNM,dplxfwdA,dplxfwdC,dplxfwdG,dplxfwdT,dplxfwdIndel,dplxrevA,dplxrevC,dplxrevG,dplxrevT,dplxrevIndel,dplxCQfwdA,dplxCQfwdC,dplxCQfwdG,dplxCQfwdT,dplxCQrevA,dplxCQrevC,dplxCQrevG,dplxCQrevT,bulkForwardTotal,bulkReverseTotal,dplxfwdTotal,dplxrevTotal,left,right,qpos,call,isvariant,pyrcontext,stdcontext,pyrsub,stdsub,ismasked,dplxBarcode
chr1,3410876,GGG,0,0,105,0,0,0,8,0,0,0,0,7,0,0,3410829,3411045,1,114,0,2,7,0,0,0,0,4,0,0,0,0,251,5,5,5,146,5,5,5,8,7,7,4,48,168,48,A,1,CCC,GGG,CCC>T,GGG>A,0,CCA|GAG
chr1,3410877,GGA,0,0,105,0,0,0,8,0,0,0,0,7,0,0,3410829,3411045,1,114,0,2,0,7,0,0,0,0,4,0,0,0,5,251,5,5,5,146,5,5,8,7,7,4,49,167,49,C,1,TCC,GGA,TCC>G,GGA>C,0,CCA|GAG
chr1,15885994,AAA,0,0,88,0,8,0,0,0,0,7,0,0,0,0,15885923,15886157,1,119,0,1,0,0,7,0,0,0,0,2,0,0,5,5,251,5,5,5,75,5,8,7,7,2,72,162,72,G,1,TTT,AAA,TTT>C,AAA>G,0,AAT|ATA
chr1,16629514,GCC,0,1,53,1,0,17,0,0,0,0,15,0,0,0,16629437,16629699,1,62,0,2.2,0,0,0,2,0,0,0,0,6,0,5,5,5,75,5,5,5,216,17,15,2,6,78,184,78,T,1,GCC,GCC,GCC>T,GCC>T,1,TAT|CCG
chr1,16629514,GCC,0,1,53,1,0,17,0,0,0,0,15,0,0,0,16629437,16629700,1,63,0,2.1,0,0,0,6,0,0,0,0,13,0,5,5,5,216,5,5,5,463,17,15,6,13,78,185,78,T,1,GCC,GCC,GCC>T,GCC>T,1,TCA|TTA
chr1,16629514,GCC,0,1,53,1,0,17,0,0,0,0,15,0,0,0,16629438,16629699,1,63,0,2,0,0,0,3,0,0,0,0,2,0,5,5,5,110,5,5,5,75,17,15,3,2,77,184,77,T,1,GCC,GCC,GCC>T,GCC>T,1,ATG|GCG
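
To confirm that the CSV really carries these values for every row, a quick column check could look like the sketch below. The column names are copied from the variants.csv header above; the function name is hypothetical. It reports which columns are absent and how many rows hold a numeric value in each column that is present.

```python
import csv

# Column names copied from the variants.csv header shown above.
EXPECTED_COLUMNS = ["dplxASXS", "dplxCLIP", "dplxNM", "bulkASXS", "bulkNM"]


def summarize_columns(csv_path, columns=EXPECTED_COLUMNS):
    """Count rows and, per expected column, rows whose value parses as a number."""
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []
        missing = [c for c in columns if c not in header]
        present = [c for c in columns if c in header]
        numeric = {c: 0 for c in present}
        rows = 0
        for row in reader:
            rows += 1
            for c in present:
                try:
                    float(row[c])
                except (TypeError, ValueError):
                    continue  # empty or non-numeric cell
                numeric[c] += 1
    return {"rows": rows, "missing": missing, "numeric": numeric}
```

Calling `summarize_columns("post/variants.csv")` from the sample's output directory would then show whether any of the five columns is missing or only partly filled.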

When I check the .nextflow.log, I notice that NANOSEQ:VAR and NANOSEQ:INDEL are unable to find 'var/nfiles' and 'indel/nfiles'. Could this be related to issue #86?
The relevant part of .nextflow.log is here:

Jun-02 17:32:14.953 [Task monitor] DEBUG nextflow.processor.TaskProcessor - Process `NANOSEQ:VAR (P36_46)` is unable to find [UnixPath]: `/nemo/lab/rouhanif/home/users/stanlek/nf-ns-crlm/NanoSeq/Nextflow/work/6c/e41fcb3c398bb981dce880287a46f4/var/nfiles` (pattern: `var/nfiles`)
Jun-02 17:32:14.953 [Task monitor] DEBUG nextflow.processor.TaskProcessor - Process `NANOSEQ:VAR (P36_46)` is unable to find [UnixPath]: `/nemo/lab/rouhanif/home/users/stanlek/nf-ns-crlm/NanoSeq/Nextflow/work/6c/e41fcb3c398bb981dce880287a46f4/var/args.json` (pattern: `var/args.json`)
Jun-02 17:32:14.954 [Task monitor] DEBUG nextflow.processor.TaskProcessor - Process NANOSEQ:VAR > Skipping output binding because one or more optional files are missing: fileoutparam<4>
Jun-02 17:32:14.954 [Task monitor] DEBUG nextflow.processor.TaskProcessor - Process NANOSEQ:VAR > Skipping output binding because one or more optional files are missing: fileoutparam<5>
Jun-02 17:32:14.967 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[jobId: 21976997; id: 1173; name: NANOSEQ:INDEL (P36_9); status: COMPLETED; exit: 0; error: -; workDir: /nemo/lab/rouhanif/home/users/stanlek/nf-ns-crlm/NanoSeq/Nextflow/work/4f/6b02309b0ad25ed9cbb90f1997e456 started: 1748881755095; exited: 2025-06-02T16:32:13.658763Z; ]
Jun-02 17:32:14.976 [Task submitter] DEBUG nextflow.executor.GridTaskHandler - [SLURM] submitted process NANOSEQ:VAR (P36_52) > jobId: 21977114; workDir: /nemo/lab/rouhanif/home/users/stanlek/nf-ns-crlm/NanoSeq/Nextflow/work/c5/843a917db8b85d9d5064ee8c21ee50
Jun-02 17:32:14.976 [Task submitter] INFO  nextflow.Session - [c5/843a91] Submitted process > NANOSEQ:VAR (P36_52)
Jun-02 17:32:14.982 [Task monitor] DEBUG nextflow.processor.TaskProcessor - Process `NANOSEQ:INDEL (P36_9)` is unable to find [UnixPath]: `/nemo/lab/rouhanif/home/users/stanlek/nf-ns-crlm/NanoSeq/Nextflow/work/4f/6b02309b0ad25ed9cbb90f1997e456/indel/nfiles` (pattern: `indel/nfiles`)
Jun-02 17:32:14.982 [Task monitor] DEBUG nextflow.processor.TaskProcessor - Process `NANOSEQ:INDEL (P36_9)` is unable to find [UnixPath]: `/nemo/lab/rouhanif/home/users/stanlek/nf-ns-crlm/NanoSeq/Nextflow/work/4f/6b02309b0ad25ed9cbb90f1997e456/indel/args.json` (pattern: `indel/args.json`)
Jun-02 17:32:14.982 [Task monitor] DEBUG nextflow.processor.TaskProcessor - Process NANOSEQ:INDEL > Skipping output binding because one or more optional files are missing: fileoutparam<4>
Jun-02 17:32:14.982 [Task monitor] DEBUG nextflow.processor.TaskProcessor - Process NANOSEQ:INDEL > Skipping output binding because one or more optional files are missing: fileoutparam<5>
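
To see how widespread these messages are, the log can be tallied per process and output pattern. The sketch below assumes the line format shown in the excerpt above; the function name is hypothetical. Note that the log itself describes these as optional outputs ("Skipping output binding because one or more optional files are missing"), so a high count may turn out to be harmless.

```python
import re
from collections import Counter

# Matches lines like:
#   Process `NANOSEQ:VAR (P36_46)` is unable to find [UnixPath]: `<path>` (pattern: `var/nfiles`)
MISSING = re.compile(
    r"Process `(?P<task>[^`]+)` is unable to find \[[^\]]+\]: "
    r"`(?P<path>[^`]+)` \(pattern: `(?P<pattern>[^`]+)`\)"
)


def tally_missing_outputs(lines):
    """Count 'unable to find' messages per (process name, output pattern)."""
    counts = Counter()
    for line in lines:
        match = MISSING.search(line)
        if match:
            # Drop the per-task tag, e.g. "NANOSEQ:VAR (P36_46)" -> "NANOSEQ:VAR"
            process = match.group("task").split(" (", 1)[0]
            counts[(process, match.group("pattern"))] += 1
    return counts
```

For example, `with open(".nextflow.log") as fh: print(tally_missing_outputs(fh))` would print one count per process and pattern pair.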
