
Support for larger genomes #239

@fmobegi

Description

Is your feature request related to a problem? Please describe

I am working on in planta infection RNA-seq assays for the lentil-Ascochyta pathosystem. The pathogen side works perfectly with the pipeline as it currently stands (apart from a few hiccups with cleaning up large intermediate files, reported in a separate issue). However, the analysis cannot get past samtools indexing when the lentil genome is used as the reference, presumably because some lentil chromosomes exceed the 512 Mbp (2^29 bp) limit of the default BAI index format.

Describe the solution you'd like

I am not entirely sure how best to handle this (perhaps with a try/catch), but it would help if the pipeline could check chromosome lengths and index each BAM file accordingly, producing the default *.bai index for ordinary genomes and a *.csi index for larger ones.
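
A minimal sketch of the chromosome length check described above, assuming each sorted BAM header lists its reference sequences as @SQ lines with an LN: tag (as samtools output normally does). The helper name choose_index_flag and the variable BAI_MAX_LEN are hypothetical and not part of the pipeline. The 2^29 threshold follows the samtools index documentation, which states that BAI handles chromosomes up to 512 Mbp.

```shell
# Hypothetical helper (not part of the pipeline): print "-c" when any @SQ
# line in a SAM header declares a reference longer than BAI can address.
# Per the samtools docs, BAI handles chromosomes up to 512 Mbp (2^29 bases).
BAI_MAX_LEN=$(( 1 << 29 ))  # 536870912

choose_index_flag() {
  # Reads tab-separated SAM header text on stdin.
  awk -F '\t' -v max="$BAI_MAX_LEN" '
    BEGIN { max = max + 0 }            # force a numeric comparison
    $1 == "@SQ" {
      for (i = 2; i <= NF; i++) {
        # LN:<length> gives the reference length in bases
        if ($i ~ /^LN:/ && substr($i, 4) + 0 > max) {
          too_long = 1
        }
      }
    }
    END { if (too_long) print "-c" }   # print nothing => keep default BAI
  '
}
```

It could be called as samtools index $(samtools view -H "$bam" | choose_index_flag) "$bam", which reduces to a plain samtools index call when every reference fits in a BAI index. Inside a Nextflow script block, every $ meant for bash or awk (including $BAI_MAX_LEN, $i, $1 and the $(...) substitution) would need escaping as \$, and the process outputs would have to accept either a .bai or a .csi file. The check is deliberately conservative: it switches to CSI whenever a long reference is declared in the header, even if no reads map beyond 512 Mbp.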

Describe alternatives you've considered

As a workaround, I modified the samtools_index process as shown below, adding the -c flag so that samtools writes a CSI index instead of a BAI index for this run.

process samtools_index {
  publishDir "${params.outdir}/Samples/${sample_id}", mode: params.publish_dir_mode, pattern: publish_pattern_samtools_index
  tag { sample_id }
  label "samtools"

  input:
    set val(sample_id), file(bam_file) from SORTED_FOR_INDEX

  output:
    set val(sample_id), file(bam_file) into BAM_INDEXED_FOR_STRINGTIE
    set val(sample_id), file("*.bam.csi") into BAI_INDEXED_FILE  // channel name kept; it now carries the CSI index
//    set val(sample_id), file("*.bam.bai") into BAI_INDEXED_FILE
    set val(sample_id), file("*.bam.log") into BAM_INDEXED_LOG

  script:
  """
  echo "#TRACE sample_id=${sample_id}"
  echo "#TRACE bam_bytes=`stat -Lc '%s' *.bam`"

  # samtools index ${bam_file}     # original command: default BAI index
  samtools index -c ${bam_file}    # CSI index, needed for chromosomes over 512 Mbp
  samtools stats ${bam_file} > ${sample_id}.bam.log
  """
}
