SCOUT scripts can only be executed on Orfeo #3

@albertocasagrande

All SCOUT scripts adopt hardcoded absolute paths that depend on the machine used to produce the data, i.e., Orfeo.

For instance, the SPN01 simulate_mutations.R script contains the following lines:

outdir <- "/orfeo/scratch/cdslab/shared/SCOUT/SPN01/races/"
forest <- load_samples_forest(paste0(outdir,"sample_forest.sff"))


setwd("/orfeo/cephfs/scratch/cdslab/shared/ProCESS/GRCh38")
m_engine <- MutationEngine(setup_code = "GRCh38",tumour_type = "COAD",
                           tumour_study = "US")

This approach works on Orfeo, but it does not generalize: the SCOUT scripts cannot run on a different machine or with a different directory layout. This is a major issue because data reproducibility is one of the main goals of SCOUT.

I suggest removing the absolute paths and accessing only subdirectories of the working directory. For instance, the above lines would become:

outdir <- "output"
forest <- load_samples_forest(file.path(outdir, "sample_forest.sff"))

m_engine <- MutationEngine(setup_code = "GRCh38",tumour_type = "COAD",
                           tumour_study = "US")

If using subdirectories is not optimal, for instance because you want to share the same mutation engine directory, you can either

  • define the subdirectories "output" and "GRCh38" as symbolic links by running the following commands:
     ln -s /orfeo/scratch/cdslab/shared/SCOUT/SPN01/races/ output
     ln -s /orfeo/cephfs/scratch/cdslab/shared/ProCESS/GRCh38/GRCh38 GRCh38
    
  • add two parameters to the SCOUT scripts to explicitly pass the output and mutation engine directories. In this case, the original code snippet could become:
     args <- commandArgs(trailingOnly = TRUE)
     if (length(args) != 2) {
         # Recover the script name from the full argument list for the usage message
         all_args <- commandArgs(trailingOnly = FALSE)
         script_path <- sub("^--file=", "", all_args[grep("^--file=", all_args)])
     
         stop(paste("Syntax error: Rscript", basename(script_path), "<output_dir> <mutation_engine_directory>"))
     }
     
     # First argument: output directory; second argument: mutation engine directory
     outdir <- args[1]
     mu_dir <- args[2]
     forest <- load_samples_forest(file.path(outdir, "sample_forest.sff"))
    
     setwd(mu_dir)
     m_engine <- MutationEngine(setup_code = "GRCh38",tumour_type = "COAD",
                            tumour_study = "US")
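
The two suggestions can also be combined: accept the directories as optional parameters and fall back to locations relative to the working directory when they are omitted. The following is only a minimal sketch. `resolve_dirs` is a hypothetical helper that does not exist in SCOUT, and the defaults ("output" for the output directory and "." for the directory in which the mutation engine is set up) are assumptions drawn from the snippets in this issue.

```r
# Hypothetical helper (not part of SCOUT): pick the two directories from the
# command-line arguments, falling back to paths relative to the working
# directory when an argument is missing. The default values are assumptions.
resolve_dirs <- function(args,
                         default_outdir = "output",
                         default_mu_dir = ".") {
  if (length(args) > 2) {
    stop("Expected at most two arguments: [<output_dir> [<mutation_engine_directory>]]")
  }
  outdir <- if (length(args) >= 1) args[1] else default_outdir
  mu_dir <- if (length(args) >= 2) args[2] else default_mu_dir
  list(outdir = outdir, mu_dir = mu_dir)
}

# No arguments: both defaults are used.
dirs_default <- resolve_dirs(character(0))

# Two arguments: both defaults are overridden (the paths are placeholders).
dirs_custom <- resolve_dirs(c("/tmp/races", "/tmp/engine"))
```

In a script, the helper would be called as `resolve_dirs(commandArgs(trailingOnly = TRUE))`, so that existing Orfeo runs can pass their current paths while other users rely on the defaults.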
    

Using paste0() to join paths should also be deprecated in favor of file.path().
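
To illustrate why, here is a small sketch of what happens when the directory is given without a trailing slash, as in the `outdir <- "output"` suggestion above. The variable names `joined_paste0` and `joined_file_path` are introduced only for this example.

```r
# Directory name without a trailing slash.
outdir <- "output"

# paste0() concatenates the strings as they are, so the separator is lost.
joined_paste0 <- paste0(outdir, "sample_forest.sff")

# file.path() inserts the separator between the components.
joined_file_path <- file.path(outdir, "sample_forest.sff")

print(joined_paste0)     # "outputsample_forest.sff"
print(joined_file_path)  # "output/sample_forest.sff"
```

With paste0(), the result is correct only if every caller remembers the trailing slash, which is easy to forget once the directory comes from a parameter or a symbolic link.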
