Documentation for SCOUT cohort

Workflow

For every SPN, the following steps are required:

1. ProCESS simulation: simulate tumour growth and sampling, and set up the mutation engine according to the genomic information reported in SCOUT and the instructions in the SCOUT section;
2. sequencing simulation: run ProCESS sequencing for all 12 coverage-purity combinations (plus the normal sample), following the steps reported in the build cohort section;
3. generate a report for each coverage-purity combination according to the report generation section;
4. run nf-core/sarek for each coverage-purity combination. In particular:

   4.1 Mapping and preprocessing of the normal sample;

   4.2 Mapping and preprocessing of each coverage-purity combination (treated as separate runs);

   4.3 Variant calling for each coverage-purity combination.
5. run nf-core/tumourevo on the nf-core/sarek results;
6. validate the nf-core/sarek results by comparing:

   6.1 Variant Allele Frequency of called somatic mutations (Strelka, Mutect2) vs Variant Allele Frequency of ProCESS ground truth mutations;

   6.2 Variant Allele Frequency of called germline mutations (HaplotypeCaller) vs Variant Allele Frequency of ProCESS ground truth mutations;

   6.3 Purity and ploidy estimates from ASCAT vs the purity set in the simulation;

   6.4 Segments and karyotypes from ASCAT vs phylo_forest$get_bulk_allelic_fragmentation of ProCESS ground truth CNAs.
7. validate the nf-core/tumourevo results by comparing:

   7.1 Driver mutations;

   7.2 Clonal and subclonal clusters vs the set ProCESS sample composition;

   7.3 Signature exposures vs phylo_forest$get_exposures().
8. plot resource usage.
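
The 12 coverage-purity combinations in the workflow above can be enumerated programmatically. The sketch below is only an illustration: the coverage and purity values are placeholders (this README fixes only the counts, 4 coverages and 3 purities), and the exact formatting of the purity inside the `{coverage}x_{purity}p` label is an assumption.

```python
from itertools import product

# Placeholder values (assumptions): the README states only that there are
# 4 coverages x 3 purities = 12 combinations, not which values are used.
COVERAGES = [50, 100, 150, 200]
PURITIES = [0.3, 0.6, 0.9]


def run_labels(coverages, purities):
    """Return one '{coverage}x_{purity}p' label per coverage-purity pair."""
    return [f"{cov}x_{pur}p" for cov, pur in product(coverages, purities)]


labels = run_labels(COVERAGES, PURITIES)
print(len(labels))  # 12 combinations, each treated as a separate run
```

Each label can then be used to name the per-combination scripts and output folders.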

Summary table

| step | sub-step | main script | output file | expected number of files |
|---|---|---|---|---|
| 🟢 races simulation | simulate tissue | `0_sim_tissue.R` | sample_forest.sff snapshot | 2 |
| - | set mutation engine | `1_mut_engine.R` | phylo_forest.sff cna.rds | 1+n |
| 🔴 build cohort | sequencing tumour | `build_cohort.py` | purity_{}/../seq_results_SPN{}_t{}.rds | 3x40 |
| - | sequencing normal | - | purity_1/../seq_results_SPN{}_n{}.rds../ | 1x6 |
| - | merging tumour | - | purity_{}/../seq_results_merged_SPN{}_{}x.rds | 3x4 |
| - | merging normal | - | purity_1/seq_results_merged_SPN{}_30x.rds | 1 |
| - | fastq tumour generation | - | purity_{}/t{}_Sample_1.{R1,R2}.fastq.gz | 3x40x2xn |
| - | fastq normal generation | - | purity_1/n{}_normal_sample.{R1,R2}.fastq.gz | 6x2 |
| 🔴 sarek | mapping normal sample | `sarek_mapping_normal.sh` | normal_sample.recal.cram | 1 |
| - | mapping tumour samples | `sarek_mapping_{}x_{}p.sh` | {}x_{}p/SPN{}_S{}.recal.cram | 12xn |
| - | variant calling strelka tumour samples | `sarek_variant_calling_{}x_{}p.sh` | {}x_{}p/SPN{}_S{}.vcf | 12xn |
| - | variant calling mutect2 tumour samples | `sarek_variant_calling_{}x_{}p.sh` | {}x_{}p/SPN{}_S{}.vcf | 12 |
| - | variant calling haplotypecaller normal sample | `sarek_mapping_normal.sh` | SPN{}_normal_sample.vcf | 1 |
| - | variant calling ascat tumour samples | `sarek_variant_calling_{}x_{}p.sh` | {}x_{}p/SPN{}_S{}.txt | 12xn |
| 🟠 tumourevo | drivers | `` | {}x_{}p/SPN{}_S{}.rds | 12x2xn |
| - | subclonal | `` | {}x_{}p/SPN{}_S{}.rds | 12x2xn |
| - | signature | `` | {}x_{}p/SPN{}.rds | 12x2 |

Note

n refers to the number of samples in each SPN

🟢: low resource usage

🟠: medium resource usage

🔴: high resource usage
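
The entries in the last column of the summary table are small arithmetic expressions in n. As a minimal sketch, a hypothetical helper (`expected_files`, not part of the repository) could evaluate them for a given number of samples, assuming `x` means multiplication and `+` means addition:

```python
from math import prod


def expected_files(expr: str, n: int) -> int:
    """Evaluate a count such as '3x40x2xn' or '1+n' for n samples.

    Assumes 'x' denotes multiplication and '+' denotes addition.
    """
    total = 0
    for term in expr.split("+"):
        # Each factor is either the symbol 'n' or an integer literal.
        factors = [n if f.strip() == "n" else int(f) for f in term.split("x")]
        total += prod(factors)
    return total


# Example: an SPN with n = 2 samples.
for expr in ["1+n", "3x4", "12xn", "3x40x2xn"]:
    print(expr, "->", expected_files(expr, n=2))
```

Comparing these numbers with the actual file counts on disk is a quick way to spot an incomplete step.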

Folder structure

This project uses a data folder, where all resulting files are stored, alongside a copy of the remote repository. The expected structure of the data folder for each SPN is:

SCOUT/SPN{id}
    ├── races
    │   ├── sample_forest.sff
    │   ├── phylo_forest.sff
    │   ├── SPN{id}
    │   └── cna_data
    │       └── <sample>_cna.rds
    ├── sarek
    │   ├── normal_sample
    │   │   ├── multiqc
    │   │   ├── pipeline_info
    │   │   ├── preprocessing
    │   │   │   ├── markduplicates
    │   │   │   │   └── <sample>
    │   │   │   ├── recal_table
    │   │   │   │   └── <sample>
    │   │   │   └── recalibrated
    │   │   │       └── <sample>
    │   │   └── variant_calling
    │   │       └── haplotypecaller
    │   │           └── <sample>
    │   └── {coverage}x_{purity}p
    │       ├── multiqc
    │       ├── pipeline_info
    │       ├── preprocessing
    │       │   ├── markduplicates
    │       │   │   └── <sample>
    │       │   ├── recal_table
    │       │   │   └── <sample>
    │       │   └── recalibrated
    │       │       └── <sample>
    │       └── variant_calling
    │           ├── mutect2
    │           │   └── <patient>
    │           └── strelka
    │               └── <sample>
    ├── tumourevo
    │   └── {coverage}x_{purity}p
    │       └── {variant_caller}_ascat
    │           ├── variant_annotation
    │           │   └── <sample>
    │           ├── signature_deconvolution
    │           │   ├── SparseSignatures
    │           │   └── SigProfiler
    │           └── subclonal_deconvolution
    │               ├── viber
    │               ├── ctree
    │               ├── pyclone
    │               └── mobster
    ├── validation
    │   └── {coverage}x_{purity}p
    │       ├── sarek
    │       │   ├── somatic_mutations
    │       │   │   ├── strelka
    │       │   │   └── mutect2
    │       │   ├── germline_mutations
    │       │   │   └── haplotypecaller
    │       │   └── copy_number
    │       │       └── ascat
    │       └── tumourevo
    │           └── {variant_caller}_ascat
    │               ├── drivers
    │               ├── signature_deconvolution
    │               └── subclonal_deconvolution
    ├── report
    │   └── Report_SPN{id}_{coverage}x_{purity}p.html
    └── resources
        └──
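
To check that a data folder follows the layout above, something like the following sketch could be used. The helper names (`expected_dirs`, `missing_dirs`) and the example arguments are hypothetical, and only a subset of the directories in the tree is checked.

```python
from pathlib import Path


def expected_dirs(spn_id, run_label, variant_caller):
    """List a subset of the directories expected for one SPN and one run.

    run_label is a '{coverage}x_{purity}p' folder name; all arguments are
    illustrative placeholders, not values fixed by this README.
    """
    base = Path("SCOUT") / f"SPN{spn_id}"
    return [
        base / "races" / "cna_data",
        base / "sarek" / "normal_sample" / "variant_calling" / "haplotypecaller",
        base / "sarek" / run_label / "variant_calling" / "mutect2",
        base / "sarek" / run_label / "variant_calling" / "strelka",
        base / "tumourevo" / run_label / f"{variant_caller}_ascat",
        base / "validation" / run_label / "sarek",
        base / "validation" / run_label / "tumourevo" / f"{variant_caller}_ascat",
        base / "report",
    ]


def missing_dirs(data_root, spn_id, run_label, variant_caller):
    """Return the expected directories that do not exist under data_root."""
    root = Path(data_root)
    return [
        p for p in expected_dirs(spn_id, run_label, variant_caller)
        if not (root / p).is_dir()
    ]
```

For example, `missing_dirs("/path/to/data", "01", "50x_0.3p", "mutect2")` would return the relative paths still to be produced for that run; an empty list means the checked part of the layout is in place.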
