22
33## Input
44
5+ ### Sample config file
6+
7+ Sample identifiers and their necessary filepaths (`bam`, `vcf`, etc.) are provided to QuaC in a `tsv`-formatted config
8+ file via `--sample_config`. The columns required depend on the flags supplied to `src/run_quac.py`. This table lists the
9+ allowed columns and when to use them.
10+
11+ | Column | When to use | Description |
12+ | --------------------- | ------------------------- | ----------------------------------------------------------------------------------------------------- |
13+ | sample_id | Always | Sample identifier |
14+ | bam | Always | BAM filepath |
15+ | vcf | Always | VCF filepath |
16+ | capture_bed | ` --exome ` | Capture region bed filepath |
17+ | fastqc_raw | ` --include_prior_qc ` | Filepath to FastQC ` zip ` files created from raw fastqs. Use comma as delimiter if multiple files. |
18+ | fastqc_trimmed | ` --include_prior_qc ` | Filepath to FastQC ` zip ` files created from trimmed fastqs. Use comma as delimiter if multiple files. |
19+ | fastq_screen | ` --include_prior_qc ` | Filepath to FastQ Screen ` txt ` files. Use comma as delimiter if multiple files. |
20+ | dedup | ` --include_prior_qc ` | Filepath to Picard's MarkDuplicates ` txt ` files. Use comma as delimiter if multiple files. |
21+ | multiqc_rename_config | ` --allow_sample_renaming ` | Filepath to the label rename config file to use with MultiQC |
22+
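The column rules in the table can be checked before launching a run. The following is a minimal sketch and not part of QuaC itself: the helper names `required_columns` and `missing_columns` and their boolean parameters are hypothetical, chosen only to mirror the flags listed in the table, and the column lists are copied from the table rather than read from QuaC's source.

```python
import csv
import io

# Column requirements copied from the table above (assumption: the table is complete).
ALWAYS = ["sample_id", "bam", "vcf"]
PRIOR_QC = ["fastqc_raw", "fastqc_trimmed", "fastq_screen", "dedup"]


def required_columns(exome=False, include_prior_qc=False, allow_sample_renaming=False):
    """Return the columns the table lists as needed for the given flags."""
    cols = list(ALWAYS)
    if exome:
        cols.append("capture_bed")
    if include_prior_qc:
        cols.extend(PRIOR_QC)
    if allow_sample_renaming:
        cols.append("multiqc_rename_config")
    return cols


def missing_columns(tsv_handle, **flags):
    """List required columns that are absent from the header of a sample config TSV."""
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    header = reader.fieldnames or []
    return [c for c in required_columns(**flags) if c not in header]


# Hypothetical config content, for illustration only.
example = io.StringIO("sample_id\tbam\tvcf\nA\t/path/A.bam\t/path/A.vcf.gz\n")
print(missing_columns(example, exome=True))
```

With this illustrative input, the check reports `capture_bed` as missing, since the table lists that column as needed with `--exome`.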
23+ Refer to our system testing directory for example sample config files at ` .test/configs ` . For example:
24+
25+ * ` .test/configs/no_priorQC/sample_config/project_2samples_wgs.tsv ` - Sample config file for WGS samples and no prior
26+ QC.
27+ * ` .test/configs/no_priorQC/sample_config/project_2samples_exome.tsv ` - Sample config file for exome samples and no
28+ prior QC. Note that WGS and exome samples can't be used in the same config file.
29+ * ` .test/configs/include_priorQC/sample_config/project_2samples_wgs.tsv ` - Sample config file for WGS samples with prior
30+ QC data available from [certain QC tools](./index.md#optional-qc-output-consumed-by-quac).
31+
32+ ### Pedigree file
33+
534<!-- markdown-link-check-disable -->
635
7- Samples belonging to a project are provided as input via ` --pedigree ` to QuaC in [ pedigree file
8- format ] ( https://gatk.broadinstitute.org/hc/en-us/articles/360035531972-PED-Pedigree-format ) . Only the samples that are
9- supplied in pedigree file will be processed by QuaC and all of these samples must belong to the same project .
36+ QuaC requires a [pedigree
37+ file](https://gatk.broadinstitute.org/hc/en-us/articles/360035531972-PED-Pedigree-format) as input via `--pedigree`.
38+ Samples listed in this file must correspond to those in the sample config file (`--sample_config`).
1039
1140<!-- markdown-link-check-enable -->
1241
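The requirement that pedigree samples correspond to those in the sample config file can be sanity-checked ahead of a run. This is a rough sketch under stated assumptions, not QuaC's own validation: it reads "correspond" as "the same set of sample IDs", takes the individual ID from the second whitespace-separated column of the PED file, and skips lines starting with `#`. The function names are hypothetical.

```python
import csv
import io


def pedigree_samples(ped_handle):
    """Collect individual IDs (second column) from a PED-format file."""
    ids = set()
    for line in ped_handle:
        line = line.strip()
        # Assumption: blank lines and '#' comment lines carry no samples.
        if not line or line.startswith("#"):
            continue
        ids.add(line.split()[1])
    return ids


def config_samples(tsv_handle):
    """Collect the sample_id column from a sample config TSV."""
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    return {row["sample_id"] for row in reader}


def compare_samples(ped_handle, tsv_handle):
    """Return (IDs only in the pedigree, IDs only in the sample config)."""
    ped = pedigree_samples(ped_handle)
    cfg = config_samples(tsv_handle)
    return sorted(ped - cfg), sorted(cfg - ped)


# Hypothetical inputs, for illustration only.
ped = io.StringIO("fam1\tA\t0\t0\t1\t-9\nfam1\tB\t0\t0\t2\t-9\n")
cfg = io.StringIO("sample_id\tbam\tvcf\nA\ta.bam\ta.vcf.gz\nC\tc.bam\tc.vcf.gz\n")
print(compare_samples(ped, cfg))
```

Both returned lists being empty would indicate that the two files name the same samples under this interpretation.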
@@ -16,102 +45,23 @@ supplied in pedigree file will be processed by QuaC and all of these samples mus
1645 create a dummy pedigree file, which will lack sex (unless project tracking sheet is provided), relatedness and
1746 affected status info. See header of the script for usage instructions.
1847
19-
20- Each sample must have ` BAM ` and ` VCF ` files available in the directory structure shown below for sample ` X ` .
21-
22- ```
23- test_project/
24- └── analysis
25- ├── X
26- │ ├── bam
27- │ │ ├── X.bam
28- │ │ └── X.bam.bai
29- │ └── vcf
30- │ ├── X.vcf.gz
31- │ └── X.vcf.gz.tbi
32- └── Y
33- └── ....
34- ```
35-
36- When run in exome mode using flag ` --exome ` , QuaC requires a capture-regions bed file at the path
37- ` path_to_sample/configs/small_variant_caller/<capture_regions>.bed ` for each sample.
38-
39- ```
40- test_project/
41- └── analysis
42- ├── X
43- │ ├── bam
44- │ │ ├── X.bam
45- │ │ └── X.bam.bai
46- │ ├── configs
47- │ │ └── small_variant_caller
48- │ │ └── capture_regions.bed
49- │ └── vcf
50- │ ├── X.vcf.gz
51- │ └── X.vcf.gz.tbi
52- └── Y
53- └── ....
54- ```
55-
56- * Optionally* , QuaC can also utilize QC results produced by [ certain
57- tools] ( ./index.md#optional-qc-output-consumed-by-quac ) when run with flag ` --include_prior_qc ` . In this case, following
58- directory structure is expected.
59-
60- ```
61- test_project/
62- └── analysis
63- ├── X
64- │ ├── bam
65- │ │ ├── X.bam
66- │ │ └── X.bam.bai
67- │ ├── qc
68- │ │ ├── dedup
69- │ │ │ ├── X-1.metrics.txt
70- │ │ │ └── X-2.metrics.txt
71- │ │ ├── fastqc-raw
72- │ │ │ ├── ....
73- │ │ ├── fastqc-trimmed
74- │ │ │ ├── ....
75- │ │ ├── fastq_screen-trimmed
76- │ │ │ └── ....
77- │ │ └── multiqc_initial_pass <--- needed only when `--allow_sample_renaming` flag is used
78- │ │ └── multiqc_sample_rename_config
79- │ │ └── X_rename_config.tsv
80- │ └── vcf
81- │ ├── X.vcf.gz
82- │ └── X.vcf.gz.tbi
83- └── Y
84- └── ....
85- ```
86-
87-
88- !!! note "CGDS users only"
89-
90- Output (bam, vcf and QC output) produced by CGDS's small variant caller pipeline can be readily used as input to
91- QuaC with flags `--include_prior_qc` and `--allow_sample_renaming`.
92-
93- ### Example project structure
94-
95- Refer to system testing directory ` .test/ ` in the repo for an example project to see an example project with above
96- mentioned directory structure needed as input. In this setup, projects A and B have prior QC data included, whereas
97- samples C and D do not have them. Refer to pedigree files under ` .test/configs/ ` on how these example samples were used
98- as input to QuaC.
99-
100-
10148## Output
10249
10350QuaC results are stored at the path specified via option ` --outdir ` (default:
104- ` data/quac/results/test_project/analysis ` ). Refer to the [ system testing's
105- output] ( ./system_testing.md#expected-output-files ) to learn more about the output directory structure.
51+ ` data/quac/results/test_project/analysis ` ). Refer to the [ system testing's
52+ output] ( ./system_testing.md#expected-output-files ) to learn more about the output directory structure.
53+
54+ QC outputs are stored at the sample level as well as the project level (i.e., all samples considered together), depending on
55+ the type of QC run. For example, Qualimap is run at the sample level, whereas Somalier is run at the project
56+ level. MultiQC reports are available at both the sample and project levels.
10657
107- !!! tip
58+ !!! tip
10859
109- Users may primarily be interested in the aggregated QC results produced by [multiqc ](https://multiqc.info/),
60+ Users may primarily be interested in the aggregated QC results produced by [MultiQC](https://multiqc.info/),
11061 both at the sample level and at the project level. These MultiQC reports also include a summary of QuaC-Watch
11162 results at the top.
11263
11364!!! note "CGDS users only"
11465
11566 QuaC's output directory structure was designed based on the output structure of the [CGDS small variant caller
11667 pipeline](https://gitlab.rc.uab.edu/center-for-computational-genomics-and-data-science/sciops/pipelines/small_variant_caller_pipeline).
117-