
Commit 5dd211c

Merge pull request #92 from uab-cgds-worthey/joss_manuscript
Bring master branch up to date on sample config as input and doc fixes
2 parents 667b786 + 3a38e46 commit 5dd211c

25 files changed

+358
-273
lines changed

.test/configs/include_priorQC/project_1sample.ped renamed to .test/configs/include_priorQC/pedigree/project_1sample.ped

File renamed without changes.

.test/configs/include_priorQC/project_2samples.ped renamed to .test/configs/include_priorQC/pedigree/project_2samples.ped

File renamed without changes.
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
sample_id bam vcf capture_bed fastqc_raw fastqc_trimmed fastq_screen dedup multiqc_rename_config
2+
A .test/ngs-data/test_project/analysis/A/bam/A.bam .test/ngs-data/test_project/analysis/A/vcf/A.vcf.gz .test/ngs-data/test_project/analysis/A/configs/small_variant_caller/capture_regions.bed .test/ngs-data/test_project/analysis/A/qc/fastqc-raw/A-1-R1_fastqc.zip,.test/ngs-data/test_project/analysis/A/qc/fastqc-raw/A-1-R2_fastqc.zip,.test/ngs-data/test_project/analysis/A/qc/fastqc-raw/A-2-R1_fastqc.zip,.test/ngs-data/test_project/analysis/A/qc/fastqc-raw/A-2-R2_fastqc.zip .test/ngs-data/test_project/analysis/A/qc/fastqc-trimmed/A-1-R1_fastqc.zip,.test/ngs-data/test_project/analysis/A/qc/fastqc-trimmed/A-1-R2_fastqc.zip,.test/ngs-data/test_project/analysis/A/qc/fastqc-trimmed/A-2-R1_fastqc.zip,.test/ngs-data/test_project/analysis/A/qc/fastqc-trimmed/A-2-R2_fastqc.zip .test/ngs-data/test_project/analysis/A/qc/fastq_screen-trimmed/A-1-R1_screen.txt,.test/ngs-data/test_project/analysis/A/qc/fastq_screen-trimmed/A-1-R2_screen.txt,.test/ngs-data/test_project/analysis/A/qc/fastq_screen-trimmed/A-2-R1_screen.txt,.test/ngs-data/test_project/analysis/A/qc/fastq_screen-trimmed/A-2-R2_screen.txt .test/ngs-data/test_project/analysis/A/qc/dedup/A-1.metrics.txt,.test/ngs-data/test_project/analysis/A/qc/dedup/A-2.metrics.txt .test/ngs-data/test_project/analysis/A/qc/multiqc_initial_pass/multiqc_sample_rename_config/A_rename_config.tsv
3+
B .test/ngs-data/test_project/analysis/B/bam/B.bam .test/ngs-data/test_project/analysis/B/vcf/B.vcf.gz .test/ngs-data/test_project/analysis/B/configs/small_variant_caller/capture_regions.bed .test/ngs-data/test_project/analysis/B/qc/fastqc-raw/B-1-R1_fastqc.zip,.test/ngs-data/test_project/analysis/B/qc/fastqc-raw/B-1-R2_fastqc.zip,.test/ngs-data/test_project/analysis/B/qc/fastqc-raw/B-2-R1_fastqc.zip,.test/ngs-data/test_project/analysis/B/qc/fastqc-raw/B-2-R2_fastqc.zip .test/ngs-data/test_project/analysis/B/qc/fastqc-trimmed/B-1-R1_fastqc.zip,.test/ngs-data/test_project/analysis/B/qc/fastqc-trimmed/B-1-R2_fastqc.zip,.test/ngs-data/test_project/analysis/B/qc/fastqc-trimmed/B-2-R1_fastqc.zip,.test/ngs-data/test_project/analysis/B/qc/fastqc-trimmed/B-2-R2_fastqc.zip .test/ngs-data/test_project/analysis/B/qc/fastq_screen-trimmed/B-1-R1_screen.txt,.test/ngs-data/test_project/analysis/B/qc/fastq_screen-trimmed/B-1-R2_screen.txt,.test/ngs-data/test_project/analysis/B/qc/fastq_screen-trimmed/B-2-R1_screen.txt,.test/ngs-data/test_project/analysis/B/qc/fastq_screen-trimmed/B-2-R2_screen.txt .test/ngs-data/test_project/analysis/B/qc/dedup/B-1.metrics.txt,.test/ngs-data/test_project/analysis/B/qc/dedup/B-2.metrics.txt .test/ngs-data/test_project/analysis/B/qc/multiqc_initial_pass/multiqc_sample_rename_config/B_rename_config.tsv
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
sample_id bam vcf fastqc_raw fastqc_trimmed fastq_screen dedup multiqc_rename_config
2+
A .test/ngs-data/test_project/analysis/A/bam/A.bam .test/ngs-data/test_project/analysis/A/vcf/A.vcf.gz .test/ngs-data/test_project/analysis/A/qc/fastqc-raw/A-1-R1_fastqc.zip,.test/ngs-data/test_project/analysis/A/qc/fastqc-raw/A-1-R2_fastqc.zip,.test/ngs-data/test_project/analysis/A/qc/fastqc-raw/A-2-R1_fastqc.zip,.test/ngs-data/test_project/analysis/A/qc/fastqc-raw/A-2-R2_fastqc.zip .test/ngs-data/test_project/analysis/A/qc/fastqc-trimmed/A-1-R1_fastqc.zip,.test/ngs-data/test_project/analysis/A/qc/fastqc-trimmed/A-1-R2_fastqc.zip,.test/ngs-data/test_project/analysis/A/qc/fastqc-trimmed/A-2-R1_fastqc.zip,.test/ngs-data/test_project/analysis/A/qc/fastqc-trimmed/A-2-R2_fastqc.zip .test/ngs-data/test_project/analysis/A/qc/fastq_screen-trimmed/A-1-R1_screen.txt,.test/ngs-data/test_project/analysis/A/qc/fastq_screen-trimmed/A-1-R2_screen.txt,.test/ngs-data/test_project/analysis/A/qc/fastq_screen-trimmed/A-2-R1_screen.txt,.test/ngs-data/test_project/analysis/A/qc/fastq_screen-trimmed/A-2-R2_screen.txt .test/ngs-data/test_project/analysis/A/qc/dedup/A-1.metrics.txt,.test/ngs-data/test_project/analysis/A/qc/dedup/A-2.metrics.txt .test/ngs-data/test_project/analysis/A/qc/multiqc_initial_pass/multiqc_sample_rename_config/A_rename_config.tsv
3+
B .test/ngs-data/test_project/analysis/B/bam/B.bam .test/ngs-data/test_project/analysis/B/vcf/B.vcf.gz .test/ngs-data/test_project/analysis/B/qc/fastqc-raw/B-1-R1_fastqc.zip,.test/ngs-data/test_project/analysis/B/qc/fastqc-raw/B-1-R2_fastqc.zip,.test/ngs-data/test_project/analysis/B/qc/fastqc-raw/B-2-R1_fastqc.zip,.test/ngs-data/test_project/analysis/B/qc/fastqc-raw/B-2-R2_fastqc.zip .test/ngs-data/test_project/analysis/B/qc/fastqc-trimmed/B-1-R1_fastqc.zip,.test/ngs-data/test_project/analysis/B/qc/fastqc-trimmed/B-1-R2_fastqc.zip,.test/ngs-data/test_project/analysis/B/qc/fastqc-trimmed/B-2-R1_fastqc.zip,.test/ngs-data/test_project/analysis/B/qc/fastqc-trimmed/B-2-R2_fastqc.zip .test/ngs-data/test_project/analysis/B/qc/fastq_screen-trimmed/B-1-R1_screen.txt,.test/ngs-data/test_project/analysis/B/qc/fastq_screen-trimmed/B-1-R2_screen.txt,.test/ngs-data/test_project/analysis/B/qc/fastq_screen-trimmed/B-2-R1_screen.txt,.test/ngs-data/test_project/analysis/B/qc/fastq_screen-trimmed/B-2-R2_screen.txt .test/ngs-data/test_project/analysis/B/qc/dedup/B-1.metrics.txt,.test/ngs-data/test_project/analysis/B/qc/dedup/B-2.metrics.txt .test/ngs-data/test_project/analysis/B/qc/multiqc_initial_pass/multiqc_sample_rename_config/B_rename_config.tsv
File renamed without changes.
File renamed without changes.
Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
sample_id bam vcf capture_bed
2+
C .test/ngs-data/test_project/analysis/C/bam/C.bam .test/ngs-data/test_project/analysis/C/vcf/C.vcf.gz .test/ngs-data/test_project/analysis/C/configs/small_variant_caller/capture_regions.bed
3+
D .test/ngs-data/test_project/analysis/D/bam/D.bam .test/ngs-data/test_project/analysis/D/vcf/D.vcf.gz .test/ngs-data/test_project/analysis/D/configs/small_variant_caller/capture_regions.bed
Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
sample_id bam vcf
2+
C .test/ngs-data/test_project/analysis/C/bam/C.bam .test/ngs-data/test_project/analysis/C/vcf/C.vcf.gz
3+
D .test/ngs-data/test_project/analysis/D/bam/D.bam .test/ngs-data/test_project/analysis/D/vcf/D.vcf.gz

docs/Changelog.md

Lines changed: 38 additions & 0 deletions
@@ -12,6 +12,44 @@ YYYY-MM-DD John Doe
1212
```
1313
---
1414

15+
2023-10-09 Manavalan Gajapathy
16+
17+
* Merges `joss_manuscript` to the `master` branch to bring it up to date.
18+
19+
2023-10-06 Manavalan Gajapathy
20+
21+
* Adds documentation on providing sample filepaths via user-provided sample config file due to recent PRs #87, #88, #89
22+
and #90 (closes #86).
23+
* Adds documentation on editing thresholds in the QuaC-Watch config file (closes #85)
24+
25+
2023-10-05 Manavalan Gajapathy
26+
27+
* Refactors to accept sample filepaths via user-provided sample config file, when `--allow_sample_renaming` is used (#86)
28+
29+
2023-10-05 Manavalan Gajapathy
30+
31+
* Refactors to accept sample filepaths via user-provided sample config file, when `--include_prior_qc` is used (#86)
32+
* Adds a test sample config file that includes priorQC filepaths
33+
34+
2023-10-05 Manavalan Gajapathy
35+
36+
* Refactors to accept sample filepaths via user-provided sample config file. Only for exome mode in minimal manner (w/o
37+
--include_prior_qc, --allow_sample_renaming) (#86)
38+
* Adds a test sample config file
39+
* Refactors to get the capture bed file as input from the sample config file
40+
41+
2023-10-05 Manavalan Gajapathy
42+
43+
* Refactors to accept sample filepaths via user-provided sample config file. Only for WGS mode in minimal manner (w/o
44+
--include_prior_qc, --allow_sample_renaming) (#86)
45+
* Adds sample config file to use with system testing datasets -
46+
`.test/configs/no_priorQC/sample_config/project_2samples.tsv`. This maps each sample name to its VCF and BAM
47+
filepaths.
48+
* Refactors use of `--sample_config` arg to work with this config file as input
49+
* Deprecates args `--project_name` and `--projects_path`
50+
* Modifies workflow to use the new input setup
51+
* Updates README concerning the changes made
52+
1553
2023-07-17 Manavalan Gajapathy
1654

1755
* Minor updates to documentation.

docs/input_output.md

Lines changed: 40 additions & 90 deletions
@@ -2,11 +2,40 @@
22

33
## Input
44

5+
### Sample config file
6+
7+
Sample identifiers and their required filepaths (`bam`, `vcf`, etc.) are provided to QuaC in a TSV-formatted config
8+
file via `--sample_config`. The required columns depend on the flags supplied to `src/run_quac.py`. This table lists the
9+
allowed columns and when to use them.
10+
11+
| Column | When to use | Description |
12+
| --------------------- | ------------------------- | ----------------------------------------------------------------------------------------------------- |
13+
| sample_id | Always | Sample identifier |
14+
| bam | Always | BAM filepath |
15+
| vcf | Always | VCF filepath |
16+
| capture_bed | `--exome` | Capture region bed filepath |
17+
| fastqc_raw | `--include_prior_qc` | Filepath to FastQC `zip` files created from raw fastqs. Use comma as delimiter if multiple files. |
18+
| fastqc_trimmed | `--include_prior_qc` | Filepath to FastQC `zip` files created from trimmed fastqs. Use comma as delimiter if multiple files. |
19+
| fastq_screen | `--include_prior_qc` | Filepath to FastQ Screen `txt` files. Use comma as delimiter if multiple files. |
20+
| dedup | `--include_prior_qc` | Filepath to Picard's MarkDuplicates `txt` files. Use comma as delimiter if multiple files. |
21+
| multiqc_rename_config | `--allow_sample_renaming` | Filepath to the label rename config file to use with MultiQC                                          |
22+
23+
Refer to our system testing directory for example sample config files at `.test/configs`. For example:
24+
25+
* `.test/configs/no_priorQC/sample_config/project_2samples_wgs.tsv` - Sample config file for WGS samples and no prior
26+
QC.
27+
* `.test/configs/no_priorQC/sample_config/project_2samples_exome.tsv` - Sample config file for exome samples and no
28+
prior QC. Note that WGS and exome samples can't be used in the same config file.
29+
* `.test/configs/include_priorQC/sample_config/project_2samples_wgs.tsv` - Sample config file for WGS samples with prior
30+
QC data available from [certain QC tools](./index.md#optional-qc-output-consumed-by-quac).
31+
32+
### Pedigree file
33+
534
<!-- markdown-link-check-disable -->
635

7-
Samples belonging to a project are provided as input via `--pedigree` to QuaC in [pedigree file
8-
format](https://gatk.broadinstitute.org/hc/en-us/articles/360035531972-PED-Pedigree-format). Only the samples that are
9-
supplied in pedigree file will be processed by QuaC and all of these samples must belong to the same project.
36+
QuaC requires a [pedigree
37+
file](https://gatk.broadinstitute.org/hc/en-us/articles/360035531972-PED-Pedigree-format) as input via `--pedigree`.
38+
Samples listed in this file must correspond to those in the sample config file (`--sample_config`).
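The correspondence between the two inputs can be checked with a short script. The sketch below is an assumption-laden illustration, not part of QuaC: it reads "correspond" as "the same set of sample identifiers", takes the sample identifier from the second whitespace-separated column of the pedigree file (the individual ID in the PED format), and skips lines starting with `#`. The function names are hypothetical.

```python
import csv
import io


def pedigree_sample_ids(handle):
    """Collect individual IDs (2nd column) from a PED file, skipping blank and '#' lines."""
    ids = set()
    for line in handle:
        fields = line.split()
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        ids.add(fields[1])
    return ids


def sample_config_ids(handle):
    """Collect the sample_id column from a tab-delimited sample config."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["sample_id"] for row in reader}


def compare_samples(pedigree_handle, config_handle):
    """Return (IDs only in the pedigree, IDs only in the sample config)."""
    ped = pedigree_sample_ids(pedigree_handle)
    cfg = sample_config_ids(config_handle)
    return sorted(ped - cfg), sorted(cfg - ped)


# Hypothetical inputs: sample D is in the pedigree but not in the sample config
ped_text = (
    "#family_id\tsample_id\tpaternal_id\tmaternal_id\tsex\tphenotype\n"
    "fam1\tC\t0\t0\t1\t1\n"
    "fam1\tD\t0\t0\t2\t1\n"
)
cfg_text = "sample_id\tbam\tvcf\nC\tC.bam\tC.vcf.gz\n"
only_in_ped, only_in_cfg = compare_samples(io.StringIO(ped_text), io.StringIO(cfg_text))
```

Reporting the two differences separately shows which of the two files needs editing.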
1039

1140
<!-- markdown-link-check-enable -->
1241

@@ -16,102 +45,23 @@ supplied in pedigree file will be processed by QuaC and all of these samples mus
1645
create a dummy pedigree file, which will lack sex (unless project tracking sheet is provided), relatedness and
1746
affected status info. See header of the script for usage instructions.
1847

19-
20-
Each sample must have `BAM` and `VCF` files available in the directory structure shown below for sample `X`.
21-
22-
```
23-
test_project/
24-
└── analysis
25-
├── X
26-
│ ├── bam
27-
│ │   ├── X.bam
28-
│ │   └── X.bam.bai
29-
│ └── vcf
30-
│ ├── X.vcf.gz
31-
│ └── X.vcf.gz.tbi
32-
└── Y
33-
└── ....
34-
```
35-
36-
When run in exome mode using flag `--exome`, QuaC requires a capture-regions bed file at the path
37-
`path_to_sample/configs/small_variant_caller/<capture_regions>.bed` for each sample.
38-
39-
```
40-
test_project/
41-
└── analysis
42-
├── X
43-
│ ├── bam
44-
│ │   ├── X.bam
45-
│ │   └── X.bam.bai
46-
│ ├── configs
47-
│ │   └── small_variant_caller
48-
│ │   └── capture_regions.bed
49-
│ └── vcf
50-
│ ├── X.vcf.gz
51-
│ └── X.vcf.gz.tbi
52-
└── Y
53-
└── ....
54-
```
55-
56-
*Optionally*, QuaC can also utilize QC results produced by [certain
57-
tools](./index.md#optional-qc-output-consumed-by-quac) when run with flag `--include_prior_qc`. In this case, following
58-
directory structure is expected.
59-
60-
```
61-
test_project/
62-
└── analysis
63-
├── X
64-
│ ├── bam
65-
│ │   ├── X.bam
66-
│ │   └── X.bam.bai
67-
│ ├── qc
68-
│ │   ├── dedup
69-
│ │   │   ├── X-1.metrics.txt
70-
│ │   │   └── X-2.metrics.txt
71-
│ │   ├── fastqc-raw
72-
│ │   │   ├── ....
73-
│ │   ├── fastqc-trimmed
74-
│ │   │   ├── ....
75-
│ │   ├── fastq_screen-trimmed
76-
│ │   │   └── ....
77-
│ │   └── multiqc_initial_pass <--- needed only when `--allow_sample_renaming` flag is used
78-
│ │   └── multiqc_sample_rename_config
79-
│ │   └── X_rename_config.tsv
80-
│ └── vcf
81-
│ ├── X.vcf.gz
82-
│ └── X.vcf.gz.tbi
83-
└── Y
84-
└── ....
85-
```
86-
87-
88-
!!! note "CGDS users only"
89-
90-
Output (bam, vcf and QC output) produced by CGDS's small variant caller pipeline can be readily used as input to
91-
QuaC with flags `--include_prior_qc` and `--allow_sample_renaming`.
92-
93-
### Example project structure
94-
95-
Refer to system testing directory `.test/` in the repo for an example project to see an example project with above
96-
mentioned directory structure needed as input. In this setup, projects A and B have prior QC data included, whereas
97-
samples C and D do not have them. Refer to pedigree files under `.test/configs/` on how these example samples were used
98-
as input to QuaC.
99-
100-
10148
## Output
10249

10350
QuaC results are stored at the path specified via option `--outdir` (default:
104-
`data/quac/results/test_project/analysis`). Refer to the [system testing's
105-
output](./system_testing.md#expected-output-files) to learn more about the output directory structure.
51+
`data/quac/results/test_project/analysis`). Refer to the [system testing's
52+
output](./system_testing.md#expected-output-files) to learn more about the output directory structure.
53+
54+
QC outputs are stored at the sample level as well as the project level (i.e., all samples considered together), depending on
55+
the type of QC run. For example, the Qualimap tool is run at the sample level, whereas the Somalier tool is run at the project
56+
level. MultiQC reports are available at both the sample and project levels.
10657

107-
!!! tip
58+
!!! tip
10859

109-
Users may primarily be interested in the aggregated QC results produced by [multiqc](https://multiqc.info/),
60+
Users may primarily be interested in the aggregated QC results produced by [MultiQC](https://multiqc.info/),
11061
at both the sample level and the project level. These MultiQC reports also include a summary of QuaC-Watch
11162
results at the top.
11263

11364
!!! note "CGDS users only"
11465

11566
QuaC's output directory structure was designed based on the output structure of the [CGDS small variant caller
11667
pipeline](https://gitlab.rc.uab.edu/center-for-computational-genomics-and-data-science/sciops/pipelines/small_variant_caller_pipeline).
117-
