This folder contains my Snakemake pipeline for running FRASER and FRASER2.
Please install the conda environments with:
micromamba create -f /path/to/fraser1.yml
micromamba create -f /path/to/fraser2.yml
The yml files can be found at:
FRASER_snakemake/conda_envs
- Make a CSV that lists the full file path of every file you want to include in your FRASER run.
- For an example, see symlink_blood.csv in the FRASER_snakemake/config directory.
- Make a config.yaml file in the config folder with the following information:
output_directory: "/file/path/to/output/directory"
input_directory: "/file/path/where/you/want/to/symlink/data/to"
file_list: "/csv/file/where/you/have/list/of/files/to/run.csv"
FRASER_type: "Both" OR "FRASER" OR "FRASER2" (see note below)
FRASER_type input options:
Please enter exactly ONE of the three strings below.
- "FRASER": run FRASER with the theta, psi3, and psi5 outputs
- "FRASER2": run FRASER with the Jaccard index output
- "Both": produce all four outputs
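
As a quick sanity check before launching the pipeline, the three FRASER_type options above can be validated with a few lines of Python. This is a minimal sketch and not part of the pipeline: the function name `expected_outputs` and the output labels (including `jaccard`) are hypothetical, and treating the value as case-sensitive is an assumption.

```python
# Hypothetical helper: maps each FRASER_type value to the outputs described above.
# The output labels are illustrative and may not match the pipeline's file names.
OUTPUTS_BY_TYPE = {
    "FRASER": ["theta", "psi3", "psi5"],
    "FRASER2": ["jaccard"],
    "Both": ["theta", "psi3", "psi5", "jaccard"],
}


def expected_outputs(fraser_type):
    """Return the outputs for a FRASER_type value, or raise on an invalid one."""
    if fraser_type not in OUTPUTS_BY_TYPE:
        valid = ", ".join(sorted(OUTPUTS_BY_TYPE))
        raise ValueError(f"FRASER_type must be one of: {valid}; got {fraser_type!r}")
    return OUTPUTS_BY_TYPE[fraser_type]
```

For example, `expected_outputs("Both")` returns all four labels, while a misspelled value such as `"both"` raises a `ValueError` instead of silently falling through.
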
The file_list is a .csv with a single column (no header) in which every row is the complete file path to a .bam or .bai file in your analysis cohort. Every individual in your cohort must have two lines in the file_list: one for the .bam file and one for the .bai file.
The first 12 lines of a file_list (csv) may look like this:
/desktop/STAROutput/genome/A1_star_hg38_Aligned.sortedByCoord.out.bam
/desktop/STAROutput/genome/A1_star_hg38_Aligned.sortedByCoord.out.bai
/desktop/STAROutput/genome/B1_star_hg38_Aligned.sortedByCoord.out.bam
/desktop/STAROutput/genome/B1_star_hg38_Aligned.sortedByCoord.out.bai
/desktop/STAROutput/genome/C1_star_hg38_Aligned.sortedByCoord.out.bam
/desktop/STAROutput/genome/C1_star_hg38_Aligned.sortedByCoord.out.bai
/desktop/STAROutput/genome/D1_star_hg38_Aligned.sortedByCoord.out.bam
/desktop/STAROutput/genome/D1_star_hg38_Aligned.sortedByCoord.out.bai
/desktop/STAROutput/genome/E1_star_hg38_Aligned.sortedByCoord.out.bam
/desktop/STAROutput/genome/E1_star_hg38_Aligned.sortedByCoord.out.bai
/desktop/STAROutput/genome/F1_star_hg38_Aligned.sortedByCoord.out.bam
/desktop/STAROutput/genome/F1_star_hg38_Aligned.sortedByCoord.out.bai
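
Because every individual needs both a .bam and a .bai line, it can help to check the pairing before starting a run. The following is a minimal sketch, not part of the pipeline: the function names `read_file_list` and `find_unpaired` are hypothetical, and it assumes the index file shares the BAM's name with `.bai` in place of `.bam`, as in the example above.

```python
import csv


def read_file_list(csv_path):
    """Read a headerless, single-column CSV and return its non-empty paths."""
    with open(csv_path, newline="") as handle:
        return [row[0].strip() for row in csv.reader(handle) if row and row[0].strip()]


def find_unpaired(paths):
    """Return (bams_without_bai, bais_without_bam), each sorted.

    Assumes an index is named like its BAM with .bai replacing .bam.
    Paths with any other extension are ignored.
    """
    bams, bais = set(), set()
    for path in paths:
        if path.endswith(".bam"):
            bams.add(path[: -len(".bam")])
        elif path.endswith(".bai"):
            bais.add(path[: -len(".bai")])
    missing_bai = sorted(stem + ".bam" for stem in bams - bais)
    missing_bam = sorted(stem + ".bai" for stem in bais - bams)
    return missing_bai, missing_bam
```

Calling `find_unpaired(read_file_list("/path/to/file_list.csv"))` should return two empty lists for a complete file_list; any path it returns is missing its partner.
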
- Go to the FRASER_snakemake/workflow directory.
- Run the following command:
./run_snakemake --config_file "/path/to/config/yaml" --profile "path/to/config/slurm_scg"
The files in the slurm_scg directory allow Snakemake to run with resources on Stanford's SCG/Oak.
This folder contains the code for the main and supplemental figures of my manuscript (except Figure 1, which was made using Excel).
Please install the conda environments with:
micromamba create -f /path/to/fraser1.yml
micromamba create -f /path/to/fraser2.yml
The yml files can be found at:
run_results_paper/conda_envs
- Make a config.yaml file in the config folder with the following information:
FRASER1_results_uncompiled: "/file/path/to/raw/FRASER/output.csv"
input_file_FRASER: "/file/path/to/csv/with/filepaths/and/ids/of/all/samples/run/in/FRASER.csv"
FRASER2_results_uncompiled: "/file/path/to/raw/FRASER2/output.csv"
input_file_FRASER2: "/file/path/to/csv/with/filepaths/and/ids/of/all/samples/run/in/FRASER2.csv"
metadata_file: "/file/path/to/metadata/file.csv"
mig_file: "/file/path/to/Homo_sapiens_gene.csv"
low_RIN: "/file/path/to/csv/with/samples/excluded/due/to/low/RIN.csv"
output_directory: "/file/path/to/desired/output/directory"
genesets: "/file/path/to/directory/with/genesets"
genes: "/file/path/to/FRASER/output/rds/file.rds"
genes_FRASER2: "/file/path/to/FRASER2/output/rds/file.rds"
missing: "/file/path/to/csv/with/samples/excluded/due/to/missing/info.csv"
size_run_dir: "/file/path/to/directory/with/different/iterations/of/FRASER/across/different/run/sizes"
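
With this many entries, a missing or empty key is easy to overlook. The sketch below checks an already-parsed config dictionary against the keys listed above. It is illustrative only: the name `missing_keys` is hypothetical, and treating every listed key as required is an assumption, since the workflow may handle absent keys differently.

```python
# Keys listed above for the run_results_paper config.yaml.
# Assumption: all of them are required by the workflow.
REQUIRED_KEYS = [
    "FRASER1_results_uncompiled",
    "input_file_FRASER",
    "FRASER2_results_uncompiled",
    "input_file_FRASER2",
    "metadata_file",
    "mig_file",
    "low_RIN",
    "output_directory",
    "genesets",
    "genes",
    "genes_FRASER2",
    "missing",
    "size_run_dir",
]


def missing_keys(config):
    """Return the required keys that are absent or empty in a parsed config dict."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]
```

Parsing config.yaml into a dictionary is left to whichever YAML library is available in your environment; an empty list from `missing_keys` means every listed key has a value.
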
- Go to the run_results_paper/workflow directory.
- Run the following command:
./run_snakemake --config_file "/path/to/config/yaml" --profile "path/to/config/slurm_scg"
The files in the slurm_scg directory allow Snakemake to run with resources on Stanford's SCG/Oak.
This folder contains the code for the main and supplemental figures produced for the review of my manuscript (except Figure S1, which was made using Excel).
Please install the conda environments with:
micromamba create -f /path/to/fraser1.yml
micromamba create -f /path/to/fraser2.yml
The yml files can be found at:
run_results_review/conda_envs
- Make a config.yaml file in the config folder with the following information:
FRASER1_results_uncompiled: "/file/path/to/raw/FRASER/output.csv"
input_file_FRASER: "/file/path/to/csv/with/filepaths/and/ids/of/all/samples/run/in/FRASER.csv"
FRASER2_results_uncompiled: "/file/path/to/raw/FRASER2/output.csv"
input_file_FRASER2: "/file/path/to/csv/with/filepaths/and/ids/of/all/samples/run/in/FRASER2.csv"
metadata_file: "/file/path/to/metadata/file.csv"
mig_file: "/file/path/to/Homo_sapiens_gene.csv"
output_directory: "/file/path/to/desired/output/directory"
genesets: "/file/path/to/directory/with/genesets"
genes: "/file/path/to/FRASER/output/rds/file.rds"
genes_FRASER2: "/file/path/to/FRASER2/output/rds/file.rds"
size_run_dir: "/file/path/to/directory/with/different/iterations/of/FRASER/across/different/run/sizes"
- Go to the run_results_review/workflow directory.
- Run the following command:
./run_snakemake --config_file "/path/to/config/yaml" --profile "path/to/config/slurm_scg"
The files in the slurm_scg directory allow Snakemake to run with resources on Stanford's SCG/Oak.
This folder contains miscellaneous scripts, such as the code for Figure 1, the creation of the metadata file, and a comparison of novel samples vs. samples in Ungar et al., 2024 (PMCID: PMC10802764).
This folder contains the gene sets used in the manuscript. They were originally from Cormier et al., 2022 (PMID: 36376793) and can also be found at https://github.com/macarthur-lab/gene_lists
Sources of the gene sets:
| Gene set | Source | Filename |
|---|---|---|
| Haploinsufficient | ClinGen dataset > Cormier et al., 2021 | haploinsufficient.tsv |
| Autosomal recessive | Blekhman et al., 2008; Berg et al., 2013 | autosomal_recessive.tsv |
| Autosomal dominant | Blekhman et al., 2008; Berg et al., 2013 | autosomal_dominant.tsv |
| Olfactory receptor | Mainland et al., 2015 | olfactory_receptors.tsv |
| CRISPR non-essential | Hart et al., 2017 | CRISPR_nonessential_genes.tsv |
| Developmental delay | https://www.ebi.ac.uk/gene2phenotype/downloads/DDG2P.csv.gz > Firth et al., 2011; Fitzgerald et al., 2015; Wright et al., 2015; McRae et al., 2017; Wright et al., 2018 | developmental_delay_genes.csv |
| OMIM | https://omim.org/downloads > Amberger et al., 2019 | OMIM_genes.tsv |