add revision exp documentation to README
divyaramamoorthy committed Jun 20, 2022
1 parent 26a7dc3 commit c075d66
Showing 1 changed file with 25 additions and 15 deletions.
40 changes: 25 additions & 15 deletions analysis/README.md
@@ -1,15 +1,18 @@
# Experiment Workflow
This folder includes all scripts used to run experiments and generate manuscript figures.

MoGP can be computationally intensive to run. In the experiments described here, Azure virtual machines were used to train the models. Machine sizes and run times are listed [here](reports/mogp_azure_runs.xlsx).
MoGP can be computationally intensive to run. In the experiments described here, Azure virtual machines were used to train the models. Machine sizes and run times are listed [here](reports/mogp_azure_runs.xlsx). Revision experiments were run using the C3DDB cluster at MIT; SLURM sbatch files for these runs are provided in this repo.

The folder is intended for reference; code cannot be run unless clinical data is gathered by the user.

**Download Data**: This analysis uses clinical scores from four clinical ALS cohorts, three of which are available to download publicly or upon request. See 2a below for versions used in manuscript.
- AnswerALS (AALS): AALS is publicly available (download "Full Metadata" at data.answerals.org)
- Clinical Trial of Ceftriaxone in ALS (CEFT): CEFT can be downloaded from National Institute of Neurological Disorders and Stroke (NINDS) (https://www.ninds.nih.gov/Current-Research/Research-Funded-NINDS/Clinical-Research/Archived-Clinical-Research-Datasets) by request
- The Pooled Resource Open-Access ALS Clinical Trials (PRO-ACT): PRO-ACT can be downloaded by request (https://nctu.partners.org/ProACT)
- Emory ALS Clinic database (EMORY): Restricted access at this time
**Download Data**: This analysis uses clinical scores from five clinical ALS cohorts and two non-ALS cohorts, five of which are available to download publicly or upon request. See 2a below for versions used in manuscript.
- AnswerALS (AALS): can be downloaded from data.answerals.org
- Clinical Trial of Ceftriaxone in ALS (CEFT): can be downloaded from National Institute of Neurological Disorders and Stroke (NINDS) (https://www.ninds.nih.gov/Current-Research/Research-Funded-NINDS/Clinical-Research/Archived-Clinical-Research-Datasets) by request
- The Pooled Resource Open-Access ALS Clinical Trials (PRO-ACT): can be downloaded by request (https://nctu.partners.org/ProACT)
- Emory ALS Clinic database (EMORY): restricted access at this time
- ALS Natural History (NATHIST): available from the ALS/MND Natural History Consortium (https://www.data4cures.org/requestingdata)
- Parkinson's Progression Markers Initiative (PPMI): can be downloaded from https://www.ppmi-info.org/access-data-specimens/download-data
- Alzheimer's Disease Neuroimaging Initiative (ADNI): can be downloaded through the LONI Image and Data Archive (https://adni.loni.usc.edu/data-samples/access-data/#access_data)

## 1) Pre-processing
**Description**: Converts .sas7bdat files to .csv matrices for easier use with Python scripts
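
A minimal sketch of this kind of conversion, assuming pandas is installed and that each `.sas7bdat` file maps to one `.csv` with the same base name. The helper names (`csv_path_for`, `convert_sas_to_csv`) and the example filename are hypothetical and are not taken from the repository's scripts:

```python
from pathlib import Path


def csv_path_for(sas_path, out_dir):
    """Return the .csv path that mirrors a .sas7bdat file's base name."""
    return Path(out_dir) / (Path(sas_path).stem + ".csv")


def convert_sas_to_csv(sas_path, out_dir):
    """Read one .sas7bdat file and write it out as a .csv matrix."""
    import pandas as pd  # imported here so the path helper works without pandas

    out_path = csv_path_for(sas_path, out_dir)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    # pandas.read_sas infers the file format from the .sas7bdat extension
    frame = pd.read_sas(sas_path)
    frame.to_csv(out_path, index=False)
    return out_path
```

For example, `convert_sas_to_csv("example.sas7bdat", "data/processed_data/ceft")` would write `data/processed_data/ceft/example.csv`. Text columns may be returned as bytes unless an `encoding` argument is passed to `pandas.read_sas`.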
@@ -21,10 +24,11 @@ MoGP can be computationally intensive to run. In the experiments described here,
### a. Process raw clinical data to matrix
**Description**: Harmonize clinical data to consistent format
**Inputs**: Paths to each of the raw datafiles or folders
- AnswerALS: `data/raw_data/aals` (version: Dec 22, 2020)
- Ceftriaxone: `data/processed_data/ceft`
- Emory: `data/raw_data/emory/emory_deidentified_04012020.xlsx` (version: Apr 1, 2020)
- PROACT: `data/raw_data/proact` (version: Jan 4, 2016)
- AALS: `data/raw_data/aals` (version: Dec 22, 2020)
- CEFT: `data/processed_data/ceft`
- EMORY: `data/raw_data/emory/emory_deidentified_04012020.xlsx` (version: Apr 1, 2020)
- PRO-ACT: `data/raw_data/proact` (version: Jan 4, 2016)
- NATHIST: `data/raw_data/nathist`
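
Because the code cannot run until the user has gathered the clinical data, it can help to check which inputs are present before starting. The sketch below assumes the paths listed above are relative to a directory passed as `root`; the function name `missing_inputs` and the `COHORT_INPUTS` mapping are hypothetical helpers and are not part of `clean_clinical_data.py`:

```python
from pathlib import Path

# Input locations as listed above (assumed relative to `root`).
COHORT_INPUTS = {
    "AALS": "data/raw_data/aals",
    "CEFT": "data/processed_data/ceft",
    "EMORY": "data/raw_data/emory/emory_deidentified_04012020.xlsx",
    "PRO-ACT": "data/raw_data/proact",
    "NATHIST": "data/raw_data/nathist",
}


def missing_inputs(root="."):
    """Return the cohort names whose input path does not exist under root."""
    root = Path(root)
    return [name for name, rel in COHORT_INPUTS.items()
            if not (root / rel).exists()]
```

Calling `missing_inputs()` from the directory that contains `data/` returns an empty list when every cohort is in place, and otherwise names the cohorts that still need to be downloaded or requested.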

**Outputs**: Processed static and timeseries datafiles: `static_proact_death.csv`, `timeseries_all_alsfrsr.csv`, `timeseries_proact_fvcp.csv`
**Script**: [clean_clinical_data.py](clean_clinical_data.py)
@@ -41,13 +45,9 @@ MoGP can be computationally intensive to run. In the experiments described here,
**Outputs**: trained model files, in `data/model_data`
**Script**: [run_mogp_experiments.py](run_mogp_experiments.py)

## 4) Figures - see Jupyter notebooks for more information

## 4) Main Figures - see Jupyter notebooks for more information
**Full MoGP Trajectories:** Fig 1 - [plot_mogp_full_panel_figure.ipynb](plot_mogp_full_panel_figure.ipynb)

**Study Summary Statistics:** Table 1 -
[summ_stats_mogp_table.ipynb](summ_stats_mogp_table.ipynb)

**Trajectory Linearity:** Fig 2 -
[plot_mogp_linearity.ipynb](plot_mogp_linearity.ipynb)

@@ -61,3 +61,13 @@ MoGP can be computationally intensive to run. In the experiments described here,

**Alternate outcomes:** Fig 6 -
[plot_alternate_outcomes.ipynb](plot_alternate_outcomes.ipynb)

## 5) Additional Tables/Figures - see Jupyter notebooks for more information
**Study Summary Statistics:**
[summ_stats_mogp_table.ipynb](summ_stats_mogp_table.ipynb)

**Alzheimer's and Parkinson's Trajectories:**
[nonals_domains.ipynb](nonals_domains.ipynb)

**Dominant ALS patterns (clustered clusters):**
[cluster_sets.ipynb](cluster_sets.ipynb)
