ALL_markers

Cancer Systems Biology, Section of Bioinformatics, Department of Health and Technology, Technical University of Denmark, 2800, Lyngby, Copenhagen

Introduction

This repository contains scripts related to the discovery of gene expression markers that distinguish two acute lymphoblastic leukemia subtypes: B-cell and T-cell ALL.

Data-driven discovery of gene expression markers distinguishing pediatric acute lymphoblastic leukemia subtypes. Mona Nourbakhsh, Nikola Tom, Anna Schroeder Lassen, Helene Brasch Lind Petersen, Ulrik Kristoffer Stoltze, Karin Wadt, Kjeld Schmiegelow, Matteo Tiberti, Elena Papaleo. bioRxiv 2024.02.26.582026; doi: https://doi.org/10.1101/2024.02.26.582026

Please cite the paper above if you use the scripts for your own research.

Installation requirements and guideline for reproducing the analyses

All analyses have been performed on a GNU/Linux server.

Computing environment

To reproduce the data and results, you will need to set up a conda environment which contains the expected R version and the required R packages. This requires being able to run Anaconda by means of the conda executable.

If you don't have access to conda, please see the Miniconda installer page for instructions on how to install Miniconda.
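Before starting, it can help to confirm that the command-line tools used in the steps below are available. The following is a minimal sketch, not part of the repository: the helper name check_tool is hypothetical, and the list of tools is an assumption based on the commands shown in this README.

```shell
# Hypothetical helper (not part of the repository): report whether a tool is on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Tools assumed from the commands in this README: git (clone), conda (environment), bash (run_all.sh).
for tool in git conda bash; do
    check_tool "$tool" || true
done
```

If conda is reported as missing, install Miniconda first as described above.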

Once you have access to conda, follow the steps below to reproduce the results and data:

Clone the GitHub repository into a directory on your local machine:

git clone https://github.com/ELELAB/ALL_markers.git
cd ALL_markers

Create a virtual environment using conda and activate it afterwards. The environment should be placed in the ALL_markers folder:

conda create --prefix ./env_ALL -c conda-forge r-base=4.2 r-pacman=0.5.1 r-curl=4.3.3 r-ragg=1.2.5 r-renv=0.16.0 r-osfr=0.2.9 r-cairo=1.6.0 gsl=2.7
conda activate ./env_ALL

Run the analyses:

bash ./run_all.sh

WARNING: our scripts use the renv R package to handle automatic dependency installation. renv writes packages to its own cache folder, which is located in the user's home directory by default. This may be undesirable if free space in the home directory is limited. You can change the location of the renv root folder by setting a system environment variable; see the comments in the run_all.sh script.
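As a hedged illustration of relocating the renv root folder, the sketch below sets RENV_PATHS_ROOT, the environment variable that renv reads for its root location. The directory name renv_cache and the helper variable RENV_ROOT_DIR are hypothetical. The comments in run_all.sh remain the authoritative instructions and may describe a different approach.

```shell
# Hypothetical target directory; point RENV_ROOT_DIR at a disk with enough free space.
RENV_ROOT_DIR="${RENV_ROOT_DIR:-$PWD/renv_cache}"
mkdir -p "$RENV_ROOT_DIR"

# RENV_PATHS_ROOT tells renv where to keep its root folder (including the package cache).
export RENV_PATHS_ROOT="$RENV_ROOT_DIR"
echo "renv root: $RENV_PATHS_ROOT"

# Compare free space in the home directory and in the chosen location.
df -h "${HOME:-.}" "$RENV_PATHS_ROOT"
```

The variable only affects processes started from the same shell session, so under this assumption it would need to be set before running bash ./run_all.sh.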

The run_all.sh script performs the following steps to reproduce all results and data:

Download data from the corresponding OSF repository, which contains the data required to run the analyses and all results associated with them.
Install all packages needed to run the analyses into the environment.
Perform all analyses.

Structure and content of GitHub and OSF repositories

The GitHub repository contains the scripts associated with this publication, and the OSF repository contains the corresponding data and results. Both repositories share the same structure: each main folder is named after a main analysis and contains all scripts (GitHub) or data and results (OSF) for that analysis. Below is an overview of these main folders. See the README file in each main folder for more details.

TARGET_data:

This directory contains gene expression data from the TARGET-ALL-P2 project and metadata such as patient age, number of samples, number of genes, and subtype information. The data are also subsetted here to contain only Primary Blood Derived Cancer - Bone Marrow samples.

TARGET_replicates:

This directory investigates replicate samples found in the TARGET-ALL-P2 data and adjusts the data for these replicates

TARGET_processing:

This directory processes gene expression data of the TARGET-ALL-P2 project (preprocessing, normalization, and filtering using TCGAbiolinks)

TARGET_transform:

This directory applies the voom transformation to TARGET-ALL-P2 gene expression data
Within it, the directory voom_transform_bmp_raw applies the voom transformation to the raw gene expression data, and the directory voom_transform_bmp_comp applies it to the processed data

TARGET_pca:

This directory performs dimensionality reduction of TARGET-ALL-P2 gene expression data
Within it, the directory bone_marrow_primary_pca_raw performs MDS of the raw gene expression data, and the directory bone_marrow_primary_pca_comp performs MDS of the processed data (both before and after batch correction). In addition, PCA is performed in bone_marrow_primary_pca_comp to find the contributions of genes to principal components 1 and 2

TARGET_batch:

This directory performs batch correction of TARGET-ALL-P2 gene expression data

TARGET_dea:

This directory performs differential expression analysis of TARGET-ALL-P2 gene expression data

TARGET_housekeeping

This directory investigates overlaps of discovered differentially expressed genes with a list of housekeeping genes
This list of housekeeping genes was downloaded from https://www.tau.ac.il/~elieis/HKG/ with associated publication: Eisenberg E, Levanon EY. Human housekeeping genes, revisited. Trends Genet. 2013 Oct;29(10):569-74. doi: 10.1016/j.tig.2013.05.010. Epub 2013 Jun 27. Erratum in: Trends Genet. 2014 Mar;30(3):119-20. PMID: 23810203.

TARGET_enrichment

This directory performs enrichment analyses of consensus differentially expressed genes

TARGET_lasso

This directory performs regularized elastic net logistic regression of TARGET-ALL-P2 gene expression data

TARGET_compare_genes

This directory compares the results of the above methods (differential expression analysis, elastic net logistic regression, PCA, housekeeping analysis) with each other and with genes reported in the Network of Cancer Genes (NCG) database

TARGET_clustering

This directory performs unsupervised clustering of the TARGET-ALL-P2 gene expression data

TARGET_random_forest

This directory performs random forest variable selection on predicted clusters from unsupervised clustering

TARGET_survival

This directory performs survival analysis on predicted subtype-related and cluster-related gene expression markers

TARGET_drug_targets

This directory investigates whether any of the subtype-related and cluster-related gene expression markers are annotated as drug targets in the Drug Gene Interaction Database

TARGET_known_markers

This directory compares the expression of predicted markers with known markers

TARGET_blood_validation

This directory performs clustering of the expression of predicted markers in TARGET blood samples

GTEx_validation

This directory performs clustering of the expression of predicted markers in GTEx blood and bone marrow samples

validation

This directory contains scripts used to validate the predicted gene expression markers in an independent cohort of patients with ALL from a Danish hospital. As access to these data cannot be granted, these scripts are not meant to be runnable.
