Code for: Rapid UPF1 depletion illuminates the temporal dynamics of the NMD-regulated human transcriptome

This repository contains the scripts for the project:
Rapid UPF1 depletion illuminates the temporal dynamics of the NMD-regulated human transcriptome
(available as a bioRxiv preprint)

Graphical abstract

Scope

This repository primarily aims to provide transparent insight into the high-throughput analysis steps used to study UPF1 depletion via conditional degron tags (CDT) in human cells. ⚠️ NOTE: The complete pipeline and the individual scripts are currently not optimized to run portably on different computing infrastructures. All required packages therefore have to be installed and configured manually to reproduce the results.
All R scripts used for the analyses and for creating the plots are available as well; UPF1_NMDRHT_Analysis.R is the top-level script.

Combined with the "Resources", which can be obtained from Zenodo, the R scripts should allow reproducing all intermediate analyses and plots.

NOTE: The Resources were omitted from this repository due to their large file size.
The Resources contain pre-analyzed and intermediate output data.
The combination of scripts and resources requires the following local folder structure to work properly:

```
2025_UPF1_NMDRHT/
├── Resources/                     // Contents from the Resources.zip from Zenodo
│   ├── bakR/                      // First resource folder
│   └── ...                        // Other resource folders
├── Plots/                         // Output for analyses plots
│   ├── Figure1/                   // Plots for Figure1
│   └── ...                        // Other Plots folders
├── .../                           // Other folders from GitHub, e.g. "Analyses" or "Tables"
├── UPF1_NMDRHT_Analysis.R         // Top-level R analysis script
└── ...                            // Other R analysis scripts
```
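As a minimal sketch of preparing this layout, assuming the GitHub repository has been cloned to a local 2025_UPF1_NMDRHT/ folder, the shell functions below create the Resources/ and Plots/Figure1/ folders from the tree and then check for a few entries the R scripts expect (Resources/bakR, Plots and UPF1_NMDRHT_Analysis.R). The names prepare_layout and check_layout are hypothetical helpers, not part of the repository.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helpers): prepare and check the local folder layout
# expected by the R scripts. The first argument is assumed to be the path
# to a local clone of the repository.

# Create the folders shown in the tree; Plots/Figure1 stands in for all
# per-figure output folders.
prepare_layout() {
  local root="$1"
  mkdir -p "$root/Resources" "$root/Plots/Figure1"
}

# Print each expected entry that is missing and return non-zero if any is absent.
check_layout() {
  local root="$1" missing=0 entry
  for entry in Resources/bakR Plots UPF1_NMDRHT_Analysis.R; do
    if [ ! -e "$root/$entry" ]; then
      echo "missing: $entry" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, run prepare_layout ~/2025_UPF1_NMDRHT, unpack the contents of Resources.zip into the Resources/ folder (inspect the archive first with unzip -l Resources.zip, since its internal layout is not assumed here), and then run check_layout ~/2025_UPF1_NMDRHT, which prints any missing entry and returns a non-zero status.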

For immediate exploration, the finalized and styled Excel MainTables of the gene- and transcript-level analyses can be found here.

Features / Requirements for raw data analysis

Complete analysis of multiple RNA-Seq datasets (provided in FASTQ format; see here for a dataset overview and here for individual sample identification), mapped to GENCODE v42 / GRCh38.primary_assembly supplemented with SIRVomeERCCome (from Lexogen; download) using STAR (short-read) or minimap2 (long-read).
Transcript quantification was performed using Salmon in mapping-based mode with a decoy-aware transcriptome index (either GENCODE.v42 or the consolidated NMDRHT.v1.2) and the options --numGibbsSamples 30 --useVBOpt --gcBias --seqBias.
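A hedged illustration of a single paired-end Salmon call with the options named above. The index path, FASTQ names, output folder, automatic library-type detection (-l A) and the thread count (-p 12) are assumptions, not taken from the pipeline scripts, and salmon_quant_pe is a hypothetical helper. The 30 Gibbs samples are the inferential replicates that swish works with; with DRY_RUN=1 the helper only prints the command.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): paired-end Salmon quantification using the
# options listed in this README. Assumed, not taken from the pipeline:
# library type auto-detection (-l A), 12 threads, and all paths.
# With DRY_RUN=1 the command is printed instead of executed.
salmon_quant_pe() {
  local index="$1" r1="$2" r2="$3" outdir="$4"
  local -a cmd
  cmd=(salmon quant -i "$index" -l A -1 "$r1" -2 "$r2" -p 12
       --numGibbsSamples 30 --useVBOpt --gcBias --seqBias
       -o "$outdir")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the words space-separated (no shell quoting applied).
    printf '%s ' "${cmd[@]}"
    echo
  else
    "${cmd[@]}"
  fi
}
```

For example, DRY_RUN=1 salmon_quant_pe /path/to/salmon_index sample_R1.fastq.gz sample_R2.fastq.gz quant/sample prints the full command line. For single-end data, Salmon takes the reads via -r instead of -1/-2.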
Differential gene expression (DGE) was analyzed via DESeq2 or swish, differential transcript expression (DTE) via edgeR, differential transcript usage (DTU) via IsoformSwitchAnalyzeR and alternative splicing (AS) via LeafCutter.
The main Bash script, CRSA_V009.sh (pre-revision) or CRSA_V010.sh (post-revision), runs either the complete pipeline for standard short-read RNA-Seq data or individual modules selected via its command-line options (see CRSA_V010.sh -h). It requires a design file specifying the following:
- reference type (gencode.v42.SIRVomeERCCome was used in this study)
- sequencing design (single- or paired-end reads)
- study name
- folder locations (srvdir for raw file locations, mydir for analysis output)
- location of the experiment file which specifies sample IDs and condition
See the provided example design.txt file for more information on the design file; an example of the tab-delimited experiment.txt file is provided as well. Further instructions are given in the comments of CRSA_V009.sh and CRSA_V010.sh.
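The experiment file is described only as tab-delimited with sample IDs and conditions, so the following sketch assumes sample IDs in column 1, conditions in column 2 and a single header row; check these assumptions against the provided example experiment.txt. The hypothetical helper summarize_experiment prints the number of samples per condition and flags short lines and duplicate sample IDs, which can catch formatting errors (for example, spaces instead of tabs) before a long pipeline run.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): summarize a tab-delimited experiment file.
# Assumed layout: header row, sample ID in column 1, condition in column 2.
# Prints "condition<TAB>count" lines (unordered) and returns non-zero if a
# line has fewer than two tab-separated fields or a sample ID is repeated.
summarize_experiment() {
  local file="$1"
  awk -F'\t' '
    NR == 1 { next }                 # skip the assumed header row
    NF == 0 { next }                 # ignore blank lines
    NF < 2 {
      printf "line %d: fewer than 2 tab-separated fields\n", NR > "/dev/stderr"
      bad = 1
      next
    }
    seen[$1]++ {
      printf "line %d: duplicate sample ID %s\n", NR, $1 > "/dev/stderr"
      bad = 1
    }
    { count[$2]++ }
    END {
      for (c in count) printf "%s\t%d\n", c, count[c]
      exit bad
    }
  ' "$file"
}
```

Running summarize_experiment experiment.txt | sort prints one condition and its sample count per line; the function returns a non-zero status if any problem was reported.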
Many modules require specific tools. To run or reproduce the complete analysis, make sure the following tools are installed and, where required, configured:
- STAR - version 2.7.10b was used for the analyses - with genome indices generated using GRCh38.primary.SIRVomeERCCome.fa and gencode.v42.SIRVomeERCCome.annotation.gtf (both reference files can be found here). The following code was used for genome index generation:
```
# --sjdbOverhang should ideally be (maximum read length - 1); 99 suits 100-nt reads
STAR --runMode genomeGenerate \
     --runThreadN 15 \
     --genomeDir /home/volker/reference/gencode.v42.SIRVomeERCCome \
     --genomeFastaFiles /home/volker/reference/Gencode/GRCh38.primary.SIRVomeERCCome.fa \
     --sjdbGTFfile /home/volker/reference/Gencode/gencode.v42.SIRVomeERCCome.annotation.gtf \
     --sjdbOverhang 99
```
- Alfred - version v0.2.6 was used for the analyses
- samtools - version 1.16.1 (using htslib 1.16) was used for the analyses
- IGV tools - version 2.14.1 or 2.17.2 was used for the analyses - make sure you have the gencode.v42.SIRVomeERCCome.chrom.sizes file (can be found here) located in /PATH/TO/IGV/lib/genomes
- Salmon - version v1.9.0 was used for the analyses - with an index generated using gentrome.v42.SIRV.ERCC.fa.gz and decoys.txt (can be found here). A separate conda environment was created for Salmon. The following code was used for index generation:
```
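# -t: transcriptome + genome ("gentrome"); -d: decoy sequence names;
# --gencode: split GENCODE-style transcript headers at the first '|'
salmon index \
    -t /home/volker/reference/Gencode/gentrome.v42.SIRV.ERCC.fa.gz \
    -d /home/volker/reference/Gencode/decoys.txt \
    -p 12 \
    -i /home/volker/reference/Transcriptome/gencode.v42.SIRVomeERCCome \
    --gencode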
```
- DESeq2 - version 1.40.1 was used for the analyses. The tx2gene file used for the analyses can be found here
- IsoformSwitchAnalyzeR - version 1.18.0 was used for the analyses.
- LeafCutter - version v0.2.9 was used for the analyses. 📝 NOTE: the LeafCutter scripts gtf_to_exons_vb.R and leafcutter_ds.R (in /scripts) were slightly modified to retain GENCODE gene IDs
- FastQC - version 0.11.9 was used for the analyses
- MultiQC - version v1.14 was used for the analyses
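The pre-built decoys.txt, gentrome.v42.SIRV.ERCC.fa.gz and chrom.sizes files are available as noted above. For orientation only, this sketch shows how such files are commonly derived from FASTA files: decoys.txt lists the genome sequence names, the gentrome concatenates the transcript sequences followed by the genome (the order used in the Salmon documentation for decoy-aware indexing), and a chrom.sizes file holds sequence names and lengths. All function and file names are hypothetical, and the exact commands used to build the study's files are not documented here.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helpers): derive Salmon decoy inputs and a chrom.sizes
# file from FASTA files. Inputs may be plain or gzip-compressed.

# Stream a FASTA file, decompressing *.gz transparently.
read_fasta() {
  case "$1" in
    *.gz) gzip -dc "$1" ;;
    *)    cat "$1" ;;
  esac
}

# decoys.txt: one genome sequence name per line (header text up to the first space).
make_decoys() {
  read_fasta "$1" | grep '^>' | cut -d ' ' -f 1 | sed 's/^>//' > "$2"
}

# gentrome: transcript sequences first, then the genome (the decoys), gzipped.
make_gentrome() {
  { read_fasta "$1"; read_fasta "$2"; } | gzip -c > "$3"
}

# chrom.sizes: "<name><TAB><length>" for every sequence, in file order.
make_chrom_sizes() {
  read_fasta "$1" | awk '
    /^>/ { if (name != "") print name "\t" len; name = substr($1, 2); len = 0; next }
    { len += length($0) }
    END { if (name != "") print name "\t" len }
  ' > "$2"
}
```

Usage would be make_decoys genome.fa.gz decoys.txt, make_gentrome transcripts.fa.gz genome.fa.gz gentrome.fa.gz and make_chrom_sizes genome.fa.gz genome.chrom.sizes. An equivalent chrom.sizes file can also be obtained with samtools faidx genome.fa followed by cut -f1,2 genome.fa.fai.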
Additionally, many analyses relied on a range of R packages (including swish, edgeR, ...); see the session info of the individual R scripts for details.
All analyses were performed on a 16-core workstation (2x Intel(R) Xeon(R) CPU E5-2687W v2 @ 3.40GHz) with 128 GB RAM running Ubuntu 22.04.2 LTS.
Please make sure to change the installation and file paths in the respective scripts to match your local environment.
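Before running the pipeline, a quick check like the one below can confirm that the command-line tools listed above are on PATH. The executable names (for example igvtools for IGV tools or alfred for Alfred) are assumptions that depend on how each tool was installed, and the hypothetical check_tools function does not verify versions.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): report which command-line tools are on PATH.
# Executable names are assumptions that depend on the local installation;
# versions are not checked.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf 'found\t%s\t%s\n' "$tool" "$(command -v "$tool")"
    else
      printf 'MISSING\t%s\n' "$tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example: check_tools STAR samtools minimap2 fastqc multiqc alfred igvtools. Because Salmon was installed in a separate conda environment, activate that environment before checking for salmon.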

Individual scripts

The specialized scripts called by the main CRSA_V009.sh (pre-revision) or CRSA_V010.sh (post-revision) script can be found here.

Config files and scripts for analyzing Ribo-Seq and long-read RNA-Seq data are provided in the same folder.

Feedback / Questions

Feedback is welcome! For any questions, please email [email protected] or create an issue.

Citation

Journal article

TBD

bioRxiv preprint

Volker Boehm, Damaris Wallmeroth, Paul O. Wulf, Luiz Gustavo Teixeira Alves, Oliver Popp, Maximilian Riedel, Emanuel Wyler, Marek Franitza, Jennifer V. Gerbracht, Kerstin Becker, Karina Polkovnychenko, Simone Del Giudice, Nouhad Benlasfer, Philipp Mertins, Markus Landthaler, Niels H. Gehring (2024) Rapid UPF1 depletion illuminates the temporal dynamics of the NMD-regulated transcriptome in human cells. bioRxiv 2024.03.04.583328; doi: https://doi.org/10.1101/2024.03.04.583328
