innatelab/vzv

Collection of general scripts for the "Varicella-Zoster Virus proteomic profiling" (Girault et al., 2025) study using in-house packages

The VZV data analysis scripts use in-house R and Julia packages, including msglm, msimportr and HierarchicalHotNet.jl, as named in the sections below.

Affinity-purification of V5-tagged VZV proteins in SK-N-BE2 cells (interactomes)

The analysis of affinity-purification data of V5-tagged VZV proteins in SK-N-BE2 cells requires the msglm (v0.5.0) and msimportr (v0.3.0) packages and is performed in the following steps:

  1. prepare_data_apms.R script
    • Reads the MaxQuant files ProteinGroups.txt and evidence.txt and prepares the data for statistical analysis.
    • Outputs:
      • msfull.RData containing the formatted and annotated dataset, with protein group and peptide intensities.
      • msglm.RData containing the formatted and annotated dataset with protein group intensities, per-MS run intensity normalisation factors, MS instrument noise model parameters, and the GLM model description (matrices of effects, batch effects, and contrasts) including parameter priors.
  2. msglm_fit_chunk_apms.R script is called for each protein group (job chunk) in the data set
    • Gets protgroup_id as an input
    • Extracts the data of the given protein group from msglm.RData
    • Fits the Bayesian model defined in msglm.RData using the msglm package
    • Outputs the <job_chunk>.RData file with the msglm results
  3. msglm_fit-chunks_apms.sge.sh is a shell script for the parallel execution of msglm_fit_chunk_apms.R on the compute cluster using the SGE job scheduler.
  4. assemble_fits_apms.R script assembles all individual <job_chunk>.RData files into a single report.
    • Extracts the significance of relevant model contrasts for each protein group.
    • The results are compiled into the Supplementary Table S3, Tab 1 - VZV-Host interactions and Tab 2 - VZV ORF baits.
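
The chunk-and-assemble pattern of steps 2 to 4 can be illustrated with a small, language-neutral sketch. It is written in Python only for illustration and is not the repository's R code: the function names, the toy intensities and the placeholder "fit" (a plain mean) are all hypothetical, and the actual Bayesian model is the one fitted by msglm.

```python
# Hypothetical sketch of the chunk-and-assemble pattern; none of these names
# come from the repository, and the "fit" is a placeholder, not the msglm model.
from statistics import mean


def fit_chunk(protgroup_id, intensities):
    """Stand-in for one job chunk: 'fit' a single protein group."""
    values = intensities[protgroup_id]
    return {"protgroup_id": protgroup_id, "mean_log2": mean(values)}


def assemble(chunk_results):
    """Combine per-chunk results into one report, ordered by protein group ID."""
    return sorted(chunk_results, key=lambda r: r["protgroup_id"])


# Toy data: log2 intensities for three protein groups (made-up numbers).
intensities = {3: [20.0, 22.0], 1: [18.0, 18.0], 2: [25.0, 23.0, 24.0]}

# On a cluster each call would be a separate scheduler task; here we just loop.
results = [fit_chunk(pg_id, intensities) for pg_id in intensities]
report = assemble(results)
```

Because each protein group is fitted independently, the chunks can run in any order and on any node; only the assembly step needs to see all of the outputs.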

Full proteome changes induced by the depletion of the MPP8 gene in SK-N-BE2 cells

The analysis of proteomic changes induced by the depletion of the MPP8 gene in SK-N-BE2 cells requires the msglm (v0.6.0) and msimportr (v0.3.0) packages.

  1. prepare_data_MPP8_KO.R
    • Reads the MaxQuant files: peptides.txt and evidence.txt
    • Corrects protein groups using the protregroup.jl script to avoid splitting isoforms of the same gene into individual protein groups that differ only by a single specific peptide (see Material and Methods section for details).
    • Outputs:
      • msfull.RData containing the formatted and annotated dataset with protein group and peptide intensities.
      • msglm.RData containing the formatted and annotated dataset with peptide intensities, per-MS run intensity normalisation factors, MS instrument noise model parameters, and the GLM model description (matrices of effects) including parameter priors.
  2. msglm_fit_chunk_MPP8_KO.R script is called for each protein group (job chunk) in the data set
    • Gets protregroup_id as an input
    • Extracts the data of the given protein group from msglm.RData
    • Fits the Bayesian model defined in msglm.RData using the msglm package
    • Outputs the <job_chunk>.RData file with the msglm results
  3. msglm_fit-chunks_MPP8_KO.lrz.sh is a shell script for the parallel execution of msglm_fit_chunk_MPP8_KO.R on the compute cluster using the SLURM workload manager.
  4. assemble_fits_MMP8_KO.R script assembles all individual <job_chunk>.RData files into a single report.
    • Extracts the significance of relevant model contrasts for each protein group.
    • These results are integrated in the compiled "Complementary omics datasets" Table S5
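
The protein regrouping mentioned in step 1 can be pictured with the following sketch. It shows one possible reading of the idea (merging protein groups of the same gene whose peptide sets differ by at most one peptide) and is not the algorithm of protregroup.jl, which is described in the Material and Methods section of the study. The function name `regroup`, the `max_diff` parameter and the toy data are hypothetical, and Python is used only for illustration.

```python
# Hypothetical sketch: merge same-gene protein groups that differ by at most
# one peptide. This is an assumption-based illustration, not protregroup.jl.
from itertools import combinations


def regroup(groups, max_diff=1):
    """Return lists of group IDs that should be merged into one group."""
    ids = sorted(groups)
    parent = {g: g for g in ids}  # union-find forest over group IDs

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for a, b in combinations(ids, 2):
        same_gene = groups[a]["gene"] == groups[b]["gene"]
        # Number of peptides present in exactly one of the two groups.
        n_diff = len(groups[a]["peptides"] ^ groups[b]["peptides"])
        if same_gene and n_diff <= max_diff:
            parent[find(b)] = find(a)

    merged = {}
    for g in ids:
        merged.setdefault(find(g), []).append(g)
    return sorted(merged.values())


# Toy input: two isoform groups of one gene and an unrelated group.
groups = {
    "PG1": {"gene": "GENE_A", "peptides": {"p1", "p2", "p3"}},
    "PG2": {"gene": "GENE_A", "peptides": {"p1", "p2", "p3", "p4"}},
    "PG3": {"gene": "GENE_B", "peptides": {"p5", "p6"}},
}
regrouped = regroup(groups)
```

Here PG1 and PG2 end up in the same group because they belong to the same gene and differ only by peptide p4, while PG3 stays separate.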

Integration of the interactome and effectome data by network diffusion analysis

The VZV virus-host interactome (virus-host protein interactions) and effectome (proteomic changes induced by the expression of individual viral proteins) are integrated by diffusing the protein abundance perturbations over the global network of protein-protein and functional gene interactions within the host cell (the ReactomeFI database is used). The analysis is implemented in Julia and uses the in-house HierarchicalHotNet.jl package, which implements the Hierarchical HotNet method and provides additional statistics for the network diffusion process.
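
Network diffusion of this kind is commonly formulated as a random walk with restart. The sketch below shows that generic technique on a toy graph, using plain power iteration. It is not the HierarchicalHotNet.jl implementation: the function name `diffuse`, the `restart` value and the four-node path graph are assumptions made for illustration, and Python is used only to keep the example self-contained.

```python
# Generic random walk with restart by power iteration (illustrative only;
# names and parameter values are assumptions, not HierarchicalHotNet.jl API).

def diffuse(adjacency, scores, restart=0.4, n_iter=200):
    """Diffuse node scores over an undirected graph.

    adjacency: dict mapping each node to the set of its neighbours
    scores:    dict mapping nodes to non-negative perturbation weights
    Returns a dict mapping each node to its steady-state diffused weight.
    """
    total = sum(scores.values())
    start = {n: scores.get(n, 0.0) / total for n in adjacency}
    current = dict(start)
    for _ in range(n_iter):
        # With probability `restart` the walker jumps back to the start distribution.
        nxt = {n: restart * start[n] for n in adjacency}
        for node, neighbours in adjacency.items():
            if not neighbours:
                continue  # isolated nodes pass nothing on in this sketch
            share = (1.0 - restart) * current[node] / len(neighbours)
            for m in neighbours:
                nxt[m] += share
        current = nxt
    return current


# Toy path graph A - B - C - D with the whole perturbation placed on A.
adjacency = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
diffused = diffuse(adjacency, {"A": 1.0})
```

In this toy case the diffused weight decreases with distance from the perturbed node A, which is the behaviour that lets diffusion link perturbed proteins to their network neighbourhood.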

  1. hotnet_analysis.jl is the general script for the HotNet analysis:
    • Reads the interactome and effectome (the results of the msglm analysis).
    • Reads the ReactomeFI network of protein-protein and functional gene interactions.
    • Prepares the effectome-based node and edge weights for the HotNet network diffusion analysis of each viral bait (step 2).
    • Prepares 1000 random permutations of node and edge weights per viral protein for step 3 and saves them in the hotnet_perm_input.jlser.zst file.
    • Reads the network diffusion results from step 2 and the combined permutation statistics from step 4 (hotnet_perm_assembled_<viral_protein>.jlser.zst).
    • Identifies the significant interactions between the host proteins physically associated with viral proteins (interactome) and the proteins affected by viral protein overexpression (effectome).
    • Outputs the Supplementary Table S4 - HotNet analysis results (FIXME) with the significant interactions between the host proteins and viral proteins.
  2. hotnet_treestats_chunks.jl
    • Performs HotNet network diffusion to integrate interactome and effectome for each viral protein using unperturbed weights.
    • Calculates the Strongly Connected Component Tree statistics for each edge weight cutoff threshold.
    • Outputs the hotnet_treestats_<viral_protein>.jlser.zst file with the HotNet network diffusion results for each viral protein.
    • hotnet_treestats_chunk.lrz.sh is the associated SLURM job script for parallel execution on the compute cluster.
  3. hotnet_perm_chunk.jl
    • Reads a block of random effectome weight permutations (job chunk) from hotnet_perm_input.jlser.zst generated at step 1.
    • Performs the memory- and compute-intensive network diffusion for each weight permutation of the given job chunk.
    • Calculates the Strongly Connected Component Tree statistics for each edge weight cutoff threshold.
    • Outputs the hotnet_perm_<job_chunk>.jlser.zst file.
    • hotnet_perm_chunk.lrz.sh is the associated SLURM shell script for the parallel execution on the compute cluster.
  4. hotnet_perm_chunk_assemble.jl
    • Assembles the HotNet permuted tree results (hotnet_perm_<job_chunk>.jlser.zst) generated at step 3.
    • Outputs the combined permutation statistics (hotnet_perm_assembled_<viral_protein>.jlser.zst).
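
The role of the permutations from steps 3 and 4 can be sketched with a standard empirical p-value: a statistic observed on the real data is compared with the same statistic computed on each permuted dataset. The statistics actually computed by HierarchicalHotNet.jl may differ; the function name `empirical_pvalue` and the numbers below are made up for illustration, and Python is used only to keep the example self-contained.

```python
# Standard one-sided empirical p-value from permutations (illustrative only).

def empirical_pvalue(observed, permuted):
    """Share of permutations at least as extreme as the observed value.

    The +1 in numerator and denominator counts the observed value itself,
    so the estimate is never exactly zero.
    """
    n_extreme = sum(1 for value in permuted if value >= observed)
    return (n_extreme + 1) / (len(permuted) + 1)


# Toy example: 1000 permuted statistics (0..999) and an observed value of 990.
permuted_stats = list(range(1000))
p_value = empirical_pvalue(990, permuted_stats)
```

With 1000 permutations the smallest attainable value of this estimate is 1/1001, which is one reason the permutation step is run at this scale on the compute cluster.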

About

Scripts (Julia & R) for the analysis of "Varicella-Zoster Virus-host" project data
