2025 Thesis: Does phase variation play a role in the genetic diversity of mycobacteria that causes tuberculosis in animals?

Seadraz/MScProject_2025

MScProject_2025

2025 Thesis:
Does phase variation play a role in the genetic diversity of mycobacteria that causes tuberculosis in animals?

This repository contains the scripts and analyses used to study homopolymeric tract (HT) mutations and their statistical significance in Mycobacterium bovis.


R Scripts

These scripts were run directly from the RStudio terminal.
Required packages: readxl, writexl, dplyr

Background Mutation Rate Calculation

  • Script: genome_statistics_script.r
  • Description: Calculates the background mutation rate from genome-wide variant data. It parses mutation counts in the format x/n (successes/trials), sums across selected columns representing different isolates or lineages, and computes the overall background mutation rate.
  • Input: underhill_data.xlsx
  • Output: background_rate_summary.csv
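The x/n parsing and pooling described above can be sketched as follows. This is a minimal Python illustration rather than the R script itself: the function names and the sample values are hypothetical, and it assumes the overall rate is total successes divided by total trials.

```python
def parse_count(cell):
    """Split an 'x/n' string into integer (successes, trials)."""
    x, n = str(cell).strip().split("/")
    return int(x), int(n)


def pooled_rate(cells):
    """Sum successes and trials over all cells, then divide once."""
    total_x = 0
    total_n = 0
    for cell in cells:
        x, n = parse_count(cell)
        total_x += x
        total_n += n
    if total_n == 0:
        raise ValueError("no trials to pool")
    return total_x, total_n, total_x / total_n


# Made-up values standing in for the selected columns of one row.
example = ["3/100", "1/50", "0/50"]
x, n, rate = pooled_rate(example)
print(f"{x}/{n} = {rate:.4f}")
```

Pooling the counts before dividing weights each column by its number of trials, which differs from averaging the per-column rates.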

HT Mutation Rate and Binomial Testing

  • Scripts:
    • public_ht_only_statistics.r
    • underhill_ht_only_statistics.r
  • Description:
    • Calculates mutation rates specifically for HT regions.
    • Applies an exact binomial test comparing observed HT mutation rates against the genome-wide background rate.
    • The underhill script includes both lineage-specific analysis (La1, La2, La3) and global analysis.
  • Inputs:
    • mummer_public_ht.xlsx
    • underhill_ht_only.xlsx
  • Outputs:
    • public_ht_binomial_results.xlsx
    • ht_binomial_results.xlsx
    • ht_rate_summary_20250914_132034.xlsx
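As a rough illustration of the exact binomial test, the sketch below computes an upper-tail p-value in Python using only the standard library. The one-sided alternative (HT rate greater than background) is an assumption, since the README does not state which alternative the R scripts use; the function names and example numbers are hypothetical.

```python
import math


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), computed in log space for stability."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log1p(-p)
    )
    return math.exp(log_pmf)


def binom_upper_tail(k, n, p):
    """Exact one-sided p-value P(X >= k) under the background rate p."""
    return min(1.0, sum(binom_pmf(i, n, p) for i in range(k, n + 1)))


# Hypothetical example: 5 mutated isolates out of 40 at one HT,
# tested against an assumed background rate of 0.01.
p_value = binom_upper_tail(5, 40, 0.01)
print(f"p = {p_value:.3g}")
```

The tail is summed term by term, which is adequate for isolate-level sample sizes but slow for very large n.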

Python Scripts

These were run in a Linux environment (Ubuntu on Windows).
Required modules: numpy, pandas, matplotlib, openpyxl (pathlib and argparse are also used, but they are part of the Python standard library and need no installation)

HT Identification (Bash + Python)

  • Script: ht_identifier.sh
  • Description: Scans the M. bovis reference genome for HT regions (runs of a single repeated nucleotide of length ≥ 7).
  • Input: bovis_ref_genome.fasta
  • Output: homopolymers.txt
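One way to locate such runs is a back-referencing regular expression. The sketch below is an illustration only, not the contents of ht_identifier.sh: the function names are hypothetical, it assumes a single-record FASTA file, and it reports 1-based start positions.

```python
import re


def read_fasta_sequence(lines):
    """Join the sequence lines of a single-record FASTA, skipping the header."""
    return "".join(line.strip() for line in lines if not line.startswith(">"))


def find_homopolymers(seq, min_len=7):
    """Return (start_1based, base, length) for every run of one base >= min_len."""
    # ([ACGT]) captures a base; \1{min_len-1,} requires it to repeat.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [
        (m.start() + 1, m.group(1), m.end() - m.start())
        for m in pattern.finditer(seq.upper())
    ]


# Toy record; a real run would read bovis_ref_genome.fasta instead.
toy = [">toy", "ACGAAAA", "AAATTG"]
for start, base, length in find_homopolymers(read_fasta_sequence(toy)):
    print(start, base, length)
```

Joining the FASTA lines first matters because a run can span a line break in the file.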

Mutation Detection with NUCmer (MUMmer4)

  • Requirement: MUMmer4
  • Script: run_dnadiff_all.sh
  • Description: Bash shell script with embedded Python that detects mutations after WGS alignment between the reference and query genomes.
  • Input:
    • bovis_ref_genome.fasta
    • Query genome(s)
  • Output: Excel sheets with mutation results for each query (see Appendix 1 for results).
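The per-query loop might look like the sketch below. It assumes the script calls MUMmer's dnadiff wrapper once per query (suggested by the script name but not confirmed here) and uses dnadiff's -p option to set an output prefix; the function names and the dnadiff_out directory are hypothetical, and the conversion to Excel is not shown.

```python
import subprocess
from pathlib import Path


def build_dnadiff_commands(reference, queries, out_dir="dnadiff_out"):
    """Build one 'dnadiff -p <prefix> <reference> <query>' command per query."""
    commands = []
    for query in queries:
        # Name the outputs after the query file, e.g. dnadiff_out/isolate1.*
        prefix = Path(out_dir) / Path(query).stem
        commands.append(["dnadiff", "-p", str(prefix), str(reference), str(query)])
    return commands


def run_all(commands):
    """Run each command in turn; requires MUMmer4 to be installed and on PATH."""
    for cmd in commands:
        Path(cmd[2]).parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(cmd, check=True)


# Only prints the commands; call run_all(cmds) to execute them.
cmds = build_dnadiff_commands("bovis_ref_genome.fasta", ["queries/isolate1.fasta"])
for cmd in cmds:
    print(" ".join(cmd))
```

Separating command construction from execution makes it possible to inspect the commands before running a long batch.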

HT Lineage Analysis

  • Script: ht_lineage_analysis.py
  • Description:
    • Compares mutation rates across lineages (La1, La2, La3) for the 27 significant HT regions in the Underhill dataset.
    • Significance was determined by the binomial test against the background mutation rate (R scripts).
    • Creates a heatmap and outputs tidy CSV files.
  • Input: underhill_ht_only.xlsx
  • Outputs:
    • underhill_ht_lineage_rates_sig27.csv
    • underhill_ht_lineage_rates_sig27_pretty.csv
    • underhill_ht_lineage_heatmap_sig27.png
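The per-lineage comparison amounts to computing a rate for each HT and lineage and storing the result in tidy (one row per HT and lineage) form. The sketch below shows that step with the standard library only; the input layout, column names, and counts are hypothetical and do not reflect the actual structure of underhill_ht_only.xlsx.

```python
import csv
import io

FIELDS = ["ht", "lineage", "mutated", "total", "rate"]


def lineage_rates(counts):
    """Turn {ht: {lineage: (mutated, total)}} into tidy rows with a rate column."""
    rows = []
    for ht_id, per_lineage in counts.items():
        for lineage, (mutated, total) in per_lineage.items():
            rate = mutated / total if total else float("nan")
            rows.append({"ht": ht_id, "lineage": lineage,
                         "mutated": mutated, "total": total, "rate": rate})
    return rows


def to_csv(rows):
    """Serialise the tidy rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Made-up counts for one HT region across the three lineages.
example = {"HT_1": {"La1": (2, 10), "La2": (0, 5), "La3": (1, 4)}}
print(to_csv(lineage_rates(example)))
```

A tidy table like this can be pivoted into an HT-by-lineage matrix for plotting as a heatmap.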

Dotplots

Underhill Data

  • Script: dotplot_underhill_script.py
  • Input: underhill_ht_only.xlsx
  • Outputs:
    • ht_plot_by_position_tidy.csv
    • ht_plot_by_position_combined.png
    • ht_plot_by_position_combined.pdf
    • ht_plot_by_position_L1.png / .pdf
    • ht_plot_by_position_L2.png / .pdf
    • ht_plot_by_position_L3.png / .pdf

Public Data

  • Script: dotplot_public_script.py
  • Input: dotplot_public.xlsx
  • Outputs:
    • public_ht_by_position.png
    • public_ht_by_position.pdf
    • public_ht_by_position_tidy.csv

Heatmaps

Underhill Data

  • Script: heatmap_underhill_script.py
  • Input: underhill_sig.ht_only.xlsx
  • Output: events_matrix_proportions.csv

Public Data

  • Script: heatmap_public_script.py
  • Input: public_sig.ht_only.xlsx
  • Outputs:
    • weighted_matrix_proportions.csv
    • events_heatmap_proportions.png
    • weighted_heatmap_proportions.png
    • public_events_heatmap_proportions_v2.png
