LOY (Loss of Y Chromosome) Mapping Pipeline

This repository contains scripts and tools to quantify and analyze Loss of Y (LOY) in genomic datasets, primarily WGS, WES, and RNA-seq data. It is designed for Cancer Cell Line Encyclopedia (CCLE) datasets and can be applied to other human genomic data sources.

Overview

Loss of Y (LOY) is a common somatic chromosomal aberration with implications in aging and cancer. This pipeline provides tools to:

- Extract reads mapped to chromosome Y.
- Quantify LOY across samples and tissues.
- Integrate LOY with RNA-seq for differential expression analysis.
- Generate visualizations for LOY patterns.

Scripts

`ExtractYpositions.sh`

Extracts all reads mapped to chromosome Y from a directory of BAM files and records their positions.

Usage:

bash ExtractYpositions.sh /path/to/bam/directory /path/to/output/file.txt
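
The extraction step described above can be sketched as follows. This is a minimal illustration, not the contents of `ExtractYpositions.sh`: the function names, the `sample<TAB>position` output layout, and the default contig name `Y` (hg19-style; some references use `chrY`) are all assumptions. It also assumes `samtools` is on `PATH` and that each BAM has an index.

```shell
#!/usr/bin/env bash
# Sketch only: the real ExtractYpositions.sh may work differently.

# Turn SAM records on stdin into "sample<TAB>position" lines.
# SAM column 3 is the reference name, column 4 the 1-based leftmost position.
sam_to_positions() {
    local sample="$1" contig="$2"
    awk -F '\t' -v s="$sample" -v c="$contig" \
        '$1 !~ /^@/ && $3 == c { print s "\t" $4 }'
}

# Stream mapped chromosome Y reads from every BAM in a directory.
# Hypothetical helper; contig defaults to "Y" and can be overridden.
extract_y_positions() {
    local bam_dir="$1" out_file="$2" contig="${3:-Y}" bam
    : > "$out_file"                      # start with an empty output file
    for bam in "$bam_dir"/*.bam; do
        [ -e "$bam" ] || continue        # skip if the glob matched nothing
        # -F 4 drops unmapped reads; the region argument needs a BAM index
        samtools view -F 4 "$bam" "$contig" |
            sam_to_positions "$(basename "$bam" .bam)" "$contig" >> "$out_file"
    done
}
```

After sourcing the file, `extract_y_positions /path/to/bam/directory /path/to/output/file.txt` would write one line per mapped read.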

`AnalyzeYPositions.R`

Analyzes chromosome Y read positions to identify sparse regions and low-density genes.
Generates histograms, cumulative distributions, and visualizations of read density.

Dependencies:

R packages: dplyr, tidyr, ggplot2

Key outputs:

- ChromosomeDensityHist.png – Histogram of read density across chromosome Y
- low_density_2000genes.txt – Genes overlapping low-density regions (below threshold)
- ChromosomeCDF.png – Cumulative distribution function of read densities
- orderedSamples.txt – Sample IDs ordered by number of sparse regions
- ChromosomePosition.png – Scatter plot of reads per sample, ordered by sparsity
- ChromosomeBinSummary.png – Distribution of reads across chromosome Y bins

Usage:

Rscript AnalyzeYPositions.R
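
To make the idea of "sparse regions" concrete, the sketch below counts reads per fixed-width bin and labels bins that fall below a minimum count. It is an illustration in shell, not the logic of `AnalyzeYPositions.R`: the bin width, the threshold, the function name, and the `sample<TAB>position` input layout are assumptions, and bins with no reads at all do not appear in the output.

```shell
#!/usr/bin/env bash
# Sketch only: bin size and threshold are illustrative defaults.

# Read "sample<TAB>position" lines on stdin, pool them, and print
# bin_start<TAB>read_count<TAB>status, sorted by bin start.
bin_read_density() {
    local bin_size="${1:-100000}" min_reads="${2:-5}"
    awk -F '\t' -v b="$bin_size" -v m="$min_reads" '
        { bin = int($2 / b) * b; n[bin]++ }     # start coordinate of the bin
        END {
            for (k in n)
                print k "\t" n[k] "\t" (n[k] < m ? "sparse" : "ok")
        }
    ' | sort -n
}
```

For example, `bin_read_density 100000 5 < file.txt` would list each occupied 100 kb bin with its read count and whether it falls under five reads.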

`CompareDNAtoRNA.R`

Compares DNA sequencing read positions on chromosome Y with RNA-seq expression data for Y-linked genes.
Generates a visualization of expressed Y-linked genes across samples ordered by DNA sparsity.

Dependencies:

R packages: dplyr, tidyr, ggplot2
Input files:
- RNAseqdataClean.txt – Processed RNA-seq expression matrix (GCT-like format)
- genes_chrY_positions.txt – Start and end positions of Y-linked genes
- orderedSamples.txt – Sample order generated by AnalyzeYPositions.R
- Num_mapped_reads.csv – Mapping between Run IDs and cell line names

Key output:

Gene_vs_Sample_BarsV2.png – Bar plot showing expression presence of Y-linked genes across samples

Usage:

Rscript CompareDNAtoRNA.R
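
Because the DNA side is keyed by Run ID and the RNA-seq matrix by cell line name, the two have to be joined through `Num_mapped_reads.csv`. The sketch below shows one way to do that lookup while preserving the sample order. It assumes `orderedSamples.txt` holds one Run ID per line, which the source does not state; the function name and the `NA` placeholder are also illustrative.

```shell
#!/usr/bin/env bash
# Sketch only: assumes one Run ID per line in the ordered sample file.

# Print "Run<TAB>cell_line" for each Run ID, keeping the input order.
# Columns are located by header name, so extra CSV columns are harmless.
runs_to_cell_lines() {
    local mapping_csv="$1" ordered_runs="$2"
    awk -F ',' '
        NR == FNR {                              # first file: the mapping CSV
            if (FNR == 1) { for (i = 1; i <= NF; i++) col[$i] = i; next }
            name[$(col["Run"])] = $(col["cell_line"])
            next
        }
        { print $1 "\t" (($1 in name) ? name[$1] : "NA") }   # second file
    ' "$mapping_csv" "$ordered_runs"
}
```

Run IDs with no entry in the mapping are reported as `NA` rather than dropped, so mismatches between the two inputs stay visible.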

`CallVariants.sh`

Performs variant calling on BAM files using bcftools and outputs compressed, indexed VCF files.

Dependencies:

Tools: bcftools (with mpileup, call, and index subcommands)
Input files:
- *.bam – BAM files to be processed
- Homo_sapiens_assembly19.fixed.fasta – Reference genome (update path as needed)

Outputs:

VCF_Files/ – Directory containing compressed and indexed VCFs (.vcf.gz and .csi)

Usage:

bash CallVariants.sh
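
For orientation, a typical bcftools calling step using the three subcommands listed above looks like the sketch below. It is not a copy of `CallVariants.sh`: the function names and the choice of the multiallelic caller with variants-only output (`-mv`) are assumptions. It requires `bcftools` on `PATH`.

```shell
#!/usr/bin/env bash
# Sketch only: the options used by the real CallVariants.sh may differ.

# Derive the output path for a BAM, e.g. /data/S1.bam -> VCF_Files/S1.vcf.gz
vcf_path_for() {
    local bam="$1" out_dir="${2:-VCF_Files}"
    printf '%s/%s.vcf.gz\n' "$out_dir" "$(basename "$bam" .bam)"
}

# Call variants for one BAM against a reference FASTA and index the result.
call_variants() {
    local bam="$1" ref="$2" out_dir="${3:-VCF_Files}" vcf
    vcf=$(vcf_path_for "$bam" "$out_dir")
    mkdir -p "$out_dir"
    # -Ou: uncompressed BCF between the two steps; -Oz: bgzip-compressed VCF
    bcftools mpileup -Ou -f "$ref" "$bam" |
        bcftools call -mv -Oz -o "$vcf" &&
        bcftools index "$vcf"            # writes a .csi index by default
}
```

A loop such as `for bam in *.bam; do call_variants "$bam" Homo_sapiens_assembly19.fixed.fasta; done` would then process every BAM in the current directory.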

`makeMatrixEQTL.R`

Runs eQTL analysis by integrating SNP genotypes from VCF files with RNA-seq expression data using the MatrixEQTL R package.

Dependencies:

R packages: VariantAnnotation, dplyr, MatrixEQTL, data.table, Matrix, biomaRt
Input files:
- *.vcf.gz – Variant call files (compressed and indexed with bcftools)
- Num_mapped_reads.csv – Maps sample IDs (Run) to cell lines
- CCLE_RNAseq_genes_rpkm_20180929.gct – Gene expression data (RNA-seq, RPKM values)
- snps_positions.txt – SNP positions (intermediate file, generated during runtime)
- gene_positions.txt – Gene positions (intermediate file, fetched via biomaRt)

Outputs:

- snps_positions.txt – SNP positions extracted from VCFs
- gene_positions.txt – Gene coordinates for tested genes
- output_matrixeqtl_results.txt – eQTL associations (with p-values and FDR correction)

Usage:

Rscript makeMatrixEQTL.R
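
MatrixEQTL expects SNP locations as a three-column table: SNP name, chromosome, position. The script builds this in R. The shell sketch below only illustrates the shape of such a table when derived from VCF text; the `chr:pos` naming scheme, the header names, and the function name are assumptions, not the format the script is known to write.

```shell
#!/usr/bin/env bash
# Sketch only: illustrates the table layout, not the script's R code.

# Read VCF text on stdin and print a snpid/chr/pos table.
# VCF column 1 is CHROM and column 2 is POS; "#" lines are headers.
vcf_to_snp_positions() {
    awk -F '\t' '
        BEGIN { print "snpid\tchr\tpos" }
        /^#/  { next }
        { print $1 ":" $2 "\t" $1 "\t" $2 }
    '
}
```

For a compressed file, `gzip -dc sample.vcf.gz | vcf_to_snp_positions` would print the table to standard output.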

Data

This repository contains data files required for the LOY mapping pipeline.

Required files

genes_chrY_positions.txt – Contains the start and end positions of genes on chromosome Y.
Num_mapped_reads.csv – Maps sequencing run IDs (Run) to their corresponding CCLE cell lines (cell_line).
- Example format:
```
Run,cell_line,num_mapped_reads
SRR8639150,EBC1_LUNG,379420
SRR8639204,CAKI1_KIDNEY,541050
SRR8639219,DMS53_LUNG,514931
```
- Only the Run and cell_line columns are required for analyses.
BAM files – High-coverage WGS/WES BAM files.
RNA-seq counts – Processed RNA-seq count files for integration with LOY analyses.
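
Since only the `Run` and `cell_line` columns of `Num_mapped_reads.csv` are required, a quick header check before running the R scripts can catch a misnamed column early. The helper below is a hypothetical convenience, not part of the repository.

```shell
#!/usr/bin/env bash
# Sketch only: check_csv_columns is a hypothetical helper.

# Return non-zero unless the CSV header contains every named column.
check_csv_columns() {
    local csv="$1"; shift
    local header required
    header=$(head -n 1 "$csv" | tr -d '\r')      # tolerate Windows line endings
    for required in "$@"; do
        case ",$header," in
            *",$required,"*) ;;                  # column present
            *) echo "missing column: $required" >&2; return 1 ;;
        esac
    done
}
```

A call such as `check_csv_columns Num_mapped_reads.csv Run cell_line` succeeds silently when both columns are present.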

Full CCLE or other controlled-access datasets are not included due to data usage restrictions.

tatonetti-lab/Mapping-LOY

Folders and files

Latest commit

History

Repository files navigation

LOY (Loss of Y Chromosome) Mapping Pipeline

Table of Contents

Overview

Scripts

ExtractYpositions.sh

AnalyzeYPositions.R

CompareDNAtoRNA.R

CallVariants.sh

makeMatrixEQTL.R

Data

Required files

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

`ExtractYpositions.sh`

`AnalyzeYPositions.R`

`CompareDNAtoRNA.R`

`CallVariants.sh`

`makeMatrixEQTL.R`

Packages