The scripts provided in this repository are used to compute and characterize the spacing relationships of transcription factors.
Here is the overview of the method:
- Python
- NumPy 1.15.4
- pandas 1.1.4
- Biopython 1.70
- SciPy 1.1.0
- Matplotlib 3.3.3
- Seaborn 0.11.0
- HOMER

The motif identification script finds motifs given a peak file, a FASTA file of peak sequences, and a motif file. The recommended parameters, shown below, keep motifs that pass a false positive rate of <0.1% (--cutoff) and lie <50 bp from peak centers (-d 50):
python ../ENCODE_processed_files/CTCF_idr.fa CTCF --motif_path ../motifs/ --cutoff -d 50
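
The two filters described above can be illustrated with a minimal sketch. It is not the repository's implementation: the function names, the tuple layout of a motif hit, and the use of an empirical background score distribution to derive the score threshold are all assumptions made for illustration.

```python
import numpy as np

def score_threshold(background_scores, fpr=0.001):
    """Return the score that only a fraction `fpr` of background scores exceed.

    Hypothetical helper: assumes the false positive rate is estimated from an
    empirical distribution of motif scores on background sequences.
    """
    return float(np.quantile(background_scores, 1.0 - fpr))

def filter_hits(hits, seq_len, threshold, max_dist=50):
    """Keep motif hits that pass the score threshold and sit near the peak center.

    hits      -- list of (start, motif_length, score) tuples (assumed layout)
    seq_len   -- length of the peak sequence; its midpoint is taken as the center
    max_dist  -- hits whose midpoint is < max_dist bp from the center are kept
    """
    center = seq_len / 2.0
    kept = []
    for start, length, score in hits:
        midpoint = start + length / 2.0
        if score >= threshold and abs(midpoint - center) < max_dist:
            kept.append((start, length, score))
    return kept

# Toy data: 1000 background scores and three candidate hits in a 200 bp peak.
background = np.arange(1000, dtype=float)
threshold = score_threshold(background, fpr=0.001)
hits = [(95, 10, 999.0), (10, 10, 999.0), (95, 10, 5.0)]
kept = filter_hits(hits, seq_len=200, threshold=threshold, max_dist=50)
```

In this toy case only the first hit survives: the second is too far from the center and the third scores too low. The actual scripts may define the score cutoff and the distance to the peak center differently.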
To identify motifs and simultaneously separate peaks into those falling in repetitive and nonrepetitive DNA regions, first download the repeat annotations, then run the script with the --repeat option:
tar -zxvf hg38_repeats.tar.gz
python ../ENCODE_processed_files/CTCF_idr.fa CTCF --motif_path ../motifs/ --cutoff -d 50 --repeat hg38_repeats/hg38_repeats_merged.nodup.all.txt

The spacing script takes two processed files from the motif identification step, one for each transcription factor of a pair, and outputs their spacing relationships. The basic usage is as below:
python ../ENCODE_processed_files/ GATA1 TAL1 --motif_path ../motifs/
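
As a rough illustration of what a spacing relationship between two factors means, the sketch below counts signed distances between motif positions of two factors in the peaks they share. It is a simplified assumption, not the method used by the scripts: the function name, the dictionary inputs, and the `max_spacing` window are hypothetical, and no statistical testing is included.

```python
from collections import Counter

def spacing_counts(pos_a, pos_b, max_spacing=100):
    """Count signed motif-to-motif distances for peaks that contain both factors.

    pos_a, pos_b -- dicts mapping a peak ID to a list of motif positions (bp)
                    for factor A and factor B respectively (assumed layout)
    max_spacing  -- distances with absolute value above this are ignored
    """
    counts = Counter()
    for peak_id in pos_a.keys() & pos_b.keys():  # peaks shared by both factors
        for a in pos_a[peak_id]:
            for b in pos_b[peak_id]:
                spacing = b - a  # positive when the B motif lies downstream of A
                if abs(spacing) <= max_spacing:
                    counts[spacing] += 1
    return counts

# Toy positions for two factors across a few peaks.
gata1 = {"p1": [100], "p2": [50], "p3": [10]}
tal1 = {"p1": [110], "p2": [60, 500], "p4": [5]}
counts = spacing_counts(gata1, tal1)
```

A sharp excess of counts at one spacing would suggest a constrained arrangement of the two motifs, whereas a flat distribution would suggest no preferred spacing.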
If you use our findings or scripts, please cite our paper:
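The motifs folder (../motifs/ in the commands above) stores the PWM files, in JASPAR format, used in the paper.
The ENCODE_processed_files folder (../ENCODE_processed_files/ in the commands above) includes the processed data of this paper, based on ENCODE ChIP-seq data: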
- _idr.tsv -- ChIP-seq peaks in HOMER peak file format after running IDR
- _idr.fa -- sequences of ChIP-seq peaks in _idr.tsv
- _idr_cutoff.tsv -- ChIP-seq peaks that have been identified to have valid motifs
- _idr_cutoff_inmask.tsv -- Peaks in _idr_cutoff.tsv that fall into repetitive regions
- _idr_cutoff_masked.tsv -- Peaks in _idr_cutoff.tsv that fall into nonrepetitive regions
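
The split into repetitive and nonrepetitive peaks behind the last two file types can be pictured as an interval overlap check. The sketch below is an assumption for illustration only: the function name, the tuple layouts, the half-open coordinates, and the rule that any overlap with a repeat counts are all hypothetical and may differ from what the scripts do.

```python
def split_by_repeats(peaks, repeats):
    """Separate peaks into those overlapping a repeat and those that do not.

    peaks   -- list of (peak_id, chrom, start, end) tuples, half-open coordinates
    repeats -- dict mapping a chromosome to a list of (start, end) repeat intervals
    """
    in_repeat, not_in_repeat = [], []
    for peak in peaks:
        _, chrom, start, end = peak
        overlaps = any(
            start < r_end and r_start < end
            for r_start, r_end in repeats.get(chrom, [])
        )
        if overlaps:
            in_repeat.append(peak)
        else:
            not_in_repeat.append(peak)
    return in_repeat, not_in_repeat

# Toy example: one repeat on chr1, three peaks.
repeats = {"chr1": [(1000, 2000)]}
peaks = [
    ("peak1", "chr1", 1500, 1700),  # inside the repeat
    ("peak2", "chr1", 2000, 2200),  # starts exactly where the repeat ends
    ("peak3", "chr2", 1500, 1700),  # chromosome without annotated repeats
]
repetitive, nonrepetitive = split_by_repeats(peaks, repeats)
```

The linear scan over repeats is fine for a toy example; a genome-wide annotation would normally call for sorted intervals or an interval tree.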
If you encounter a problem when using the scripts, you can
- post an issue in the Issues section
- or email Zeyang Shen at [email protected]
This project is licensed under the GNU GPL v3.
The scripts were developed primarily by Zeyang Shen and Rick Zhenzhi Li. Supervision for the project was provided by Christopher K. Glass.