Multiple Motif Detection in DragoNN Simulated Data #67

Open
GoktugGuvercin opened this issue Sep 1, 2022 · 0 comments

@GoktugGuvercin

Hello;

While working on transcription factor binding sites and motif detection, I came across your DragoNN toolkit and GitHub profile. Both are very informative and useful. I aim to develop a deep learning model for multiple motif recognition, and I intend to use your simulation data, available at the following link: https://github.com/kundajelab/dragonn/blob/master/paper_supplement/simulation_data/GC_fraction0.4max_num_motifs3min_num_motifs0motif_names%5B'CTCF_known1'%2C%20'ZNF143_known2'%2C%20'SIX5_known1'%5Dnum_seqs20000seq_length500.npz
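A minimal sketch for inspecting the linked archive, assuming it has been downloaded locally and that NumPy is available. The issue does not say which arrays the `.npz` file contains, so the sketch only lists their names, shapes, and dtypes rather than guessing them. `describe_npz` is a hypothetical helper, not part of DragoNN.

```python
import numpy as np


def describe_npz(path, allow_pickle=False):
    """Return {array_name: (shape, dtype)} for every array stored in an .npz archive."""
    summary = {}
    # np.load on an .npz returns an NpzFile; .files lists the stored array names.
    # allow_pickle defaults to False so arbitrary Python objects are not unpickled.
    with np.load(path, allow_pickle=allow_pickle) as archive:
        for name in archive.files:
            arr = archive[name]  # each array is read from the archive on access
            summary[name] = (arr.shape, arr.dtype)
    return summary
```

Usage: `describe_npz("path/to/downloaded.npz")`. If the archive stores Python objects, reading them fails with the default `allow_pickle=False`; pass `allow_pickle=True` only for files from a trusted source.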

As far as I understand, each sequence in this dataset consists of 500 nucleotides, and the GC fraction (the proportion of guanine and cytosine bases) of the sequences is approximately 0.4. However, I am confused about the number of motifs in the sequences: the maximum and minimum number of motifs are set to 3 and 0, respectively.
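One way to read "GC fraction 0.4" is as the expected composition of a zero-order (i.i.d.) background: each position is drawn independently with P(G) = P(C) = 0.2 and P(A) = P(T) = 0.3. The sketch below assumes that reading; whether the DragoNN simulator splits the probabilities exactly this way is not stated in the issue. Under this assumption the observed GC content of a single 500-bp background varies around 0.4 with a standard deviation of about 0.022 (sqrt(0.4 × 0.6 / 500)), and embedded motif instances can shift it further. Both functions are hypothetical illustrations.

```python
import random


def background_sequence(length, gc_fraction, rng):
    """Sample an i.i.d. (zero-order) DNA sequence whose expected GC content is gc_fraction."""
    # Assumption: G and C share gc_fraction equally; A and T share the remainder equally.
    p_gc = gc_fraction / 2
    p_at = (1 - gc_fraction) / 2
    return "".join(rng.choices("ACGT", weights=[p_at, p_gc, p_gc, p_at], k=length))


def gc_fraction(seq):
    """Fraction of positions that are G or C (case-insensitive)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


rng = random.Random(0)  # fixed seed for reproducibility
seq = background_sequence(500, 0.4, rng)
print(len(seq), round(gc_fraction(seq), 3))
```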

What exactly does this mean?

Can 3 instances of each motif exist in a positive sequence? In other words, does the maximum number of motifs apply to each motif separately, or to the sum over all three motifs? In the first case, up to 3 instances of each motif could exist in a positive sequence. In the second case, only 1 instance of each motif could be accommodated in a positive sequence.
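The two readings of `max_num_motifs` described above can be made concrete with a small, hypothetical simulation. `per_motif_reading` draws an independent instance count in [0, 3] for each motif, so a sequence may carry up to nine instances in total; `total_reading` draws a total k in [0, 3] and then picks k distinct motifs, so each motif appears at most once. Neither function is taken from DragoNN; they only illustrate the question.

```python
import random

MOTIFS = ["CTCF_known1", "ZNF143_known2", "SIX5_known1"]


def per_motif_reading(rng, motifs, min_n, max_n):
    """Reading 1 (hypothetical): each motif independently gets min_n..max_n instances."""
    # randint is inclusive on both ends.
    return {m: rng.randint(min_n, max_n) for m in motifs}


def total_reading(rng, motifs, min_n, max_n):
    """Reading 2 (hypothetical): draw a total k in min_n..max_n, then k distinct motifs."""
    # Cap k at the number of motifs so rng.sample never asks for more than exist.
    k = rng.randint(min_n, min(max_n, len(motifs)))
    chosen = set(rng.sample(motifs, k))  # sampling without replacement
    return {m: int(m in chosen) for m in motifs}


rng = random.Random(42)
print(per_motif_reading(rng, MOTIFS, 0, 3))  # counts per motif, each in 0..3
print(total_reading(rng, MOTIFS, 0, 3))      # counts per motif, each 0 or 1
```

Capping k at the number of motif names keeps `random.sample` from raising a `ValueError` if `max_num_motifs` were ever larger than the number of motifs.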
