Plotting subsets of sequences #211

Open
npcooley opened this issue Feb 3, 2025 · 1 comment

@npcooley

npcooley commented Feb 3, 2025

I can't seem to find any examples that describe how to do the task I'm interested in, so I'm creating this issue:

Say I have two genomes that are both 6 Mbp long, and I want to plot subsets of them against each other, e.g. 300k-350k in genome 1 and 1.5M-1.55M in genome 2. How do I use this package to plot this kind of comparison?

@thackl
Owner

thackl commented Feb 3, 2025

Hi Nicholas,

You can zoom in on explicit subsets, either by adding start and end columns to the sequence table for gggenomes(seqs=...) or by giving a data frame with seq_id, start, end to focus(.loci=...).

With focus you can also dynamically zoom in on regions, e.g. by specifying a selector for features. Something like focus(.track_id=genes, str_detect(name, "dnaB") ) will give all regions around "dnaB" genes. Or focus(.track_id=links) will zoom in on all regions that share similarity with other genomes.

Does that answer your question?
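
A rough sketch of the feature-selector form of focus() mentioned above, assuming a made-up gene table. The gene names, IDs and coordinates are hypothetical and only there to give the selector something to match; the object names (genes, p_dnaB) are illustrative too.

```r
library(tidyverse)
library(gggenomes)

# Two 6 Mbp genomes with one chromosome each
seqs <- tibble(
  seq_id = c("A", "B"),
  bin_id = seq_id,
  length = 6e6)

# Hypothetical gene annotations (made up for illustration)
genes <- tibble(
  seq_id  = c("A", "A", "B"),
  feat_id = c("g1", "g2", "g3"),
  start   = c(3.00e5, 2.000e6, 1.500e6),
  end     = c(3.02e5, 2.002e6, 1.502e6),
  name    = c("dnaB", "recA", "dnaB"))

# Zoom in on the regions around genes whose name matches "dnaB"
p_dnaB <- gggenomes(genes = genes, seqs = seqs) |>
  focus(.track_id = genes, str_detect(name, "dnaB")) +
  geom_seq() +
  geom_seq_label() +
  geom_gene()
```

With this table the selector matches one gene on A and one on B, so the plot should show one locus per genome.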

library(tidyverse)
library(gggenomes)

# Two 6Mbp genomes with one chromosome each
seqs = tibble(
  seq_id = c("A", "B"),
  bin_id = seq_id,
  length = 6e6)

# some links between different parts of genomes
links = tibble(
  seq_id = "A",
  start = 1e5,
  end = 1.1e5,
  seq_id2 = "B",
  start2 = 3e5,
  end2=3.15e5)

# full genomes, no zoom
p1 <- gggenomes(seqs=seqs, links=links) +
  geom_seq() +
  geom_seq_label() +
  geom_link()
 
# zoom in by adding start/end columns to the sequence track
seqs_start_end <- seqs |>
  mutate(start = c(0.8e5, 2.8e5), end = c(1.3e5, 3.3e5))
p2 <- gggenomes(seqs=seqs_start_end, links=links) +
  geom_seq() +
  geom_seq_label() +
  geom_link()

# zoom in by focus(.loci=)
p3 <- gggenomes(seqs=seqs, links=links) |> 
  focus(.loci=tibble(seq_id = c("A", "B"), start=c(0.8e5, 2.8e5), end = c(1.3e5, 3.3e5))) +
  geom_seq() +
  geom_seq_label() +
  geom_link()

# zoom in dynamically with focus()
p4 <- gggenomes(seqs=seqs, links=links) |> 
  focus(
    .track_id=links, # zoom in on any links
    .locus_id = str_glue("{seq_id}:{start}-{end}") # add start/end to contig name
  ) +
  geom_seq() +
  geom_link() +
  geom_seq_label()


library(patchwork)

p1 + p2 + p3 + p4
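
For the coordinates in the original question (300k-350k in genome 1, 1.5M-1.55M in genome 2), the focus(.loci=) approach could look roughly like this. It is only a sketch along the lines of p3: the link is a hypothetical placeholder placed inside both regions, and the names loci and p_regions are illustrative.

```r
library(tidyverse)
library(gggenomes)

# Two 6 Mbp genomes with one chromosome each
seqs <- tibble(
  seq_id = c("A", "B"),
  bin_id = seq_id,
  length = 6e6)

# Hypothetical link that falls inside both regions of interest
links <- tibble(
  seq_id  = "A",
  start   = 3.1e5,
  end     = 3.2e5,
  seq_id2 = "B",
  start2  = 1.51e6,
  end2    = 1.52e6)

# Regions from the question: 300k-350k on A, 1.5M-1.55M on B
loci <- tibble(
  seq_id = c("A", "B"),
  start  = c(3.0e5, 1.5e6),
  end    = c(3.5e5, 1.55e6))

# Same pattern as p3, with the question's coordinates
p_regions <- gggenomes(seqs = seqs, links = links) |>
  focus(.loci = loci) +
  geom_seq() +
  geom_seq_label() +
  geom_link()
```

Adding the same start and end values as columns of seqs, as in p2, should be an alternative way to get this view.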

[Image: output of p1 + p2 + p3 + p4]
