Using Graph Peak Caller without vg
Although Graph Peak Caller was created to be used with vg, it can also be used with data from other tools, as long as you can convert that data into something Graph Peak Caller can read. This guide gives a quick overview of the two main data structures used by Graph Peak Caller: Interval and Graph.
Graph Peak Caller uses a very simple Python object to represent the graph it does peak calling on. You can create such a graph in Python:
from offsetbasedgraph import Graph, Block
graph = Graph({1: Block(10), 2: Block(4), 3: Block(5)}, {1: [2], 2: [3]})
Graph takes two dicts: a dict of nodes (also called blocks), each with a certain size, and a dict of edges, where each key is a node ID and its value is the list of nodes that node has edges to.
After creating a Graph in this dict format, we convert it to a numpy backend to make it much smaller (in memory and on disk) and more efficient:
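If your nodes and edges come from another tool, the two dicts can be assembled from plain lists first. The following is only a sketch: build_graph_dicts is a hypothetical helper, not part of offsetbasedgraph, and it works on plain integers so that it runs without the library installed.

```python
def build_graph_dicts(node_lengths, edge_pairs):
    """Build a {node_id: length} dict and a {node_id: [successor ids]} dict.

    Hypothetical helper for illustration; not part of offsetbasedgraph.
    """
    sizes = dict(node_lengths)
    edges = {}
    for source, target in edge_pairs:
        # Refuse edges that point at nodes we have no length for.
        if source not in sizes or target not in sizes:
            raise ValueError(f"edge ({source}, {target}) uses an unknown node")
        edges.setdefault(source, []).append(target)
    return sizes, edges


sizes, edges = build_graph_dicts([(1, 10), (2, 4), (3, 5)], [(1, 2), (2, 3)])
print(sizes)  # {1: 10, 2: 4, 3: 5}
print(edges)  # {1: [2], 2: [3]}
```

With offsetbasedgraph installed, wrapping each length in Block and passing both dicts to Graph should then give the same graph as the two-line example above.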
graph.convert_to_numpy_backend()
graph.to_file("graph.nobg")
You now have a graph file that can be used as input to Graph Peak Caller!
The alignments taken as input by Graph Peak Caller are simple Intervals specified by a start offset, end offset and a list of nodes (also called blocks):
from offsetbasedgraph import Interval
interval1 = Interval(3, 5, [1])
interval2 = Interval(3, 2, [1, 2])
interval1 is now an interval covering two base pairs, starting at offset 3 in node 1 and ending at (not including) offset 5 in the same node. interval2 starts at offset 3 in node 1, covers the rest of node 1 and continues up to (not including) offset 2 in node 2.
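As a quick sanity check on offsets, the number of base pairs an interval covers can be computed from the start offset, the exclusive end offset and the node sizes. This is a minimal sketch on plain Python values: interval_length is a hypothetical helper written for this guide, not a function from offsetbasedgraph, and the node sizes are those of the example graph (10, 4 and 5).

```python
def interval_length(start, end, nodes, node_sizes):
    """Number of base pairs covered by an interval.

    start is an offset in the first node; end is an exclusive offset
    in the last node. Hypothetical helper, not part of offsetbasedgraph.
    """
    # Full length of every node except the last one ...
    total = sum(node_sizes[abs(node)] for node in nodes[:-1])
    # ... minus the skipped prefix of the first node, plus the covered
    # prefix of the last node.
    return total - start + end


node_sizes = {1: 10, 2: 4, 3: 5}
print(interval_length(3, 5, [1], node_sizes))     # 2
print(interval_length(3, 2, [1, 2], node_sizes))  # 9
```

The two calls use the same arguments as Interval(3, 5, [1]) and Interval(3, 2, [1, 2]): the second interval covers 7 base pairs of node 1 (offsets 3 to 9) and 2 base pairs of node 2.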
Intervals can traverse a node in either direction. For instance, Interval(5, 7, [-1]) is the reversed version of interval1 above (given that node 1 has length 10). The start offset 5 is now counted from the right side of node 1, since -1 is the node used. By using negative node IDs, you can represent reverse alignments.
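The reversed coordinates follow from the node lengths: offsets are counted from the opposite end, the node order is flipped and every node ID changes sign. The sketch below spells out that arithmetic on plain tuples. reverse_interval is a hypothetical helper for illustration and is not taken from offsetbasedgraph, which may offer its own way of doing this.

```python
def reverse_interval(start, end, nodes, node_sizes):
    """Return (start, end, nodes) for the same interval read in reverse.

    Hypothetical helper for illustration; not part of offsetbasedgraph.
    """
    first_size = node_sizes[abs(nodes[0])]
    last_size = node_sizes[abs(nodes[-1])]
    # The old end becomes the new start, measured from the other side
    # of the old last node; likewise the old start becomes the new end.
    new_start = last_size - end
    new_end = first_size - start
    # Visit the nodes in opposite order and flip their direction.
    new_nodes = [-node for node in reversed(nodes)]
    return new_start, new_end, new_nodes


node_sizes = {1: 10, 2: 4, 3: 5}
print(reverse_interval(3, 5, [1], node_sizes))     # (5, 7, [-1])
print(reverse_interval(3, 2, [1, 2], node_sizes))  # (2, 7, [-2, -1])
```

Reversing twice gives back the original coordinates, which is a handy check when converting reverse-strand alignments from another tool.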
Intervals can be passed to IntervalCollection as a list or any other iterable and then written to file:
from offsetbasedgraph import IntervalCollection
intervals = IntervalCollection([interval1, interval2])
intervals.to_file("alignments.intervalcollection")
The file alignments.intervalcollection can now be used as input to Graph Peak Caller.
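When alignments come from another tool, they first have to be turned into start offset, end offset and node list. The sketch below assumes a made-up tab-separated text format (start offset, end offset, comma-separated node IDs); this format and the parse_alignment_line helper are invented purely for illustration and are not defined by Graph Peak Caller or by any particular aligner.

```python
def parse_alignment_line(line):
    """Parse 'start<TAB>end<TAB>node,node,...' into (start, end, [nodes]).

    The input format is hypothetical; adapt the parsing to your own data.
    """
    start, end, node_field = line.rstrip("\n").split("\t")
    # Negative node IDs (reverse alignments) are kept as they are.
    nodes = [int(node) for node in node_field.split(",")]
    return int(start), int(end), nodes


lines = ["3\t5\t1", "3\t2\t1,2", "5\t7\t-1"]
triples = [parse_alignment_line(line) for line in lines]
print(triples)  # [(3, 5, [1]), (3, 2, [1, 2]), (5, 7, [-1])]
```

Each triple can then be passed to Interval(start, end, nodes), and the resulting intervals collected in an IntervalCollection as shown above.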