Using Graph Peak Caller without vg

ivargr edited this page Mar 19, 2018 · 1 revision

Although Graph Peak Caller was created to be used with vg, it can also be used with data produced by other tools, as long as you can convert that data into a format Graph Peak Caller understands. This guide gives a quick overview of the two main data structures used by Graph Peak Caller: Interval and Graph.

Graph

Graph Peak Caller uses a very simple Python object to represent the graph it does peak calling on. In Python, you can create such a graph as follows:

from offsetbasedgraph import Graph, Block
graph = Graph({1: Block(10), 2: Block(4), 3: Block(5)}, {1: [2], 2: [3]})

Graph takes two dicts: a dict of nodes (also called blocks), each with a given size, and a dict of edges, where each key is a node ID and its value is the list of nodes that node has edges to.
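Before building a Graph from data exported by another tool, it can help to check that every edge refers to a node that actually exists. The following is a minimal sketch in plain Python, not part of offsetbasedgraph: the helper name find_dangling_edges is hypothetical, and node sizes are given as plain integers instead of Block objects.

```python
def find_dangling_edges(node_sizes, edges):
    """Return (source, target) pairs where either end is not a known node.

    node_sizes: dict mapping node ID -> node size (plain ints here)
    edges: dict mapping node ID -> list of node IDs it has edges to
    """
    dangling = []
    for source, targets in edges.items():
        for target in targets:
            if source not in node_sizes or target not in node_sizes:
                dangling.append((source, target))
    return dangling


# Same layout as the example graph: three nodes, edges 1 -> 2 and 2 -> 3
node_sizes = {1: 10, 2: 4, 3: 5}
edges = {1: [2], 2: [3]}
problems = find_dangling_edges(node_sizes, edges)
```

An empty result means the two dicts are consistent with each other.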

After creating a Graph by using this dict-format, we convert it to numpy in order to make it much smaller (in memory and on disk) and more efficient:

graph.convert_to_numpy_backend()
graph.to_file("graph.nobg")

You now have a graph file that can be used as input to Graph Peak Caller!

Interval

The alignments taken as input by Graph Peak Caller are simple Intervals specified by a start offset, end offset and a list of nodes (also called blocks):

from offsetbasedgraph import Interval
interval1 = Interval(3, 5, [1])
interval2 = Interval(3, 2, [1, 2])

interval1 is now an interval covering two base pairs, starting at offset 3 in node 1 and ending at (not including) offset 5 in the same node. interval2 starts at offset 3 in node 1, covers the rest of node 1 and continues up to (not including) offset 2 in node 2.
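To make the offset convention concrete, here is a small sketch that computes how many base pairs an interval covers from its start offset, end offset and node list. It is plain Python and does not use offsetbasedgraph: the function interval_length is a hypothetical helper, and the node sizes are assumed to be those of the example graph (node 1 has size 10, node 2 has size 4).

```python
def interval_length(start, end, nodes, node_sizes):
    """Number of base pairs covered by an interval.

    start: offset into the first node where the interval begins
    end: offset into the last node where the interval ends (exclusive)
    nodes: list of node IDs the interval passes through, in order
    node_sizes: dict mapping node ID -> node size
    """
    # abs() so that negative (reverse) node IDs look up the same size
    total = sum(node_sizes[abs(node)] for node in nodes)
    unused_at_start = start
    unused_at_end = node_sizes[abs(nodes[-1])] - end
    return total - unused_at_start - unused_at_end


node_sizes = {1: 10, 2: 4, 3: 5}
length1 = interval_length(3, 5, [1], node_sizes)     # same numbers as interval1
length2 = interval_length(3, 2, [1, 2], node_sizes)  # same numbers as interval2
```

With these sizes, the first interval covers 2 base pairs and the second covers 9 (7 in node 1 plus 2 in node 2).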

Intervals can traverse a node in either direction. For instance, Interval(5, 7, [-1]) is the reversed version of interval1 above (given that node 1 has length 10). The start offset 5 is now counted from the right side of node 1, since the node ID used is -1. By using negative node IDs, you can represent reverse alignments.
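The conversion between a forward interval and its reversed version can be sketched as below. This is plain Python working on tuples, not an offsetbasedgraph API: reverse_interval is a hypothetical helper, and it assumes the convention described above, where offsets on a negative node ID are counted from the right side of that node.

```python
def reverse_interval(start, end, nodes, node_sizes):
    """Return (start, end, nodes) for the same interval read in reverse.

    Offsets on the reversed nodes are counted from the opposite end,
    and the node list is reversed with every node ID negated.
    """
    first_size = node_sizes[abs(nodes[0])]
    last_size = node_sizes[abs(nodes[-1])]
    new_start = last_size - end      # old end becomes the new start
    new_end = first_size - start     # old start becomes the new end
    new_nodes = [-node for node in reversed(nodes)]
    return new_start, new_end, new_nodes


node_sizes = {1: 10, 2: 4, 3: 5}
reversed1 = reverse_interval(3, 5, [1], node_sizes)
reversed2 = reverse_interval(3, 2, [1, 2], node_sizes)
```

For the single-node case this gives (5, 7, [-1]), matching the example in the text. Reversing a second time returns the original start, end and node list.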

Intervals can be passed as a list or any other iterable to IntervalCollection and then written to file:

from offsetbasedgraph import IntervalCollection
intervals = IntervalCollection([interval1, interval2])
intervals.to_file("alignments.intervalcollection")

alignments.intervalcollection can now be used as input to Graph Peak Caller.
