docs/asciidocs/odgi_sort.adoc
21 additions & 62 deletions
@@ -26,15 +26,13 @@ determine the node order:
26
26
next node in the prior graph order that has not been sorted, yet. The cycle breaking algorithm applies a DFS sort until
27
27
a cycle is found. We break and start a new DFS sort phase from where we stopped.
28
28
- A random sort: The graph is randomly sorted. The node order is randomly shuffled from http://www.cplusplus.com/reference/random/mt19937/[Mersenne Twister pseudo-random] generated numbers.
29
-
- A sparse matrix mondriaan sort: We can partition a hypergraph with integer weights and uniform hyperedge costs using the http://www.staff.science.uu.nl/~bisse101/Mondriaan/[Mondriaan] partitioner.
30
29
- A 1D linear SGD sort: Odgi implements a 1D linear, variation graph adjusted, multi-threaded version of the https://arxiv.org/abs/1710.04626[Graph Drawing
31
30
by Stochastic Gradient Descent] algorithm. The force-directed graph drawing algorithm minimizes the graph's energy function
32
31
or stress level. It applies stochastic gradient descent (SGD) to move a single pair of nodes at a time.
33
-
- A path guided, 1D linear SGD sort: The major bottleneck of the 1D linear SGD sort is that the memory allocation is quadratic
34
-
in number of nodes. So it does not scale for large graphs. This issue is tackled by the path guided, 1D linear SGD sort.
35
-
Instead of precalculating all terms, it can use a path index to pick the terms to move stochastically. If ran with 1 thread only,
36
-
the resulting order of the graph is deterministic. Ony can vary the seed.
37
-
- An eades algorithmic sort: Use http://www.it.usyd.edu.au/~pead6616/old_spring_paper.pdf[Peter Eades' heuristic for graph drawing].
32
+
- A path guided, 1D linear SGD sort: Odgi implements a 1D linear, variation graph adjusted, multi-threaded version of the https://arxiv.org/abs/1710.04626[Graph Drawing
33
+
by Stochastic Gradient Descent] algorithm. The force-directed graph drawing algorithm minimizes the graph's energy function
34
+
or stress level. It applies stochastic gradient descent (SGD) to move a single pair of nodes at a time. The path index is used to pick the terms to move stochastically. If run with 1 thread only,
35
+
the resulting order of the graph is deterministic. The seed is adjustable.
38
36
39
37
Sorting the paths in a graph may refine the sorting process. For the users' convenience, it is possible to specify a whole
40
38
pipeline of sorts within one parameter.
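
As a rough illustration of the pair-at-a-time SGD step described in the list above (not odgi's actual implementation), the following Python sketch applies the update rule from the Graph Drawing by Stochastic Gradient Descent paper to 1D positions. The function name `sgd_pair_update`, the weight `1 / d_ij**2`, and the handling of coincident nodes are assumptions made for this example.

```python
def sgd_pair_update(x, i, j, d_ij, eta):
    """Move nodes i and j so that |x[i] - x[j]| gets closer to d_ij.

    x    : list of 1D node positions (modified in place)
    d_ij : target distance between the two nodes (assumed > 0)
    eta  : current learning rate
    """
    w = 1.0 / (d_ij ** 2)        # assumed weight: nearby pairs count more
    mu = min(eta * w, 1.0)       # cap the step so the pair never overshoots
    diff = x[i] - x[j]
    mag = abs(diff)
    if mag == 0.0:
        # Coincident nodes have no direction; pretend a tiny separation.
        diff, mag = 1e-9, 1e-9
    # Each node absorbs half of the (scaled) distance error.
    delta = mu * (mag - d_ij) / 2.0 * (diff / mag)
    x[i] -= delta
    x[j] += delta
```

With a large learning rate the cap `mu = 1` applies and the pair lands exactly at the target distance; as the rate decays over iterations, the moves shrink.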
@@ -80,62 +78,19 @@ pipeline of sorts within one parameter.
80
78
*-r, --random*::
81
79
Randomly sort the graph.
82
80
83
-
=== Mondriaan Sort
84
-
85
-
*-m, --mondriaan*::
86
-
Use the sparse matrix diagonalization to sort the graph.
87
-
88
-
*-N, --mondriaan-n-parts*=_N_::
89
-
Number of partitions for the mondriaan sort.
90
-
91
-
*-E, --mondriaan-epsilon*=_N_::
92
-
Set the epsilon parameter for the mondriaan sort.
93
-
94
-
*-W, --mondriaan-path-weight*::
95
-
Weight the mondriaan input matrix by the path coverage of edges.
96
-
97
-
=== 1D Linear SGD Sort
98
-
99
-
*-S, --linear-sgd*::
100
-
Apply 1D linear SGD algorithm to sort the graph.
101
-
102
-
*-O, --sgd-bandwidth*=_sgd-bandwidth_::
103
-
Bandwidth of linear SGD model. The default value is _1000_.
104
-
105
-
*-Q, --sgd-sampling-rate*=_sgd-sampling-rate_::
106
-
Sample pairs of nodes with probability distance between them divided by the sampling rate. The default value is _20_.
107
-
108
-
*-K, --sgd-use-paths*::
109
-
Use the paths to structure the distances between nodes in SGD.
110
-
111
-
*-T, --sgd-iter-max*=_sgd_iter-max_::
112
-
The maximum number of iterations for the linear SGD model. The default value is _30_.
113
-
114
-
*-V, --sgd-eps*=_sgd-eps_::
115
-
The final learning rate for the linear SGD model. The default value is _0.01_.
116
-
117
-
*-C, --sgd-delta*=_sgd-delta_::
118
-
The threshold of the maximum node displacement, approximately in base pairs, at which to stop SGD.
119
-
120
81
=== Path Guided 1D Linear SGD Sort
121
82
122
83
*-Y, --path-sgd*::
123
84
Apply path guided 1D linear SGD algorithm to organize the graph.
124
85
125
-
*-J, --path-sgd-sample-from-paths*::
126
-
Instead of sampling the first node from all nodes we sample from all nucleotide positions of the paths. Default value is _FALSE_.
127
-
128
-
*-l, --path-sgd-sample-from-path-steps*::
129
-
Instead of sampling the first node from all nodes we sample from all path steps of the paths. Default value is _FALSE_.
130
-
131
-
*-I, --path-sgd-deterministic*::
132
-
Run the path guided 1D linear SGD in deterministic mode. Will automatically set the number of threads to 1, multithreading is not supported in this mode. Default value is _FALSE_.
86
+
*-X, --path-index*=_FILE_::
87
+
Load the path index from this _FILE_.
133
88
134
89
*-f, --path-sgd-use-paths*=FILE::
135
90
Specify a line-separated list of paths to sample from for the on-the-fly term generation process in the path guided linear 1D SGD. The default is _all paths_.
136
91
137
92
*-G, --path-sgd-min-term-updates-paths*=_N_::
138
-
The minimum number of terms to be updated before a new path guided linear 1D SGD iteration with adjusted learning rate eta starts, expressed as a multiple of total path length. The default value is _0.1_. Can be overwritten by _-U, -path-sgd-min-term-updates-nodes=N_.
93
+
The minimum number of terms to be updated before a new path guided linear 1D SGD iteration with adjusted learning rate eta starts, expressed as a multiple of total path steps. The default value is _1.0_. Can be overridden by _-U, --path-sgd-min-term-updates-nodes=N_.
139
94
140
95
*-U, --path-sgd-min-term-updates-nodes*=_N_::
141
96
The minimum number of terms to be updated before a new path guided linear 1D SGD iteration with adjusted learning rate eta starts, expressed as a multiple of the number of nodes. By default, the argument is not set and the default of _-G, --path-sgd-min-term-updates-paths=N_ is used.
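
The interplay of `-G` and `-U` documented in this hunk can be sketched as follows. This is only an illustration of the described rule; the function name `min_term_updates`, its parameter names, and the truncation to an integer are assumptions, not odgi's code.

```python
def min_term_updates(total_path_steps, node_count,
                     paths_factor=1.0, nodes_factor=None):
    """Number of term updates required before the next SGD iteration starts.

    paths_factor : multiple of the total path steps (-G, default 1.0)
    nodes_factor : multiple of the node count (-U); when given it takes
                   precedence over paths_factor
    """
    if nodes_factor is not None:
        # -U was set explicitly, so it overrides the -G default.
        return int(nodes_factor * node_count)
    return int(paths_factor * total_path_steps)
```

For a graph with 1000 path steps and 200 nodes, the default yields 1000 updates per iteration, while passing `nodes_factor=2.0` yields 400.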
@@ -147,19 +102,28 @@ pipeline of sorts within one parameter.
147
102
The final learning rate for path guided linear 1D SGD model. The default value is _0.01_.
148
103
149
104
*-v, --path-sgd-eta-max*=_N_::
150
-
The first and maximum learning rate for path guided linear 1D SGD model. The default value is _number of nodes in the graph_.
105
+
The first and maximum learning rate for path guided linear 1D SGD model. The default value is _squared steps of longest path in graph_.
151
106
152
107
*-a, --path-sgd-zipf-theta*=_N_::
153
108
The theta value for the Zipfian distribution which is used as the sampling method for the second node of one term in the path guided linear 1D SGD model. The default value is _0.99_.
154
109
155
110
*-x, --path-sgd-iter-max*=_N_::
156
-
The maximum number of iterations for path guided linear 1D SGD model. The default value is 30.
111
+
The maximum number of iterations for path guided linear 1D SGD model. The default value is _30_.
157
112
158
-
*-F, --iteration-max-learning-rate::
159
-
The iteration where the learning rate is max for path guided linear 1D SGD model. The default value is 0.
113
+
*-F, --iteration-max-learning-rate*=_N_::
114
+
The iteration where the learning rate is max for path guided linear 1D SGD model. The default value is _0_.
160
115
161
116
*-k, --path-sgd-zipf-space*=_N_::
162
-
The maximum space size of the Zipfian distribution which is used as the sampling method for the second node of one term in the path guided linear 1D SGD model. The default value is the _maximum path lengths_.
117
+
The maximum space size of the Zipfian distribution which is used as the sampling method for the second node of one term in the path guided linear 1D SGD model. The default value is the _longest path length_.
118
+
119
+
*-I, --path-sgd-zipf-space-max*=_N_::
120
+
The maximum space size of the Zipfian distribution beyond which quantization occurs. Default value is _100_.
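
To make the learning-rate and Zipf options in this hunk more concrete, here is a minimal Python sketch under stated assumptions. It assumes the exponential decay schedule from the Graph Drawing by Stochastic Gradient Descent paper, assumes that `-F` marks the iteration at which the rate peaks (modelled here with `abs(t - iter_with_max_lr)`), and uses a naive Zipf sampler without the quantization that `--path-sgd-zipf-space-max` refers to. All function and parameter names are hypothetical.

```python
import math
import random


def eta_schedule(eta_max, eps, iter_max, iter_with_max_lr=0):
    """Learning rates for iterations 0..iter_max-1 (requires iter_max > 1).

    The rate decays exponentially from eta_max (-v) towards eps
    (the final learning rate); iter_with_max_lr (-F) is the assumed peak.
    """
    # Decay constant so that eta drops from eta_max to eps in iter_max - 1 steps.
    lam = math.log(eta_max / eps) / (iter_max - 1)
    return [eta_max * math.exp(-lam * abs(t - iter_with_max_lr))
            for t in range(iter_max)]


def sample_zipf_rank(space, theta, rng):
    """Draw a rank in 1..space with probability proportional to rank**-theta.

    space : size of the sampling space (-k)
    theta : Zipf exponent (-a)
    rng   : a random.Random instance, so the seed controls the draw
    """
    ranks = range(1, space + 1)
    weights = [1.0 / (k ** theta) for k in ranks]
    return rng.choices(ranks, weights=weights, k=1)[0]
```

With the documented defaults (`iter_max=30`, final rate `0.01`), the schedule starts at `eta_max` and ends at `0.01`; small ranks are drawn far more often than large ones, so the second node of a term is usually close to the first along the path.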