Pangenome graphs built from raw sets of alignments may have complex local structures generated by common patterns of genome variation. These local nonlinearities can introduce difficulty in downstream analyses, visualization, and interpretation of variation graphs.
smoothxg finds blocks of paths that are collinear within a variation graph.
It applies partial order alignment to each block, yielding an acyclic variation graph.
Then, to yield a "smoothed" graph, it walks the original paths to lace these subgraphs together.
The resulting graph only contains cyclic or inverting structures larger than the chosen block size, and is otherwise manifold linear.
In addition to providing a linear structure to the graph, smoothxg can be used to extract the consensus pangenome graph by applying the heaviest bundle algorithm to each chain.
To find blocks, smoothxg applies a greedy algorithm that assumes that the graph nodes are sorted according to their occurrence in the graph's embedded paths.
The path-guided stochastic gradient descent based 1D sort implemented in odgi sort -Y is designed to provide this kind of sort.
This sort is similar to a 1-dimensional graph layout.
After finding blocks
smoothxg can operate on any input variation graph in GFA format.
The graph must have sequences represented as paths in P records, while the topology of the graph is in S and L records.
Path names should be unique.
seqwish is a standard way to make such a graph.
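The input requirements above can be checked before running smoothxg. The following is a minimal sketch, not part of smoothxg: `check_gfa` is a hypothetical helper name, and it assumes tab-separated GFA records with the path name in the second field of each P record. It only checks that S and P records are present and that path names are unique.

```shell
# check_gfa FILE: hypothetical pre-flight check, not shipped with smoothxg.
# Succeeds only if FILE has at least one S record, at least one P record,
# and no path name appears in more than one P record.
check_gfa() {
    awk -F'\t' '
        $1 == "S" { s++ }
        $1 == "P" {
            p++
            # The second field of a P record is the path name.
            if (seen[$2]++) {
                print "duplicate path name: " $2 > "/dev/stderr"
                dup = 1
            }
        }
        END {
            if (s == 0) { print "no S records found" > "/dev/stderr"; bad = 1 }
            if (p == 0) { print "no P records found" > "/dev/stderr"; bad = 1 }
            exit (bad || dup)
        }
    ' "$1"
}
```

For example, `check_gfa graph.gfa && echo "graph.gfa looks usable"`, where `graph.gfa` stands in for your own input file.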
smoothxg uses cmake to build itself and its dependencies. At least GCC version 9.3.0 is required for compilation.
You can check your version via:
gcc --version
g++ --version
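If you prefer a scripted check against the 9.3.0 minimum, the sketch below compares version strings. It is an illustration only: `version_ge` is a hypothetical helper, it assumes a `sort` that supports `-V` (as GNU coreutils does), and it assumes your `gcc` accepts `-dumpfullversion -dumpversion`.

```shell
# version_ge A B: succeeds when version A >= version B.
# Hypothetical helper; relies on version sorting via `sort -V`.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required=9.3.0
if command -v gcc >/dev/null 2>&1; then
    # Recent GCC prints the full version for -dumpfullversion;
    # older releases fall back to -dumpversion.
    found=$(gcc -dumpfullversion -dumpversion)
    if version_ge "$found" "$required"; then
        echo "gcc $found is new enough"
    else
        echo "gcc $found is older than $required" >&2
    fi
fi
```

The same helper could be pointed at `g++` by swapping the command name.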
Clone the smoothxg git repository and build with:
sudo apt-get update && sudo apt-get install -y libatomic-ops-dev libgsl-dev zlib1g-dev libzstd-dev libjemalloc-dev
git clone --recursive https://github.com/pangenome/smoothxg.git
cd smoothxg
cmake -H. -Bbuild && cmake --build build -- -j 4
To optimize for your architecture, run the following from within the build directory:
cmake -DCMAKE_BUILD_TYPE=Release .. && make -j 16 VERBOSE=1 && ctest . --verbose
libzstd-dev must be of version 1.4 or higher.
Run tests:
ctest . --verbose
Note that smoothxg depends on git submodules:
git submodule update --init --recursive
In your source dir make sure git submodules are up-to-date and follow the instructions in guix.scm.
If you need to avoid machine-specific optimizations, use the CMAKE_BUILD_TYPE=Generic build type:
cmake -H. -Bbuild -DCMAKE_BUILD_TYPE=Generic && cmake --build build -- -j 3
To build for a specific architecture, you can use EXTRA_FLAGS:
cmake -DCMAKE_BUILD_TYPE=Release -DEXTRA_FLAGS="-Ofast -march=znver1" .. && make -j 16 VERBOSE=1
To make a static build, add the -DBUILD_STATIC=ON switch.
smoothxg recipes for Bioconda are available at https://anaconda.org/bioconda/smoothxg.
To install the latest version using Conda execute:
conda install -c bioconda smoothxg
First, clone the guix-genomics repository:
git clone https://github.com/ekg/guix-genomics
And install the smoothxg package to your default GUIX environment:
GUIX_PACKAGE_PATH=. guix package -i smoothxg
Now smoothxg is available as a global binary installation.
Add the following to your ~/.config/guix/channels.scm:
(cons*
(channel
(name 'guix-genomics)
(url "https://github.com/ekg/guix-genomics.git")
(branch "master"))
%default-channels)
First, pull all the packages, then install smoothxg to your default GUIX environment:
guix pull
guix package -i smoothxg
If you want to build an environment only consisting of the smoothxg binary, you can do:
guix environment --ad-hoc smoothxg
For more details about how to handle Guix channels, go to https://git.genenetwork.org/guix-bioinformatics/guix-bioinformatics.git.
To make the -S/--write-split-block-fastas and -B/--write-poa-block-fastas options available, and emit a table
with POA block statistics, add the -DPOA_DEBUG=ON option:
cmake -H. -Bbuild -D CMAKE_BUILD_TYPE=Release -DPOA_DEBUG=ON && cmake --build build -- -j 3