SparkAligner

SparkAligner is a generalized version of SparkBWA with support for modules. These modules can be used to add support for other aligners.

The following aligners are currently supported:

bwa

Usage

spark-submit                                 \
--class com.github.sparkaligner.SparkAligner \
sparkaligner.jar                             \
<name of aligner to use>                     \
[<aligner specific options>]

Example:

spark-submit                                 \
--class com.github.sparkaligner.SparkAligner \
sparkaligner.jar                             \
bwa                                          \
-algorithm mem                               \
-R /data/hg19/hg19.fasta                     \
-partitions 2                                \
-I /data/input/datasets

Running the provided Docker example

The docker image can be built using

docker build --no-cache -t <name of Docker image> .

It can also be found here. The Docker image also downloads a test dataset (the Lambda phage dataset) from f.128.no.

docker run                                                                   \
-it                                                                          \
-v <path to data folder>:<path to mount the data folder inside the container> \
paalka/spark-aligner                                                         \
<regular spark-aligner arguments here>

To keep the SAM files after the container exits, mount the data directory into the container, for example:

docker run -it                             \
-v <path to test data>:/test_data          \
paalka/spark-aligner bwa                   \
-algorithm mem                             \
-R /data/reference/lambda_virus.fa         \
-I /test_data/<test_data_folder>           \
-partitions 2 -bwaArgs "-t 4"

Building

Make sure to clone the project using git clone --recursive, as it uses submodules. The JAR can be built by using make.

Adding new modules

The folder aligners contains the code for each module. New aligners are required to extend the abstract class BaseAligner, which performs most of the Spark-related work. This means that to add a new aligner, you only need to specify how to process the arguments for the aligner and how the aligner will be run.
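
The shape of such a module can be sketched as below. This is only an illustration of the pattern, not the project's actual API: SketchBaseAligner is a hypothetical stand-in for BaseAligner, and the method names parseArgs and buildCommand, the option -extraArgs, and the echo-aligner binary are all assumptions made for the example. Check the real BaseAligner in src/main/java/com/github/sparkaligner for the methods a module actually has to implement.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ModuleSketch {

    // Hypothetical stand-in for the project's BaseAligner. The real class
    // handles the Spark-related work; its method names may differ.
    abstract static class SketchBaseAligner {
        // Process the aligner-specific command-line options.
        abstract void parseArgs(String[] args);

        // Describe how the aligner is run on one input file.
        abstract List<String> buildCommand(String inputPath);
    }

    // Hypothetical module for an imaginary "echo-aligner" binary.
    static class EchoAligner extends SketchBaseAligner {
        private String reference = "";
        private String extraArgs = "";

        @Override
        void parseArgs(String[] args) {
            // Options are read as "-flag value" pairs.
            for (int i = 0; i + 1 < args.length; i += 2) {
                switch (args[i]) {
                    case "-R":
                        reference = args[i + 1];
                        break;
                    case "-extraArgs":
                        extraArgs = args[i + 1];
                        break;
                    default:
                        throw new IllegalArgumentException("Unknown option: " + args[i]);
                }
            }
        }

        @Override
        List<String> buildCommand(String inputPath) {
            List<String> command = new ArrayList<>();
            command.add("echo-aligner");
            if (!extraArgs.trim().isEmpty()) {
                // Pass extra arguments through to the aligner unchanged.
                command.addAll(Arrays.asList(extraArgs.trim().split("\\s+")));
            }
            command.add(reference);
            command.add(inputPath);
            return command;
        }
    }

    public static void main(String[] args) {
        EchoAligner aligner = new EchoAligner();
        aligner.parseArgs(new String[] {"-R", "/data/ref.fa", "-extraArgs", "-t 4"});
        // prints: echo-aligner -t 4 /data/ref.fa /data/part-0.fq
        System.out.println(String.join(" ", aligner.buildCommand("/data/part-0.fq")));
    }
}
```

In this sketch the subclass owns only argument handling and command construction, which mirrors the split described above: everything related to distributing the work would stay in the base class.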
