disq-examples

Examples for the Disq library.

Building disq-examples

To build, install a JDK and Apache Maven, then run

$ mvn install

Java examples

The disq-examples-java module contains Disq examples implemented in Java. While not all that exciting on their own, they provide a reasonable template for building analyses in Java using the Disq library.

Each is implemented as a final class with a public static void main(final String[] args) method. After the command line arguments are validated, the Spark context is configured and instantiated:

SparkConf conf = new SparkConf()
  .setAppName("Java Disq Example")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "org.disq_bio.disq.serializer.DisqKryoRegistrator")
  .set("spark.kryo.referenceTracking", "true");

JavaSparkContext jsc = new JavaSparkContext(new SparkContext(conf));

Then a JavaRDD of htsjdk SAMRecord objects is read using the Disq APIs:

HtsjdkReadsRddStorage htsjdkReadsRddStorage = HtsjdkReadsRddStorage.makeDefault(jsc);
HtsjdkReadsRdd htsjdkReadsRdd = htsjdkReadsRddStorage.read(filePath);

JavaRDD<SAMRecord> reads = htsjdkReadsRdd.getReads();

and analysis is performed via the Spark JavaRDD APIs.
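
For example, a count-by-reference step in the spirit of the JavaCountAlignments example might look like the following sketch. It assumes the reads RDD from above; the counting logic is illustrative, not necessarily what JavaCountAlignments does line for line.

import org.apache.spark.api.java.JavaPairRDD;
import scala.Tuple2;

// Count alignments per reference name, grouping unmapped reads
// under the key "unmapped".
JavaPairRDD<String, Long> counts = reads
    .mapToPair(read -> new Tuple2<>(
        read.getReadUnmappedFlag() ? "unmapped" : read.getReferenceName(), 1L))
    .reduceByKey(Long::sum);

// Collect and print (reference, count) pairs.
counts.collect().forEach(System.out::println);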

Running Java examples

These examples can be run on the command line via spark-submit, e.g.

$ spark-submit \
    --packages org.disq-bio:disq:${disq.version} \
    --class org.disq_bio.disq.examples.java.JavaCountAlignments \
    disq-examples-java_2.12-${disq.version}.jar \
    sample.bam

For sample.bam, this prints a count of alignments keyed by reference name, with unmapped reads grouped under unmapped:

(unmapped,30)
(1,4887)

Scala examples

The disq-examples-scala module contains Disq examples implemented in Scala. While not all that exciting on their own, they provide a reasonable template for building analyses in Scala using the Disq library.

Each is implemented as an object with a def main(args: Array[String]) method. After the command line arguments are validated, the Spark context is configured and instantiated:

val conf = new SparkConf()
  .setAppName("Scala Disq Example")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "org.disq_bio.disq.serializer.DisqKryoRegistrator")
  .set("spark.kryo.referenceTracking", "true")

val sc = new SparkContext(conf)

Then an RDD of htsjdk SAMRecord objects is read using the Disq APIs. Since Disq exposes a Java API, the SparkContext is wrapped in a JavaSparkContext and the JavaRDD it returns is converted with .rdd:

val htsjdkReadsRddStorage: HtsjdkReadsRddStorage = HtsjdkReadsRddStorage.makeDefault(new JavaSparkContext(sc))
val htsjdkReadsRdd: HtsjdkReadsRdd = htsjdkReadsRddStorage.read(filePath)

val reads: RDD[SAMRecord] = htsjdkReadsRdd.getReads.rdd

and analysis is performed via the Spark RDD APIs.
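
For example, a count-by-reference step in the spirit of the ScalaCountAlignments example might look like the following sketch, assuming the reads RDD from above (again, the counting logic is illustrative rather than a line-for-line copy of the example).

// Count alignments per reference name, grouping unmapped reads
// under the key "unmapped".
val counts: RDD[(String, Long)] = reads
  .map(read => (if (read.getReadUnmappedFlag) "unmapped" else read.getReferenceName, 1L))
  .reduceByKey(_ + _)

// Collect and print (reference, count) pairs.
counts.collect().foreach(println)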

Running Scala examples

These examples can be run on the command line via spark-submit, e.g.

$ spark-submit \
    --packages org.disq-bio:disq:${disq.version} \
    --class org.disq_bio.disq.examples.scala.ScalaCountAlignments \
    disq-examples-scala_2.12-${disq.version}.jar \
    sample.bam

which produces the same per-reference counts:

(unmapped,30)
(1,4887)