Examples for the Disq library.

Install

- JDK 1.8 or later, http://openjdk.java.net
- Apache Maven 3.3.9 or later, http://maven.apache.org

To build

```bash
$ mvn install
```
The `disq-examples-java` module contains Disq examples implemented in Java. While not all that exciting on their own, they provide a reasonable template for building analyses in Java using the Disq library.

Each is implemented as a final class with a `public static void main(final String[] args)` method. After validating command line arguments, the Spark context is configured and instantiated:
```java
SparkConf conf = new SparkConf()
    .setAppName("Java Disq Example")
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .set("spark.kryo.registrator", "org.disq_bio.disq.serializer.DisqKryoRegistrator")
    .set("spark.kryo.referenceTracking", "true");

JavaSparkContext jsc = new JavaSparkContext(new SparkContext(conf));
```
Then a `JavaRDD` is read using the Disq APIs:
```java
HtsjdkReadsRddStorage htsjdkReadsRddStorage = HtsjdkReadsRddStorage.makeDefault(jsc);
HtsjdkReadsRdd htsjdkReadsRdd = htsjdkReadsRddStorage.read(filePath);
JavaRDD<SAMRecord> reads = htsjdkReadsRdd.getReads();
```
and analysis is performed via the Spark JavaRDD APIs.
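For instance, a count of alignments per reference sequence (the kind of result shown in the sample output below) might look like the following minimal sketch, using `org.apache.spark.api.java.JavaPairRDD` and `scala.Tuple2`. It continues from the `reads` RDD above and is only an illustration; the `JavaCountAlignments` example class may be implemented differently.

```java
// Key each read by its reference name, grouping unmapped reads under "unmapped",
// then sum the counts per key.
JavaPairRDD<String, Integer> counts = reads
    .mapToPair(read -> new Tuple2<>(
        read.getReadUnmappedFlag() ? "unmapped" : read.getContig(), 1))
    .reduceByKey((a, b) -> a + b);

// Print one (referenceName, count) pair per line, e.g. (1,4887).
counts.collect().forEach(System.out::println);
```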
These examples can be run on the command line via `spark-submit`, e.g.
```bash
$ spark-submit \
    --packages org.disq-bio:disq:${disq.version} \
    --class org.disq_bio.disq.examples.java.JavaCountAlignments \
    disq-examples-java_2.12-${disq.version}.jar \
    sample.bam

(unmapped,30)
(1,4887)
```
The `disq-examples-scala` module contains Disq examples implemented in Scala. While not all that exciting on their own, they provide a reasonable template for building analyses in Scala using the Disq library.

Each is implemented as an object with a `def main(args: Array[String])` method. After validating command line arguments, the Spark context is configured and instantiated:
```scala
val conf = new SparkConf()
  .setAppName("Scala Disq Example")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "org.disq_bio.disq.serializer.DisqKryoRegistrator")
  .set("spark.kryo.referenceTracking", "true")

val sc = new SparkContext(conf)
```
Then an `RDD` is read using the Disq APIs:
```scala
val htsjdkReadsRddStorage: HtsjdkReadsRddStorage = HtsjdkReadsRddStorage.makeDefault(sc)
val htsjdkReadsRdd: HtsjdkReadsRdd = htsjdkReadsRddStorage.read(filePath)
val reads: RDD[SAMRecord] = htsjdkReadsRdd.getReads()
```
and analysis is performed via the Spark RDD APIs.
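As with the Java example, a per-reference count of alignments might look like the following minimal sketch. It continues from the `reads` RDD above and is only an illustration; the `ScalaCountAlignments` example class may be implemented differently.

```scala
// Key each read by its reference name, grouping unmapped reads under "unmapped",
// then sum the counts per key.
val counts = reads
  .map(read => (if (read.getReadUnmappedFlag) "unmapped" else read.getContig, 1))
  .reduceByKey(_ + _)

// Print one (referenceName, count) pair per line, e.g. (1,4887).
counts.collect().foreach(println)
```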
These examples can be run on the command line via `spark-submit`, e.g.
```bash
$ spark-submit \
    --packages org.disq-bio:disq:${disq.version} \
    --class org.disq_bio.disq.examples.scala.ScalaCountAlignments \
    disq-examples-scala_2.12-${disq.version}.jar \
    sample.bam

(unmapped,30)
(1,4887)
```