- Accumulo 2.0+
- Hadoop YARN installed & `HADOOP_CONF_DIR` set in environment
- Spark installed & `SPARK_HOME` set in environment
The CopyPlus5K example will create an Accumulo table called `spark_example_input` and write 100 key/value entries into Accumulo with the values 0..99. It then launches a Spark application that does the following (see the sketches after this list):
- Read data from the `spark_example_input` table using `AccumuloInputFormat`
- Add 5000 to each value
- Write the data to a new Accumulo table (called `spark_example_output`) using one of two methods:
  - Bulk import - Write data to an RFile in HDFS using `AccumuloFileOutputFormat`, then bulk import it into the Accumulo table
  - Batchwriter - Create a `BatchWriter` in Spark code to write to the table
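
For reference, here is a minimal, hypothetical sketch of the setup step using the Accumulo 2.0 client API. The table name matches the README, but the class name, row IDs, and the `cf`/`cq` column names are placeholders, not taken from the example's source:

```java
import java.nio.charset.StandardCharsets;
import org.apache.accumulo.core.client.Accumulo;
import org.apache.accumulo.core.client.AccumuloClient;
import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.io.Text;

public class SetupSketch {
  public static void main(String[] args) throws Exception {
    String propsPath = args[0]; // /path/to/accumulo-client.properties
    try (AccumuloClient client = Accumulo.newClient().from(propsPath).build()) {
      // Create the input table and write 100 entries with the values 0..99
      client.tableOperations().create("spark_example_input");
      try (BatchWriter writer = client.createBatchWriter("spark_example_input")) {
        for (int i = 0; i < 100; i++) {
          Mutation m = new Mutation(new Text(String.format("row_%03d", i)));
          // "cf"/"cq" are placeholder column names, not those used by CopyPlus5K
          m.put(new Text("cf"), new Text("cq"),
              new Value(Integer.toString(i).getBytes(StandardCharsets.UTF_8)));
          writer.addMutation(m);
        }
      }
    }
  }
}
```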
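
And a rough sketch of the Spark application itself, assuming the Accumulo 2.0 `org.apache.accumulo.hadoop.mapreduce` builder API and Spark's Java API. It shows only the Batchwriter write path (the bulk import path would instead write RFiles with `AccumuloFileOutputFormat` and load them with `importDirectory`); the class name and details are illustrative, not the actual CopyPlus5K code:

```java
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import org.apache.accumulo.core.client.Accumulo;
import org.apache.accumulo.core.client.AccumuloClient;
import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.hadoop.mapreduce.AccumuloInputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class CopyPlus5KSketch {
  public static void main(String[] args) throws Exception {
    String propsPath = args[0]; // /path/to/accumulo-client.properties
    Properties props = Accumulo.newClientProperties().from(propsPath).build();

    JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("CopyPlus5K"));

    // 1. Read data from spark_example_input using AccumuloInputFormat
    Job job = Job.getInstance();
    AccumuloInputFormat.configure().clientProperties(props)
        .table("spark_example_input").store(job);
    JavaPairRDD<Key, Value> input = sc.newAPIHadoopRDD(job.getConfiguration(),
        AccumuloInputFormat.class, Key.class, Value.class);

    // 2. Add 5000 to each value
    JavaPairRDD<Key, Value> plus5K = input.mapToPair(kv -> {
      long v = Long.parseLong(kv._2().toString()) + 5000;
      return new Tuple2<>(kv._1(), new Value(Long.toString(v).getBytes(StandardCharsets.UTF_8)));
    });

    // 3. Batchwriter method: each partition writes directly to spark_example_output
    plus5K.foreachPartition(iter -> {
      try (AccumuloClient client = Accumulo.newClient().from(propsPath).build();
          BatchWriter bw = client.createBatchWriter("spark_example_output")) {
        while (iter.hasNext()) {
          Tuple2<Key, Value> kv = iter.next();
          Mutation m = new Mutation(kv._1().getRow());
          m.put(kv._1().getColumnFamily(), kv._1().getColumnQualifier(), kv._2());
          bw.addMutation(m);
        }
      }
    });

    sc.stop();
  }
}
```

For the bulk import path, the data would also need to be sorted by key before the RFiles are written, since Accumulo requires RFiles to contain keys in sorted order.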
This application can be run using the command:

```
./run.sh batch /path/to/accumulo-client.properties
```

Change `batch` to `bulk` to use the bulk import method.
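
For example:

```
./run.sh bulk /path/to/accumulo-client.properties
```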