- Accumulo 2.0+
- Hadoop YARN installed & `HADOOP_CONF_DIR` set in environment
- Spark installed & `SPARK_HOME` set in environment
The CopyPlus5K example will create an Accumulo table called `spark_example_input` and write 100 key/value entries into Accumulo with the values 0..99. It then launches a Spark application that does the following (see the sketches after this list):
- Read data from the `spark_example_input` table using `AccumuloInputFormat`
- Add 5000 to each value
- Write the data to a new Accumulo table (called `spark_example_output`) using one of two methods:
  - Bulk import - Write data to an RFile in HDFS using `AccumuloFileOutputFormat`, then bulk import it into the Accumulo table
  - Batchwriter - Create a `BatchWriter` in Spark code to write to the table
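
For reference, here is a minimal, hypothetical sketch of the setup step using the Accumulo 2.0 client API. The table name matches the README, but the class name, row IDs, and the `cf`/`cq` column names are placeholders, not taken from the example's source:

```java
import java.nio.charset.StandardCharsets;
import org.apache.accumulo.core.client.Accumulo;
import org.apache.accumulo.core.client.AccumuloClient;
import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.io.Text;

public class SetupSketch {
  public static void main(String[] args) throws Exception {
    String propsPath = args[0]; // /path/to/accumulo-client.properties
    try (AccumuloClient client = Accumulo.newClient().from(propsPath).build()) {
      // Create the input table and write 100 entries with the values 0..99
      client.tableOperations().create("spark_example_input");
      try (BatchWriter writer = client.createBatchWriter("spark_example_input")) {
        for (int i = 0; i < 100; i++) {
          Mutation m = new Mutation(new Text(String.format("row_%03d", i)));
          // "cf"/"cq" are placeholder column names, not those used by CopyPlus5K
          m.put(new Text("cf"), new Text("cq"),
              new Value(Integer.toString(i).getBytes(StandardCharsets.UTF_8)));
          writer.addMutation(m);
        }
      }
    }
  }
}
```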
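
And a rough sketch of the Spark application itself, assuming the Accumulo 2.0 `org.apache.accumulo.hadoop.mapreduce` builder API and Spark's Java API. It shows only the Batchwriter write path (the bulk import path would instead write RFiles with `AccumuloFileOutputFormat` and load them with `importDirectory`); the class name and details are illustrative, not the actual CopyPlus5K code:

```java
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import org.apache.accumulo.core.client.Accumulo;
import org.apache.accumulo.core.client.AccumuloClient;
import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.hadoop.mapreduce.AccumuloInputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class CopyPlus5KSketch {
  public static void main(String[] args) throws Exception {
    String propsPath = args[0]; // /path/to/accumulo-client.properties
    Properties props = Accumulo.newClientProperties().from(propsPath).build();

    JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("CopyPlus5K"));

    // 1. Read data from spark_example_input using AccumuloInputFormat
    Job job = Job.getInstance();
    AccumuloInputFormat.configure().clientProperties(props)
        .table("spark_example_input").store(job);
    JavaPairRDD<Key, Value> input = sc.newAPIHadoopRDD(job.getConfiguration(),
        AccumuloInputFormat.class, Key.class, Value.class);

    // 2. Add 5000 to each value
    JavaPairRDD<Key, Value> plus5K = input.mapToPair(kv -> {
      long v = Long.parseLong(kv._2().toString()) + 5000;
      return new Tuple2<>(kv._1(), new Value(Long.toString(v).getBytes(StandardCharsets.UTF_8)));
    });

    // 3. Batchwriter method: each partition writes directly to spark_example_output
    plus5K.foreachPartition(iter -> {
      try (AccumuloClient client = Accumulo.newClient().from(propsPath).build();
          BatchWriter bw = client.createBatchWriter("spark_example_output")) {
        while (iter.hasNext()) {
          Tuple2<Key, Value> kv = iter.next();
          Mutation m = new Mutation(kv._1().getRow());
          m.put(kv._1().getColumnFamily(), kv._1().getColumnQualifier(), kv._2());
          bw.addMutation(m);
        }
      }
    });

    sc.stop();
  }
}
```

For the bulk import path, the data would also need to be sorted by key before the RFiles are written, since Accumulo requires RFiles to contain keys in sorted order.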
This application can be run using the command:

```
./run.sh batch /path/to/accumulo-client.properties
```

Change `batch` to `bulk` to use the bulk import method.
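
For example:

```
./run.sh bulk /path/to/accumulo-client.properties
```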