
Hands on Kite Lab 3: Create a Dataset in HBase

Joey Echeverria edited this page Jul 30, 2014 · 1 revision

One of the benefits of Kite is that you can use the same schema and APIs you use with HDFS to load data into HBase. Because HBase stores and looks up rows by key, we first write a partition configuration that names the fields that will make up the row key:

./dataset partition-config userId:copy movieId:copy -s rating.avsc -o rating-hbase-part.json

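We also need a mapping configuration that tells Kite where each field is stored: userId and movieId go into the HBase row key, while rating and timestamp become columns in the column family f: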

./dataset mapping-config userId:key movieId:key rating:f timestamp:f -s rating.avsc -p rating-hbase-part.json -o rating-mapping.json

Now we can create the HBase-backed dataset using the schema, partition configuration, and mapping configuration:

./dataset create dataset:hbase:localhost.localdomain/ratings -s rating.avsc -p rating-hbase-part.json -m rating-mapping.json

Once the dataset is created, we can import the same data we previously loaded into HDFS, this time writing it to HBase:

./dataset csv-import hdfs://localhost.localdomain/user/joey/u.data dataset:hbase:localhost.localdomain/ratings
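The four setup steps can also be collected into a single script so they run in order and stop at the first failure. This is a minimal sketch, not part of the original lab: the variables DATASET_CLI, ZK_HOST, INPUT and DRY_RUN and the functions run and load_ratings_into_hbase are hypothetical names introduced here. The subcommands, flags and file names are exactly the ones used above.

```shell
#!/usr/bin/env bash
# Hypothetical settings for this sketch; override them in the environment.
DATASET_CLI="${DATASET_CLI:-./dataset}"
ZK_HOST="${ZK_HOST:-localhost.localdomain}"
INPUT="${INPUT:-hdfs://localhost.localdomain/user/joey/u.data}"

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Run the lab steps in order and stop at the first one that fails.
load_ratings_into_hbase() {
  local uri="dataset:hbase:${ZK_HOST}/ratings"

  # 1. userId and movieId become the (composite) row key.
  run "$DATASET_CLI" partition-config userId:copy movieId:copy \
    -s rating.avsc -o rating-hbase-part.json || return 1

  # 2. Key fields go to the row key; rating and timestamp go to family "f".
  run "$DATASET_CLI" mapping-config userId:key movieId:key rating:f timestamp:f \
    -s rating.avsc -p rating-hbase-part.json -o rating-mapping.json || return 1

  # 3. Create the HBase-backed dataset from the schema, partitioning and mapping.
  run "$DATASET_CLI" create "$uri" \
    -s rating.avsc -p rating-hbase-part.json -m rating-mapping.json || return 1

  # 4. Load the same CSV data that was imported into HDFS.
  run "$DATASET_CLI" csv-import "$INPUT" "$uri" || return 1
}
```

After sourcing the file, `DRY_RUN=1 load_ratings_into_hbase` prints the four commands without running anything, and `load_ratings_into_hbase` executes them. The explicit `|| return 1` after each step (rather than `set -e`) keeps a failed `create` from being followed by an import into a dataset that does not exist, and avoids changing the options of the shell that sources the file.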

Finally, we can use a Kite view URI to fetch a single record from HBase by giving values for both fields that make up the row key:

./dataset show "view:hbase:localhost.localdomain/ratings?userId=196&movieId=242"
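To look up other ratings, the view URI can be built from the two key values. This is a hedged sketch: ratings_view_uri and the ZK_HOST variable are hypothetical names, and the URI layout is copied from the `show` command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the view URI that selects one rating by its
# composite row key (userId, movieId). Supplying both key fields narrows
# the view to a single row key.
ratings_view_uri() {
  local user_id="$1" movie_id="$2"
  local zk_host="${ZK_HOST:-localhost.localdomain}"
  printf 'view:hbase:%s/ratings?userId=%s&movieId=%s\n' \
    "$zk_host" "$user_id" "$movie_id"
}
```

Usage: `./dataset show "$(ratings_view_uri 196 242)"`. Keep the double quotes around the substitution, just as the original command quotes its URI: `?` is a glob character and `&` would otherwise send the command to the background.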