Hands on Kite Lab 3: Create a Dataset in HBase
Joey Echeverria edited this page Jul 30, 2014 · 1 revision
One benefit of Kite is that the same schema and APIs you use to work with HDFS can also load data into HBase. Because HBase stores data by key, we first create a partition configuration that names the fields that will make up the key:
./dataset partition-config userId:copy movieId:copy -s rating.avsc -o rating-hbase-part.json
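We also need to map each field in the data either to the HBase row key or to a column in the table. Here userId and movieId go to the key, while rating and timestamp go to columns in the family f: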
./dataset mapping-config userId:key movieId:key rating:f timestamp:f -s rating.avsc -p rating-hbase-part.json -o rating-mapping.json
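To make the field:target arguments easier to read, here is a minimal shell sketch that prints where each field would end up. It does not call Kite and does not reproduce the contents of rating-mapping.json. The describe_mapping function is a hypothetical helper, and the column qualifier defaulting to the field name is an assumption about Kite's behavior rather than something this lab states.

```shell
#!/bin/sh
# Illustrative only: interprets field:target arguments like the ones passed
# to mapping-config above. describe_mapping is a hypothetical helper.
describe_mapping() {
  for spec in "$@"; do
    field=${spec%%:*}    # text before the first colon
    target=${spec#*:}    # text after the first colon
    if [ "$target" = "key" ]; then
      echo "$field -> row key"
    else
      # Assumption: the qualifier defaults to the field name.
      echo "$field -> column $target:$field"
    fi
  done
}

describe_mapping userId:key movieId:key rating:f timestamp:f
```

Under that assumption, the two key fields together identify a row, and the remaining two fields are stored as values in the column family f.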
Now we can create the HBase-backed dataset using the schema, partition configuration, and mapping configuration:
./dataset create dataset:hbase:localhost.localdomain/ratings -s rating.avsc -p rating-hbase-part.json -m rating-mapping.json
Once the dataset is created, we can import the same data we previously imported into HDFS, this time into HBase:
./dataset csv-import hdfs://localhost.localdomain/user/joey/u.data dataset:hbase:localhost.localdomain/ratings
Finally, we can use Kite's view URIs to grab a single record from HBase based on the fields that make up the key:
./dataset show "view:hbase:localhost.localdomain/ratings?userId=196&movieId=242"
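If you want to look up several records, a small wrapper can build the view URI from the key fields. The sketch below assumes the same host and dataset name as the command above. The rating_view_uri function is a hypothetical helper, not part of Kite, and the lookup only runs when the ./dataset script is present in the current directory.

```shell
#!/bin/sh
# Hypothetical helper: builds a view URI for the ratings dataset
# from a userId ($1) and a movieId ($2).
rating_view_uri() {
  echo "view:hbase:localhost.localdomain/ratings?userId=$1&movieId=$2"
}

uri=$(rating_view_uri 196 242)
echo "$uri"

# Only run the lookup when the Kite CLI wrapper is available here.
if [ -x ./dataset ]; then
  ./dataset show "$uri"
fi
```

Keep the URI quoted when passing it to ./dataset, since an unquoted & would be interpreted by the shell as a background operator.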