An example to demonstrate bulkloading into HBase using Spark streaming

ryantotti/hbase-streaming-bulkload

 
 

Example of bulk loading HBase from a Spark stream

This code demonstrates streaming bulk load into HBase 1.2.x using Spark 1.6.x. A Netcat process is used to simulate a stream of data.

To run this example:

  1. Download HBase 1.2.5 and start a standalone server by running ./bin/start-hbase.sh. Verify that it is running by checking that the HBase Master UI is reachable at http://localhost:16010/master-status.

  2. Create the table in HBase:

$ ./bin/hbase shell
hbase(main):034:0> create 'streamingtest', {NAME => 'c', VERSIONS => 1} 
  3. The example uses a socket to provide the stream. On a Linux machine, start Netcat with $ nc -lk 7001 before running the example. You can then type sample messages in the Netcat terminal, and they will be read from the stream.

  4. The project pom contains a profile named ide, which allows you to run the example in IntelliJ: enable the ide profile, then right-click Bulkload and choose Run Bulkload (this assumes an HBase cluster is running locally).

Each entry written in the Netcat console should result in a new row in HBase. The rows are written in bulk every 10 seconds.
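To illustrate the 10-second batching, here is a minimal sketch of the stream-reading side only, assuming the Spark 1.6 streaming API and a Netcat listener on localhost:7001. It is not the repository's Bulkload class: the object name SocketStreamSketch and the helper cleanEntries are hypothetical, and the sketch prints each batch instead of writing it to HBase.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical sketch, not the repository's Bulkload implementation.
object SocketStreamSketch {

  // Hypothetical helper: trim each line and drop blank ones, so that every
  // remaining entry could become one HBase row.
  def cleanEntries(lines: Seq[String]): Seq[String] =
    lines.map(_.trim).filter(_.nonEmpty)

  def main(args: Array[String]): Unit = {
    // local[2]: one thread for the socket receiver, one for processing.
    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("socket-stream-sketch")

    // A 10-second batch interval, so entries are handled in bulk every 10 seconds.
    val ssc = new StreamingContext(conf, Seconds(10))

    // Read newline-delimited text from the port Netcat is listening on.
    val lines = ssc.socketTextStream("localhost", 7001)

    lines.foreachRDD { rdd =>
      // collect() is acceptable here only because this is a small local demo.
      val entries = cleanEntries(rdd.collect().toSeq)
      println(s"Batch of ${entries.size} entries: ${entries.mkString(", ")}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

In a real bulk load, the body of foreachRDD would turn each batch into HBase cells for column family c and load them into the streamingtest table. That step is omitted here because the source does not describe how the repository implements it.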

Languages

  • Scala 100.0%