GitHub - ryantotti/hbase-streaming-bulkload: An example to demonstrate bulkloading into HBase using Spark streaming

This code demonstrates streaming bulk load into HBase 1.2.x using Spark 1.6.x. A Netcat process is used to simulate a stream of data.

To run this example:

Download HBase 1.2.5 and start a server in standalone mode by running ./bin/start-hbase.sh from the HBase directory. Verify that it is running by checking that the HBase Master UI is reachable at http://localhost:16010/master-status.
Create the table in HBase:

$ ./bin/hbase shell
hbase(main):034:0> create 'streamingtest', {NAME => 'c', VERSIONS => 1}

The example uses a socket to provide the stream. On a Linux machine, start Netcat using $ nc -lk 7001 before running the example. You can then type sample messages in the Netcat terminal, and they will be read from the stream.
The project pom contains a profile named ide, which lets you run the example in IntelliJ: enable the ide profile, then right-click Bulkload and select Run Bulkload (this assumes an HBase cluster is running locally).
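
Typing messages by hand works for a quick check, but a small generator can be handy for feeding the stream. The following is a minimal sketch, not part of this repository: gen_messages and the sample-message-N format are hypothetical, chosen only for illustration, and whether Netcat forwards piped input the same way as typed input may depend on the Netcat variant installed.

```shell
# Hypothetical helper (not part of this repository):
# print N numbered sample messages, one per line. Defaults to 5.
gen_messages() {
  count="${1:-5}"
  i=1
  while [ "$i" -le "$count" ]; do
    printf 'sample-message-%d\n' "$i"
    i=$((i + 1))
  done
}

# Assumed usage: pipe the messages into the Netcat listener from the step
# above instead of typing them, so each line becomes one stream entry.
#   gen_messages 100 | nc -lk 7001
```

Each generated line is expected to be treated like a line typed into the Netcat terminal.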

Each entry written in the Netcat console should result in a new row in HBase. The rows are written in bulk every 10 seconds.
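
To check that rows are arriving, you can scan the table from the HBase shell once a batch interval has passed. The sketch below only builds the scan command; the scan_cmd helper is hypothetical, and the commented usage assumes you run it from the HBase directory used in the steps above.

```shell
# Hypothetical helper (not part of this repository):
# build an HBase shell scan command for a table, limited to N rows.
scan_cmd() {
  table="$1"
  limit="${2:-10}"
  printf "scan '%s', {LIMIT => %d}\n" "$table" "$limit"
}

# Assumed usage: pipe the command into the HBase shell to list up to
# 10 rows of the table created earlier.
#   scan_cmd streamingtest 10 | ./bin/hbase shell
```

Since rows are loaded in bulk, new entries may take up to one batch interval to appear in the scan output.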
