HBase is one of the hottest non-relational databases on Hadoop right now! HBase is a NoSQL database built to work on top of the Hadoop Distributed File System (HDFS). HDFS is built on the concept of schema-on-read, where a schema is applied to the data when it is read rather than when it is written. HBase provides the ability to apply a light schema to data in HDFS. Learn how to create a table in HBase in this post.
What is HBase?
HBase is a distributed, scalable NoSQL database built on top of Hadoop. It provides the random, real-time read/write access that HDFS on its own lacks. Where HDFS is great as a file system, HBase enables you to index data in HDFS to speed up reads. Just like the other projects and frameworks in the Hadoop ecosystem, HBase is open source and written in Java. HBase boasts some large deployments, with Netflix being one of the largest.
For our systems based on Hadoop, Apache HBase is a convenient, high-performance column-oriented distributed database solution. With its dynamic partitioning model, HBase makes it really easy to grow your cluster and re-distribute load across nodes at runtime, which is great for managing our ever-growing data volume needs and avoiding hot spots. Built-in support for data compression, range queries spanning multiple nodes, and even native support for distributed counters make it an attractive alternative for many of our use cases. HBase’s strong consistency model can also be handy, although it comes with some availability trade offs. Perhaps the biggest utility comes from being able to combine real-time HBase queries with batch map-reduce Hadoop jobs, using HDFS as a shared storage platform. — Netflix TechBlog
Let’s get familiar with HBase by creating a table to store and index data.
Project Asteroid Warning
For our example, let’s walk through creating a table for a project called Asteroid Warning. Asteroid Warning is a project for indexing all the asteroids in the Solar System and someday the Galaxy! Every space probe or space drone is equipped with a sensor to gather location information about asteroids. Once the data is collected and indexed, the information is shared with other probes and space missions so they can avoid those objects.
Our role in this project is to set up the table that indexes the collected data. We will be using the HBase Shell installed from the Hortonworks Data Platform (HDP). The commands should work in any deployed HBase environment.
HBase Data Collected
Project Asteroid Warning will collect the location of the object, the date and time of the sighting, the size of the object, and the probe’s unique ID number. To do this in HBase we will create two column families, one for the objects and one for the crafts. The object family will contain the size and location columns (column qualifiers). For this example we will use the Equatorial Coordinate System for our location data, denoted by x, y, z. The craft family will contain our probe’s unique ID number. We will let HBase handle the datetime: every cell is timestamped when it is written, and our writes happen in real time, so each object is inserted into HBase as it is logged.
|Column Family||Column Qualifier|
|object||size, location|
|craft||pid|
How to Create a Table in HBase
Follow along to create a table for the Asteroid Warning Project.
Open HBase Shell
The first thing we need to do is open the HBase Shell.
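On a machine where the HBase client is installed, the shell is normally started from the operating system prompt. A minimal sketch, assuming the hbase launcher script is on your PATH and the cluster is already running (the status call is only an optional sanity check):

```
$ hbase shell
hbase> status
```

Once the shell prompt appears, status gives a quick summary of the cluster’s servers.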
Setting Up the HBase Table
From the HBase shell let’s use the create command to make the asteroids table with its two column families, object and craft:
hbase> create 'asteroids', 'object', 'craft'
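Before writing any data it can be worth confirming that the table was created as intended. A quick check, assuming the create command above completed without errors:

```
hbase> list
hbase> describe 'asteroids'
```

list prints the tables in the cluster, and describe shows the settings of each column family in asteroids, which here should be object and craft.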
Putting Our Data into HBase
# Start populating the table
hbase> put 'asteroids', 'row1', 'object:location', '124212'
hbase> put 'asteroids', 'row1', 'object:size', '9863'
hbase> put 'asteroids', 'row1', 'craft:pid', '2983212'
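None of the put commands above passes a timestamp, so HBase stamps each cell itself at write time, which is how the sighting datetime gets recorded. A quick way to see this, assuming the three puts succeeded, is to read the row back:

```
hbase> get 'asteroids', 'row1'
```

Each returned cell is listed with its column, timestamp, and value. If a sighting time ever had to be set by hand, put also accepts an explicit timestamp as an optional argument after the value.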
Now that we’ve mastered how to create tables in HBase let’s learn how to query and review our HBase tables.