Learning the HBase data model and important shell commands

Why we need to work with HBase

HDFS is good for sequential data access, but it lacks random read/write capability. HBase runs on top of the Hadoop Distributed File System (HDFS) and provides random, real-time read and write access to the data stored there.

It also gives you a fault-tolerant way of storing sparse data.

Data Model in HBase

The components of the Apache HBase data model are Tables, Rows, Column Families, Columns, Cells and Versions.


HBase tables are made up of multiple rows, which are stored in sorted (lexicographic) order of their row keys.
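To see what ordering by row key means in practice, here is a small Python sketch (plain Python, no HBase involved). The user ids are made up for illustration, and sorting the encoded bytes is only a stand-in for the byte-by-byte comparison HBase applies to row keys.

```python
# Hypothetical numeric user ids that we want to use as row keys.
user_ids = [1, 2, 10, 11, 100]

# Row keys are compared as bytes, so plain decimal strings do not sort numerically.
plain_keys = sorted(str(u).encode("utf-8") for u in user_ids)

# Zero-padding to a fixed width (5 digits is an arbitrary choice here)
# makes the byte order agree with the numeric order.
padded_keys = sorted(f"{u:05d}".encode("utf-8") for u in user_ids)

print(plain_keys)
print(padded_keys)
```

In the first list the key 10 lands between 1 and 2, which is usually not what you want for range scans. Fixed-width keys are one common way around that.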


Each row has a row key, and for that key you can have one or more column families, each holding one or more columns.

Design the row key so that related entities are stored in adjacent rows, which makes reads more efficient.

A good row key design also helps to avoid hotspotting, which is what happens when most of the read and write operations land on a single region server. At the same time, make sure that data that needs to be fetched together is also stored together. Domain names are a good example of this. Since the data is sorted in a lexicographic manner, if you do not reverse the domain name, the subdomains of a single domain end up scattered across the key space and possibly on different servers. For instance, “cnn.com” and “us.cnn.com” would sort far apart.

By contrast, the reversed keys “com.cnn” and “com.cnn.us” sort next to each other, so they will be stored together.
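The effect of reversing domain names can be sketched in a few lines of Python. This is only an illustration: the host names are hypothetical, reverse_domain is a helper invented for this example, and sorting the encoded bytes stands in for the ordering HBase applies to row keys.

```python
def reverse_domain(domain):
    """Reverse the dot-separated labels: 'us.cnn.com' -> 'com.cnn.us'."""
    return ".".join(reversed(domain.split(".")))


# Hypothetical host names, chosen only for illustration.
domains = ["cnn.com", "us.cnn.com", "apache.org", "edition.cnn.com", "hbase.apache.org"]

# Row keys used as-is: hosts of the same domain are interleaved with others.
plain_order = sorted(domains, key=lambda d: d.encode("utf-8"))

# Reversed row keys: every host of a domain shares a common key prefix.
reversed_order = sorted(
    (reverse_domain(d) for d in domains), key=lambda k: k.encode("utf-8")
)

print(plain_order)
print(reversed_order)
```

With the plain keys, hbase.apache.org sits between edition.cnn.com and us.cnn.com. With the reversed keys, all the com.cnn rows are adjacent, so a scan over the prefix com.cnn would touch one contiguous range.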

Column Families

In HBase you can group multiple columns into a single column family, and you can have one or more column families for each row. However, it is recommended to keep the number of column families small for performance reasons: certainly no more than 10, and the HBase reference guide suggests staying at two or three.


Under a column family you can have as many column qualifiers as you want. A column is identified by the column family name and the column qualifier joined by a colon, for example: columnfamily:columnname.


HBase stores data in cells. A cell is identified by a unique combination of row key, column family and column qualifier, and it contains a value and a timestamp.
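One way to picture this is as a map from (row key, column family, column qualifier) to (value, timestamp). The Python sketch below is just a mental model, not how HBase lays data out on disk. The names CellKey, Cell and put are invented for this example, and the timestamps are made-up numbers.

```python
from typing import NamedTuple


class CellKey(NamedTuple):
    row: str
    family: str
    qualifier: str


class Cell(NamedTuple):
    value: bytes
    timestamp: int  # made-up numbers here; HBase uses epoch milliseconds by default


# The "table" is a sparse map: only cells that were written take up space.
table = {}


def put(row, column, value, timestamp):
    """Store a value under row + 'family:qualifier' (split at the first colon)."""
    family, _, qualifier = column.partition(":")
    table[CellKey(row, family, qualifier)] = Cell(value, timestamp)


put("1", "rating:1", b"4", 1000)
put("1", "rating:7", b"5", 1001)
put("2", "rating:1", b"3", 1002)

cell = table[CellKey("1", "rating", "1")]
print(cell.value, cell.timestamp)
```

Notice that row 2 simply has no rating:7 entry. Nothing is stored for a column a row does not use, which is why HBase suits sparse data.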


You can keep multiple versions of the same cell in HBase. Each version is identified by its timestamp, so you look up a particular version by timestamp.
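Versioning can be sketched the same way. The VersionedCell class below is hypothetical and heavily simplified: it trims old versions immediately on write, and max_versions is just a parameter of this sketch (in HBase the number of versions to keep is a setting of the column family).

```python
class VersionedCell:
    """Toy model of one cell that keeps several timestamped versions."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        self._versions = {}  # timestamp -> value

    def put(self, value, timestamp):
        self._versions[timestamp] = value
        # Keep only the newest max_versions entries.
        for ts in sorted(self._versions, reverse=True)[self.max_versions:]:
            del self._versions[ts]

    def get(self, versions=1):
        """Return up to `versions` (timestamp, value) pairs, newest first."""
        newest_first = sorted(self._versions.items(), reverse=True)
        return newest_first[:versions]


rating = VersionedCell(max_versions=3)
rating.put(b"3", 100)
rating.put(b"4", 200)
rating.put(b"5", 300)
rating.put(b"2", 400)  # pushes the version written at timestamp 100 out

latest = rating.get()
history = rating.get(versions=3)
print(latest)
print(history)
```

A plain read returns only the newest version. Older ones come back only when you ask for them, and only as long as they are still within the version limit.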

How to set up HBase on a Google Dataproc Hadoop cluster

1) Download HBase, extract it and then move it to the /usr/local/HBase folder

wget https://www-us.apache.org/dist/hbase/1.4.9/hbase-1.4.9-bin.tar.gz
tar xzvf hbase-1.4.9-bin.tar.gz
sudo mv hbase-1.4.9 /usr/local/HBase/

2) Check for Java and edit hbase-env.sh in the /usr/local/HBase/conf/ folder

HBase needs Java to run, so you have to provide the path of the Java installation on your server in the hbase-env.sh file.

If you don’t have Java installed on your server, use the following commands to install it.

sudo add-apt-repository ppa:webupd8team/java 
sudo apt-get update 
sudo apt-get install oracle-java8-installer

You can also check whether Java is already available on your server. On Google Dataproc clusters it usually is.

java -version

After this, go to the /usr/local/HBase/conf/ folder and set JAVA_HOME in the hbase-env.sh file:

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64

3)  Change the hbase-site.xml file in /usr/local/HBase/conf/


4) Start up HBase

To start HBase, use the following command.

sudo /usr/local/HBase/bin/start-hbase.sh

5) Run HBase Shell

To run the HBase shell, you can either first add HBase to your .profile file or start the shell directly from the bin folder. For the first method:

Add these to .profile:

export HBASE_HOME=/usr/local/HBase
export PATH=$PATH:$HBASE_HOME/bin

Followed by:

source ~/.profile

Or you can cd into /usr/local/HBase/bin and start the shell directly:

./hbase shell

Important Shell Commands in HBase

Now let us look at some important HBase commands from the shell command line.

Create a table 

This is the syntax to create a table in HBase

create '<table_name>', '<column_family_name>'

Let’s first create a userrating table

create 'userrating','rating'

Check that it was created

list 'userrating'

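Now let us add some data to the table: the user with user_id 1 gives the movie with id 1 a rating of 4. The user id is the row key, the movie id is the column qualifier and the rating is the value.

put 'userrating','1','rating:1','4'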
