HBase Performance Tuning

Garbage Collection Tuning

Garbage collection parameters are among the lower-level settings we need to adjust for the region server processes. The master is not a problem here, since data does not pass through it and it does not handle any heavy load either. So, for HBase performance tuning, we only need to add these garbage collection parameters to the HBase Region Servers.
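
As a minimal sketch only, such parameters are usually exported for the region server processes in conf/hbase-env.sh; the heap size, GC flags, and log path below are assumptions that need to be tuned for your own JDK, heap size, and workload:

# conf/hbase-env.sh -- illustrative GC settings, applied to the region servers only
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS \
  -Xms8g -Xmx8g \
  -XX:+UseParNewGC \
  -XX:+UseConcMarkSweepGC \
  -XX:CMSInitiatingOccupancyFraction=70 \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
  -Xloggc:/var/log/hbase/gc-regionserver.log"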

Memstore-Local Allocation Buffer

In order to mitigate the issue of heap fragmentation caused by the high churn on the memstore instances of an HBase Region Server, version 0.90 of HBase introduced an advanced mechanism, the Memstore-Local Allocation Buffers (MSLAB).

Basically, these MSLABs are buffers of a fixed size that hold KeyValue instances of varying sizes. Whenever a buffer cannot completely fit a newly added KeyValue, it is considered full, and a new buffer of the same fixed size is created.
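
A minimal sketch of the related configuration, assuming the standard MSLAB properties (the chunk size shown is the commonly documented 2 MB default; verify both names and values against your HBase version's documentation):

<!-- conf/hbase-site.xml: illustrative MSLAB settings -->
<property>
  <name>hbase.hregion.memstore.mslab.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hbase.hregion.memstore.mslab.chunksize</name>
  <value>2097152</value> <!-- the fixed buffer size, in bytes -->
</property>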

HBase Compression 

There is one more useful feature of HBase: it supports a number of compression algorithms, which can be enabled at the column family level.

In most cases, compression yields better performance, because the CPU overhead of compressing and decompressing the data is lower than the cost of reading more data from disk.

i. Available HBase Codecs

There is a fixed list of supported compression algorithms in HBase that we can select from. However, they differ in compression ratio as well as in CPU and installation requirements.

ii. Verifying Installation

As soon as we have installed a supported HBase compression algorithm, it is highly recommended to check whether the installation was successful. HBase offers several mechanisms to do that.

  • HBase Compression test tool

HBase ships with a tool to test whether compression is set up properly. Running it without any parameters prints the usage information:

$ ./bin/hbase org.apache.hadoop.hbase.util.CompressionTest
Usage: CompressionTest <path> none|gz|lzo|snappy

For example:

$ ./bin/hbase org.apache.hadoop.hbase.util.CompressionTest file:///tmp/testfile gz

iii. Enabling Compression

Enabling compression requires the JNI and native compression libraries to be installed first. Once they are available, we can set the compression algorithm when creating a table, for example:

hbase(main):001:0> create 'testtable', { NAME => 'colfam1', COMPRESSION => 'GZ' }
0 row(s) in 1.1920 seconds
hbase(main):012:0> describe 'testtable'
DESCRIPTION ENABLED
{NAME => 'testtable', FAMILIES => [{NAME => 'colfam1', true
BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS
=> '3', COMPRESSION => 'GZ', TTL => '2147483647', BLOCKSIZE
=> '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
1 row(s) in 0.0400 seconds

To read back the schema of the newly created table, we use the describe HBase shell command, and we can see that the compression is set to GZIP. For existing tables, we use the alter command to enable, change, or disable the compression algorithm.

To disable compression for a given column family, change its compression format back to NONE.
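
As a sketch of the alter workflow, reusing the table and column family from the example above (older HBase versions require the table to be disabled while its schema is altered):

hbase(main):001:0> disable 'testtable'
hbase(main):002:0> alter 'testtable', { NAME => 'colfam1', COMPRESSION => 'NONE' }
hbase(main):003:0> enable 'testtable'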

  • Load Balancing

There is one built-in feature in the master, called the balancer. By default, the balancer runs every five minutes, and this is configured by the hbase.balancer.period property.

As soon as it starts, it strives to even out the number of assigned regions per region server, so that each server is within one region of the average number per server. The call first determines a new assignment plan, which describes which regions should be moved where. Then it starts moving the regions by iteratively calling the unassign() method of the administrative API.

There is also an upper limit on how long the balancer is allowed to run. It is configured with the hbase.balancer.max.balancing property and defaults to half of the balancer period value, that is, two and a half minutes.
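
As a sketch, both settings live in hbase-site.xml; the values below are simply the defaults spelled out in milliseconds, so treat them as an illustration rather than a recommendation:

<!-- conf/hbase-site.xml: illustrative balancer settings -->
<property>
  <name>hbase.balancer.period</name>
  <value>300000</value> <!-- run every five minutes -->
</property>
<property>
  <name>hbase.balancer.max.balancing</name>
  <value>150000</value> <!-- stop after two and a half minutes -->
</property>

The balancer can also be triggered manually, or switched on and off, from the HBase shell with the balancer and balance_switch commands.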

  • Merging Regions

While it is much more common for regions to split automatically over time as we add data to a table, sometimes we may need to merge regions, for example, to reduce the number of regions hosted by each server after we have removed a large amount of data. HBase provides a tool that permits us to merge two adjacent regions as long as the cluster is not online. Running the command-line tool without arguments prints the usage details:

$ ./bin/hbase org.apache.hadoop.hbase.util.Merge
Usage: bin/hbase merge <table-name> <region-1> <region-2>

  • Client API: Best Practices

There are a handful of optimizations we should consider to gain the best performance while reading or writing data from a client using the API. 

  • Disable auto-flush

While performing a lot of put operations, set the auto-flush feature of HTable to false by using the setAutoFlush(false) method, so that the puts are collected in the client-side write buffer and sent to the region servers in batches.
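
A minimal sketch, assuming the pre-1.0, HTable-based client API that this article refers to (the table name, column family, qualifier, and write buffer size are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class AutoFlushExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "testtable");
    table.setAutoFlush(false);                 // buffer puts on the client side
    table.setWriteBufferSize(4 * 1024 * 1024); // illustrative 4 MB write buffer

    for (int i = 0; i < 100000; i++) {
      Put put = new Put(Bytes.toBytes("row-" + i));
      put.add(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"), Bytes.toBytes("val-" + i));
      table.put(put); // queued locally and sent to the region servers in batches
    }

    table.flushCommits(); // flush any puts still sitting in the write buffer
    table.close();
  }
}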

  • Limit scan scope

When we use Scan to process large numbers of rows, be aware of which attributes we are selecting; requesting only the column families and columns that are actually needed avoids shipping unnecessary data to the client.
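
For instance, a minimal sketch of a narrow scan (the family and qualifier names are placeholders, and setCaching is an additional optional tweak to reduce round trips):

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class NarrowScanExample {
  // Build a scan that only requests the single column we actually process.
  public static Scan buildNarrowScan() {
    Scan scan = new Scan();
    scan.addColumn(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"));
    scan.setCaching(100); // optional: fetch 100 rows per RPC instead of the default
    return scan;
  }
}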

  • Close ResultScanners

This is not so much about improving performance as about avoiding performance problems: a ResultScanner that is never closed keeps resources tied up on the region servers.
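
A minimal sketch of the pattern, again assuming the HTable-based API ('testtable' is a placeholder):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class CloseScannerExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "testtable");
    ResultScanner scanner = table.getScanner(new Scan());
    try {
      for (Result result : scanner) {
        // process each row here
      }
    } finally {
      scanner.close(); // releases the scanner lease held on the region server
      table.close();
    }
  }
}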

  • Block cache usage

Furthermore, by the setCacheBlocks() method, we can control whether Scan instances use the block cache in the region server. For large scans, such as MapReduce jobs over a table, this should be set to false so that the scan does not evict more useful data from the cache.
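
A minimal sketch (the column family name is a placeholder):

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class FullScanNoCacheExample {
  // Build a scan for a large, one-off read (e.g. a MapReduce job) that should not
  // evict hot data from the region server block cache.
  public static Scan buildFullTableScan() {
    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes("colfam1"));
    scan.setCacheBlocks(false);
    return scan;
  }
}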

  1. Optimal loading of row keys
  2. Turn off WAL on Puts
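
Both of these last two points can be sketched together, again assuming the HTable-based client API (all names are placeholders). Optimal loading of row keys means asking the servers to return only the keys, which a combination of FirstKeyOnlyFilter and KeyOnlyFilter achieves; turning off the WAL on Puts trades durability for write throughput, so it is usually only acceptable for re-runnable bulk loads:

import java.util.Arrays;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
import org.apache.hadoop.hbase.filter.KeyOnlyFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class RowKeysAndWalExample {
  // 1. Optimal loading of row keys: return only the first cell of each row, stripped of its value.
  public static Scan rowKeyOnlyScan() {
    Scan scan = new Scan();
    scan.setFilter(new FilterList(Arrays.<Filter>asList(
        new FirstKeyOnlyFilter(), new KeyOnlyFilter())));
    return scan;
  }

  // 2. Turn off WAL on Puts: skip the write-ahead log for this mutation.
  public static Put putWithoutWal(byte[] rowKey) {
    Put put = new Put(rowKey);
    put.add(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"), Bytes.toBytes("value"));
    put.setWriteToWAL(false); // pre-1.0 API; newer clients use setDurability(Durability.SKIP_WAL)
    return put;
  }
}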
