HBase MemStore
Malini Shukla
The MemStore holds all in-memory updates as sorted KeyValues, while data on disk is stored as sorted key/values in HFiles. There is one MemStore per column family, so updates are sorted per column family.
What is HBase MemStore?
In other words, the MemStore holds all in-memory modifications to a Store, and each modification is a KeyValue.
Note that the MemStore's functions should not be called in parallel; callers are responsible for coordinating access to it.
Some key points about the MemStore in HBase:
- In simple words, the MemStore is a write buffer in which HBase accumulates data in memory before writing it permanently to disk.
- When the MemStore fills up, its contents are flushed to disk to form an HFile.
- Each flush creates a new HFile rather than writing to an existing one.
- The HFile is the underlying storage format for HBase.
- There is one MemStore per column family. One column family can have multiple HFiles, but an HFile never holds data for more than one column family.
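To make these points concrete, here is a minimal Python sketch of one column family's store. It is a toy model, not HBase code: the class name ToyStore and its flush_threshold (counted in entries rather than bytes) are hypothetical, and each "HFile" is just an immutable tuple kept in a list. It shows the MemStore staying sorted, each flush producing a new file instead of rewriting an old one, and one column family accumulating several files.

```python
import bisect


class ToyStore:
    """Toy model of one column family's store: a MemStore plus its HFiles.

    Hypothetical illustration only: size is counted in entries, not bytes,
    and each "HFile" is an immutable tuple of sorted (key, value) pairs.
    """

    def __init__(self, family, flush_threshold):
        self.family = family
        self.flush_threshold = flush_threshold
        self._keys = []    # MemStore keys, kept sorted on every insert
        self._values = {}  # MemStore key -> latest value
        self.hfiles = []   # one new entry per flush; never rewritten

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value
        if len(self._keys) >= self.flush_threshold:
            self.flush()

    def flush(self):
        if not self._keys:
            return
        # Write the sorted MemStore out as a brand-new file, then reset it.
        self.hfiles.append(tuple((k, self._values[k]) for k in self._keys))
        self._keys = []
        self._values = {}

    def get(self, key):
        # Newest data wins: MemStore first, then files from newest to oldest.
        if key in self._values:
            return self._values[key]
        for hfile in reversed(self.hfiles):
            for k, v in hfile:
                if k == key:
                    return v
        return None


# One MemStore (and its own set of files) per column family.
stores = {cf: ToyStore(cf, flush_threshold=3) for cf in ("info", "metrics")}
for key in ["row3", "row1", "row2", "row5", "row4", "row6"]:
    stores["info"].put(key, key.upper())

print(len(stores["info"].hfiles))     # 2
print(stores["info"].hfiles[0][0])    # ('row1', 'ROW1')
print(len(stores["metrics"].hfiles))  # 0
```

Keys arrive out of order, yet every flushed file is sorted, and the second flush adds a second file rather than touching the first.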
If the server hosting a MemStore crashes before the MemStore has been flushed, the unflushed data is still protected:
- Every server in an HBase cluster keeps a WAL (write-ahead log) to record changes as they happen. The WAL is a file on the underlying file system. A write is not considered successful until its new WAL entry has been written successfully, which is what makes writes durable.
- If HBase goes down, the data that had not yet been flushed from the MemStore to an HFile can be recovered by replaying the WAL. The HBase framework takes care of this.
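The WAL-first write path and the recovery step can be sketched in Python as follows. This is a hypothetical toy, not HBase's implementation: Disk stands in for durable storage (in a real cluster the WAL lives on the underlying file system), every WAL entry carries an increasing sequence number, and recovery replays only the entries newer than the last flush.

```python
from dataclasses import dataclass, field


@dataclass
class Disk:
    """Simulated durable storage that survives a RegionServer crash."""
    wal: list = field(default_factory=list)         # (seq_id, key, value)
    hfile_data: dict = field(default_factory=dict)  # data already flushed
    flushed_seq: int = 0                            # highest seq_id in HFiles


class ToyRegionServer:
    def __init__(self, disk):
        self.disk = disk
        self.memstore = {}  # volatile: lost when the process crashes
        self._replay_wal()

    def _replay_wal(self):
        # Re-apply logged edits that never made it into an HFile.
        for seq, key, value in self.disk.wal:
            if seq > self.disk.flushed_seq:
                self.memstore[key] = value

    def put(self, key, value):
        seq = len(self.disk.wal) + 1
        self.disk.wal.append((seq, key, value))  # 1. log the edit first
        self.memstore[key] = value               # 2. then update the MemStore
        return seq

    def flush(self):
        self.disk.hfile_data.update(self.memstore)
        self.memstore.clear()
        self.disk.flushed_seq = len(self.disk.wal)

    def get(self, key):
        if key in self.memstore:
            return self.memstore[key]
        return self.disk.hfile_data.get(key)


disk = Disk()
rs = ToyRegionServer(disk)
rs.put("row1", "a")
rs.flush()
rs.put("row2", "b")         # only in the WAL and the MemStore so far

rs = ToyRegionServer(disk)  # "crash": MemStore lost, disk state kept
print(rs.get("row1"), rs.get("row2"))  # a b
```

Because the edit is logged before the MemStore is touched, "row2" survives the simulated crash even though it was never flushed.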
Uses of HBase MemStore
HBase users and administrators should understand what the MemStore is and how it is used, because:
- Several MemStore-related settings can be tuned to gain better performance and avoid issues, and HBase will not adjust these settings for you based on your usage pattern.
- Frequent MemStore flushes can hurt read performance, because each flush adds another HFile that reads may have to consult, and they also put additional load on the system.
- The way MemStore flushes work may also affect your schema design.
Configuring MemStore Flushes
There are two groups of configuration properties that control MemStore flushes:
- The first group determines when a flush should be triggered.
- The second group also determines when a flush should be triggered, but additionally when updates should be blocked while flushing.
Now, let’s learn about these groups in detail:
a. First Group
The first group triggers "regular" flushes, which happen in parallel with serving write requests, so writes are not blocked.
The properties for configuring these flush thresholds are:
- hbase.hregion.memstore.flush.size
```xml
<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <value>134217728</value>
  <description>
    Memstore will be flushed to disk if size of the memstore
    exceeds this number of bytes. Value is checked by a thread that runs
    every hbase.server.thread.wakefrequency.
  </description>
</property>
```
- hbase.regionserver.global.memstore.lowerLimit
```xml
<property>
  <name>hbase.regionserver.global.memstore.lowerLimit</name>
  <value>0.35</value>
  <description>Maximum size of all memstores in a region server before
    flushes are forced. Defaults to 35% of heap.
    This value equal to hbase.regionserver.global.memstore.upperLimit causes
    the minimum possible flushing to occur when updates are blocked due to
    memstore limiting.
  </description>
</property>
```
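As a worked example of the first group, the arithmetic below assumes a RegionServer with an 8 GiB heap (a hypothetical figure) and the default values listed above. The function name first_group_thresholds is illustrative only. hbase.hregion.memstore.flush.size is compared with a single region's MemStore size, whereas the lower limit is compared with the sum of all MemStores on the RegionServer.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def first_group_thresholds(heap_bytes, flush_size=134217728, lower_limit=0.35):
    """Return (per-region flush trigger, RegionServer-wide flush trigger) in bytes.

    Defaults mirror hbase.hregion.memstore.flush.size and
    hbase.regionserver.global.memstore.lowerLimit; heap_bytes is an
    assumed value supplied by the caller.
    """
    region_trigger = flush_size                # vs. one region's MemStore size
    global_trigger = heap_bytes * lower_limit  # vs. the sum of all MemStores
    return region_trigger, global_trigger


region_trigger, global_trigger = first_group_thresholds(heap_bytes=8 * GIB)
print(region_trigger / MIB)                   # 128.0
print(round(global_trigger / GIB, 2))         # 2.8
print(int(global_trigger // region_trigger))  # 22
```

With these numbers, only about 22 regions can each hold a full 128 MiB MemStore before the RegionServer-wide mark is reached, so on a server with many actively written regions, flushes may be triggered by global memory pressure before any single region reaches the flush size.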
b. Second Group
The second group of settings exists mainly for safety reasons. At times the write load is so high that flushing cannot keep up with it. Since we don't want the MemStore to grow without limit, writes are then blocked until the MemStore is back to a "manageable" size.
These thresholds are configured with the following properties:
- hbase.regionserver.global.memstore.upperLimit
```xml
<property>
  <name>hbase.regionserver.global.memstore.upperLimit</name>
  <value>0.4</value>
  <description>Maximum size of all memstores in a region server before new
    updates are blocked and flushes are forced. Defaults to 40% of heap.
    Updates are blocked and flushes are forced until size of all memstores
    in a region server hits hbase.regionserver.global.memstore.lowerLimit.
  </description>
</property>
```
- hbase.hregion.memstore.block.multiplier
```xml
<property>
  <name>hbase.hregion.memstore.block.multiplier</name>
  <value>2</value>
  <description>
    Block updates if memstore has hbase.hregion.memstore.block.multiplier
    times hbase.hregion.memstore.flush.size bytes. Useful for preventing
    runaway memstore during spikes in update traffic. Without an
    upper-bound, memstore fills such that when it flushes the
    resultant flush files take a long time to compact or split, or
    worse, we OOME.
  </description>
</property>
```
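The second group can be modeled as a small state machine. In the hypothetical Python sketch below (BlockingPolicy is not an HBase class), a single region blocks writes once its MemStore grows past block.multiplier × flush.size, and the whole RegionServer blocks writes from the moment all MemStores reach the upper limit until forced flushes bring them below the lower limit. The 8 GiB heap is again an assumed value.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


class BlockingPolicy:
    """Hypothetical model of the second-group thresholds (not HBase code)."""

    def __init__(self, heap_bytes, flush_size=134217728, block_multiplier=2,
                 upper_limit=0.4, lower_limit=0.35):
        # Per-region blocking size: multiplier times the regular flush size.
        self.region_block_size = block_multiplier * flush_size
        # RegionServer-wide high and low water marks, as fractions of heap.
        self.global_high = upper_limit * heap_bytes
        self.global_low = lower_limit * heap_bytes
        self.global_blocked = False

    def region_blocked(self, region_memstore_bytes):
        return region_memstore_bytes > self.region_block_size

    def update_global(self, total_memstore_bytes):
        if total_memstore_bytes >= self.global_high:
            self.global_blocked = True    # block writes, force flushes
        elif total_memstore_bytes < self.global_low:
            self.global_blocked = False   # flushed down far enough; unblock
        # Between the two marks the previous state is kept.
        return self.global_blocked


policy = BlockingPolicy(heap_bytes=8 * GIB)
print(policy.region_block_size // MIB)   # 256
print(policy.region_blocked(300 * MIB))  # True
print(policy.update_global(3.3 * GIB))   # True  (above 40% of 8 GiB = 3.2 GiB)
print(policy.update_global(3.0 * GIB))   # True  (still above 35% = 2.8 GiB)
print(policy.update_global(2.5 * GIB))   # False
```

The gap between the two limits determines how much data is flushed each time writes are blocked; as the lowerLimit description notes, setting the two limits equal gives the minimum possible flushing.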