
Specific points on HBase (BigData)

Abhishek Choudhary

Data Infrastructure Engineering in RWE/RWD | Healthtech DhanvantriAI

Published: April 21, 2015

HBase, which is modelled on Google's Bigtable and serves as an alternative to an RDBMS in the Big Data stack, has great potential.
I can't document everything here, but these are a few things I realised while working with it:

HBase prefers denormalisation. This leads to duplicate data, but HBase keeps the duplicated data together, and since storage keeps getting cheaper, why not make things simpler and more efficient?

HBase data modelling is quite technical work, but once you understand the row key concept and what the row key should be, designing the remaining columns and column families becomes much simpler. So make it a point that the identifying attribute is always the row key.
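To make the row key point concrete, here is a minimal sketch in plain Python. It is not the HBase client API: `TinyTable`, `make_row_key`, and the sample customer IDs are hypothetical names used only for illustration. The one behaviour it borrows from HBase is that rows are kept sorted by their row key bytes, so a lookup by the identifying attribute is direct and related keys sit next to each other.

```python
from bisect import bisect_left, insort


def make_row_key(identifier: str) -> bytes:
    """Encode the identifying attribute as the row key (hypothetical helper)."""
    return identifier.encode("utf-8")


class TinyTable:
    """Toy in-memory stand-in for a table whose rows are sorted by row key."""

    def __init__(self):
        self._keys = []   # row keys, kept in sorted byte order
        self._rows = {}   # row key -> {column family -> {qualifier -> value}}

    def put(self, row_key, family, qualifier, value):
        if row_key not in self._rows:
            insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key].setdefault(family, {})[qualifier] = value

    def get(self, row_key):
        # Direct lookup by the identifying attribute.
        return self._rows.get(row_key, {})

    def scan_prefix(self, prefix):
        # Keys sharing a prefix are contiguous because the keys are sorted.
        result = []
        i = bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            result.append((self._keys[i], self._rows[self._keys[i]]))
            i += 1
        return result


table = TinyTable()
table.put(make_row_key("cust-0042"), "profile", "name", "Asha")
table.put(make_row_key("cust-0007"), "profile", "name", "Ravi")
table.put(make_row_key("cust-0042"), "profile", "city", "Pune")
table.put(make_row_key("dept-0001"), "profile", "name", "Sales")

print(table.get(make_row_key("cust-0042")))
print([key for key, _ in table.scan_prefix(b"cust-")])
```

Once the row key is fixed, the column families ("profile" here) are just groupings of attributes hanging off that key, which is why the rest of the model tends to follow from the key choice.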

A one-to-many parent-child relationship can be built by keeping the children in the same row as the parent, in a column family, as nested entities.
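The nested-entity layout can be sketched as follows, again in plain Python rather than with a real HBase client. The order/item example, the family names `info` and `items`, and the helpers `order_to_cells` and `children_of` are assumptions made for illustration. Each child becomes one column in the parent's row: the column qualifier is the child's ID and the value is the serialised child.

```python
import json


def order_to_cells(order):
    """Flatten a parent with children into (family, qualifier) -> value cells."""
    row_key = order["order_id"].encode("utf-8")
    # Parent attributes live in their own column family.
    cells = {(b"info", b"customer"): order["customer"].encode("utf-8")}
    # Each child is a column in the "items" family, keyed by the child ID.
    for item in order["items"]:
        qualifier = item["item_id"].encode("utf-8")
        cells[(b"items", qualifier)] = json.dumps(item, sort_keys=True).encode("utf-8")
    return row_key, cells


def children_of(cells, family=b"items"):
    """Read all children back from a single row, ordered by qualifier."""
    return [
        json.loads(value)
        for (fam, _qualifier), value in sorted(cells.items())
        if fam == family
    ]


order = {
    "order_id": "order-1001",
    "customer": "cust-0042",
    "items": [
        {"item_id": "item-2", "sku": "pen", "qty": 3},
        {"item_id": "item-1", "sku": "book", "qty": 1},
    ],
}

row_key, cells = order_to_cells(order)
print(row_key)
print(children_of(cells))
```

Reading the parent and all of its children then takes a single row fetch instead of a join, at the cost of rewriting the row's cells when a child changes.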

Read more at https://hbase.apache.org/book.html

