Are you planning for the Production Deployment of your Hadoop System?

So, you are part of the Hadoop bandwagon now! Your team has spent long hours learning Hadoop and has identified appropriate use cases for it. You have also embarked upon a Proof-of-Concept (POC) and successfully completed it. You have showcased it to your Senior Management team and impressed them with how your organisation can derive value from Hadoop. Funding for the next steps is also on the way. Wonderful! So what next?

Getting the Hadoop system deployed into Production! But wait... are you feeling anxious about moving the POC into Production? If so, do not worry. You are among the majority who have plenty of questions about deploying a Hadoop system into Production.

Having made the journey from POC to Production deployment myself, I would like to share a few points to consider before deploying Hadoop into Production. They will be presented in a series of blog posts, each covering a few points. I'm sure they will steer your thinking in the right direction and help you successfully complete the journey to Production deployment.

a) Is your POC cluster a suitable instance to simulate Production data?

Why is it important – POCs are normally done on a very small cluster, possibly made up of desktop machines. Though such a cluster can handle a limited amount of data, it cannot simulate Production-like scenarios.

Dos – Prior to Production deployment, it is essential to have a cluster which is a good representation of the Production instance. This cluster can act as a Pre-Production or Testing instance and allow real-life scenarios to be simulated.

Why do I say it – One of our clients did not want to invest in a Pre-Production instance and used a small 3-node cluster for development and testing. When they ran into performance issues in Production, they had no instance on which to replicate the issues. Only then did they realise the importance of the advice we had given them earlier.

b) Do you have a good idea of the 3 Vs (volume, velocity and variety) of your Production data?

Why is it important – In a POC, the focus is mainly on experimenting with Hadoop ecosystem components that are new to the team. The emphasis tends to be on a lot of trial and error, getting a limited amount of data ingested into the Hadoop cluster and presenting it to end users through some grand visualization. The data itself tends to receive little attention.

Dos – It is essential that the 3 Vs of Big Data, as they apply to your Production instance, are carefully evaluated. A good amount of testing is required to make sure that your code base is ready to handle them. The data used for testing needs to be carefully chosen and should be representative of the data expected in Production. This will avoid a lot of surprises later.
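To make this check a little more concrete, here is a minimal Python sketch, assuming you can summarise both your test data and your expected Production data as a simple profile along the 3 Vs. The `DataProfile` class, the `representativeness_gaps` function and the 80% threshold are hypothetical illustrations, not part of Hadoop or any of its ecosystem tools; the sketch simply flags where the test data falls short of Production.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProfile:
    """Hypothetical summary of a dataset along the 3 Vs."""
    gb_per_day: float            # Volume: data ingested per day
    peak_events_per_sec: float   # Velocity: peak ingest rate
    formats: frozenset           # Variety: record formats seen, e.g. {"json", "csv"}


def representativeness_gaps(test, prod, min_ratio=0.8):
    """List the Vs on which the test data falls short of Production.

    min_ratio is an assumed threshold: test volume and velocity must reach
    at least this fraction of the Production figures to count as representative.
    """
    gaps = []
    if test.gb_per_day < min_ratio * prod.gb_per_day:
        gaps.append(f"volume: {test.gb_per_day} GB/day vs "
                    f"{prod.gb_per_day} GB/day in Production")
    if test.peak_events_per_sec < min_ratio * prod.peak_events_per_sec:
        gaps.append(f"velocity: {test.peak_events_per_sec} events/s vs "
                    f"{prod.peak_events_per_sec} events/s in Production")
    # Variety: any format seen in Production but absent from the test data.
    missing = prod.formats - test.formats
    if missing:
        gaps.append(f"variety: test data lacks formats {sorted(missing)}")
    return gaps


# Illustrative figures only, not taken from any real system.
test_profile = DataProfile(50.0, 200.0, frozenset({"json"}))
prod_profile = DataProfile(60.0, 5000.0, frozenset({"json", "avro"}))
for gap in representativeness_gaps(test_profile, prod_profile):
    print(gap)
```

With these made-up figures the test volume is close enough to Production, but the sketch flags both the velocity shortfall and the missing format. In practice the Production figures would come from your own estimates or from monitoring of the source systems.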

Why do I say it – One of the clients we worked with encountered some issues with Flume in Production. We recreated the issue on the Testing instance and added some pre-processing to speed up Flume. A root cause analysis showed that the velocity of the data used on the Testing instance (prior to Production deployment) was nowhere near that of the live data. A careful evaluation of this factor prior to Production deployment could have avoided the issue.

c) Have you performed adequate testing on the Pre-Production/Testing instance?

Why is it important – POC data tends to be woefully inadequate, so estimating Production behaviour from it is a big risk. POC testing (if done at all!) tends to focus on functionality rather than performance.

Dos – Deploy the POC code on the Pre-Production/Testing instance and test it thoroughly with an adequate sample of the Production data. This will uncover a lot of performance issues and give you a good feel for the problems that could occur in Production.

I hope you find these tips useful. In my next blog post I will deal with a few other points related to Production deployment. Keep watching this space for more...

More articles by Yogesh Kulkarni