9 best practices I was not aware of for Hadoop in the enterprise in late 2015: Enterprise Productivity

Getting started with Hadoop is like buying a car—about 100 years ago. Back then, only hobbyists and status seekers drove their own vehicles, which were temperamental, to put it mildly. To operate a car, you pretty much had to be a mechanic because the only thing you could count on was a breakdown. The dashboard showed little, if any, actionable information. Only after automobiles became user-friendly did they really take off.

There are parallels with Apache Hadoop. Enterprise IT teams were once led to believe that if they built on top of Hadoop, they could replicate the business successes of giants like Twitter, LinkedIn and Netflix. However, working with Hadoop has often proved problematic, and enterprises are struggling to deploy Hadoop applications with the standards of quality, reliability and manageability that they expect. Out of the box, Hadoop is not operationally ready. This slide show, developed using eWEEK reporting and insight from Supreet Oberoi, vice president of field engineering at big data specialist Concurrent, discusses best practices for how IT organizations can achieve operational readiness on Hadoop—or how to transform every big data project from a Model T into a Camry.

When there's a performance problem on Hadoop, there can be several culprits: your code, your data, your hardware or the way you're sharing resources (your cluster configuration). At a startup, a single person—data scientist, developer and operator all rolled into one—might be responsible for all that. But at a large enterprise, multiple teams have to cooperate to figure out what went wrong and how to fix it. If you're managing a big data operation at a large, distributed organization, nurture collaboration by giving your team tools that let developers, operators and managers work together to address performance issues.

When execution errors do arise, teams can spend hours tracking down what went wrong if all they've got are Hadoop's log files and the JobTracker. Invest in tools that help your team quickly connect errors to application context—where in your code they're happening—and share that information easily.
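
As a rough illustration (the pipeline, job and counter names below are made up, not from the original slide show), one low-tech way to attach application context is to give every job a descriptive name and to record failures in a counter group that identifies the pipeline stage, so they surface in the job UI and history next to the stack trace:

```java
// A minimal sketch: tie Hadoop jobs and their failures back to application
// context via descriptive job names and counter groups.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class ContextTaggedJob {

    public static class ParsingMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            try {
                String[] fields = value.toString().split(",");
                context.write(new Text(fields[0]), new LongWritable(1L));
            } catch (RuntimeException e) {
                // The counter group/name carry application-level context into the
                // job history, so an operator sees *where in the code* errors occur.
                context.getCounter("AppContext.orders-pipeline", "parse-errors-stage1").increment(1L);
            }
        }
    }

    public static Job buildJob(Configuration conf) throws IOException {
        Job job = Job.getInstance(conf);
        // A "<team>.<pipeline>.<stage>" name makes the job easy to find among
        // thousands in the ResourceManager / JobTracker UI.
        job.setJobName("marketing.orders-pipeline.stage1-parse");
        job.setJarByClass(ContextTaggedJob.class);
        job.setMapperClass(ParsingMapper.class);
        return job;
    }
}
```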

To an operator running hundreds or thousands of applications on a Hadoop cluster, all of them look the same—until there's a problem. So you need tools that let you look at performance over groups of applications. Ideally, you should be able to segment performance tracking by application types, departments, teams and data-sensitivity levels.
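
For example, here is a sketch using a naming and tagging convention of our own invention (not a specific product): each job is stamped with its team, pipeline and data-sensitivity level through the standard mapreduce.job.tags property, so performance can later be rolled up per group rather than per individual run.

```java
// A minimal sketch: tag jobs so an operator can segment performance tracking
// by team, pipeline and sensitivity level. Tag values are illustrative.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class JobTagging {
    public static Job taggedJob(Configuration conf,
                                String team, String pipeline, String sensitivity) throws IOException {
        Job job = Job.getInstance(conf);
        job.setJobName(team + "." + pipeline);   // groupable by prefix in the UI
        // mapreduce.job.tags propagates to the YARN application as searchable tags;
        // keeping the values simple and lowercase avoids surprises.
        job.getConfiguration().set("mapreduce.job.tags",
                team + "," + pipeline + "," + sensitivity);
        return job;
    }
}

// Usage (illustrative values): taggedJob(conf, "marketing", "orders-pipeline", "pii");
```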

Monitoring a fleet still means knowing when an individual vehicle performs poorly. Similarly, operators need to set SLA bounds on performance and define alerts and escalation paths for when they're violated. SLA bounds should incorporate both raw metadata, such as job status, and business-level events, such as sensitive data access. Successful practitioners of operational readiness also set up metrics that help predict future SLA violations, so they can proactively address and avoid them.
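
Here is a minimal sketch of the monitoring side, assuming a ResourceManager at resourcemanager.example.com and an arbitrary 30-minute bound: poll the YARN REST API for running applications and escalate any that have exceeded the SLA. Real tooling would use a JSON parser and a proper alerting path.

```java
// A crude SLA watcher: flag running YARN applications whose elapsed time
// exceeds a bound. Host, port and threshold are placeholder assumptions.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SlaWatcher {
    private static final long SLA_MILLIS = 30 * 60 * 1000L;   // assumed 30-minute bound

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://resourcemanager.example.com:8088/ws/v1/cluster/apps?states=RUNNING"))
                .build();
        String json = client.send(request, HttpResponse.BodyHandlers.ofString()).body();

        // Crude extraction of (name, elapsedTime) pairs; a real tool would parse the JSON.
        Matcher m = Pattern.compile("\"name\":\"([^\"]+)\".*?\"elapsedTime\":(\\d+)", Pattern.DOTALL)
                           .matcher(json);
        while (m.find()) {
            long elapsed = Long.parseLong(m.group(2));
            if (elapsed > SLA_MILLIS) {
                escalate(m.group(1), elapsed);
            }
        }
    }

    private static void escalate(String appName, long elapsedMillis) {
        // Stub: wire this to your alerting / escalation path.
        System.out.printf("SLA violation: %s has been running for %d min%n",
                appName, elapsedMillis / 60000);
    }
}
```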

Large, traditional enterprises tend to run their Hadoop clusters as a shared service across many lines of business. As a result, each application has at least a few "roommates" in the cluster, some of which can be detrimental to its own performance. To understand the errant behavior of one Hadoop application, operators must understand what others were doing on the cluster when it ran. Therefore, provide your operations team with as much cluster-related context as possible.
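
As one hedged example of providing that context, the YARN ResourceManager's applications endpoint can be filtered by start and finish times to list roughly everything that shared the cluster with a troubled run (hostname and window bounds below are placeholders):

```java
// A minimal sketch, not a complete tool: list the applications whose lifetimes
// overlapped a given window, so an operator can see the "roommates" a troubled
// run shared the cluster with. A real tool would also include still-running
// apps and parse the JSON instead of returning it raw.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ClusterContext {
    public static String appsOverlapping(long windowStartMillis, long windowEndMillis) throws Exception {
        // startedTimeEnd / finishedTimeBegin are standard filters on the RM apps endpoint:
        // apps started before the window ended and finished after it began roughly
        // bracket what was competing for the cluster at the time.
        String url = "http://resourcemanager.example.com:8088/ws/v1/cluster/apps"
                   + "?startedTimeEnd=" + windowEndMillis
                   + "&finishedTimeBegin=" + windowStartMillis;
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest req = HttpRequest.newBuilder(URI.create(url)).build();
        return client.send(req, HttpResponse.BodyHandlers.ofString()).body();   // raw JSON listing
    }
}
```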

To optimize cluster use and ROI, operators must ration resources on the cluster and enforce the limits. An operator can budget mappers for the execution of a particular application, and if the application exceeds that budget, rationing rules should prevent it from being deployed. Establishing and enforcing the rules for rationing cluster resources is vital for achieving meaningful operational readiness and meeting SLA commitments.
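
A minimal sketch of what enforcement can look like with the stock YARN Capacity Scheduler (queue names and percentages below are illustrative, not from the slide show): each line of business gets its own queue with a capped share of the cluster, and applications are submitted against that queue.

```java
// Submit jobs to a dedicated Capacity Scheduler queue. The queue shares are
// defined cluster-side in capacity-scheduler.xml, for example:
//   yarn.scheduler.capacity.root.queues                     = marketing,finance,adhoc
//   yarn.scheduler.capacity.root.marketing.capacity         = 40
//   yarn.scheduler.capacity.root.marketing.maximum-capacity = 50
// A job that needs more than its queue allows waits or is rejected instead of
// starving its neighbours.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class QueueSubmission {
    public static Job forQueue(Configuration conf, String queue) throws IOException {
        Job job = Job.getInstance(conf);
        job.getConfiguration().set("mapreduce.job.queuename", queue);   // e.g. "marketing"
        return job;
    }
}
```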

Good Hadoop management isn't only about rationing compute resources; it also means regulating access to sensitive data, especially in industries with heightened privacy concerns like health care, insurance and financial services. Solving data lineage and governance in an unstructured environment like Hadoop is difficult. Traditional techniques of manually maintaining a metadata dictionary quickly lead to stale repositories, and they offer no way to prove that a production dataset depends on some fields and not on others. As a result, visibility into, and enforcement of, the use of data fields are required at the operational level. If you can reliably track whether and when a data field is accessed by an application, your compliance teams will be happy.
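
There is no single standard API for this, but as a purely hypothetical sketch, routing every read of a sensitive field through one small helper makes field-level access visible in the job's counters, alongside the job that did the reading:

```java
// A hypothetical helper (not a real lineage product): record each read of a
// sensitive field in a counter group so field-level usage appears in the job
// history for audit and compliance review.
import org.apache.hadoop.mapreduce.TaskInputOutputContext;

public class FieldAccessTracker {
    private final TaskInputOutputContext<?, ?, ?, ?> context;

    public FieldAccessTracker(TaskInputOutputContext<?, ?, ?, ?> context) {
        this.context = context;
    }

    /** Returns the requested field and records that this job touched it. */
    public String read(String[] record, int index, String fieldName) {
        context.getCounter("SensitiveFieldAccess", fieldName).increment(1L);
        return record[index];
    }
}

// Usage inside a mapper (field names and indices are illustrative):
//   FieldAccessTracker tracker = new FieldAccessTracker(context);
//   String ssn = tracker.read(fields, 3, "ssn");
```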

Compliance professionals at large enterprises also want proof that a Hadoop application processed every record in a dataset, and they look for documentation when it fails to do so. Failures can result from format changes in upstream data sets or plain old data corruption. Keeping track of all records that the application failed to process is particularly vital in regulated industries.
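
As an illustrative sketch (the CSV layout and the "rejects" output name are assumptions), a mapper can count the records it fails to parse and divert the raw lines to a side output via MultipleOutputs, so nothing is silently dropped and compliance has a record-level trail of what was skipped:

```java
// Count and capture records that fail to parse instead of discarding them.
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class RejectCapturingMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    private MultipleOutputs<Text, LongWritable> mos;

    @Override
    protected void setup(Context context) {
        mos = new MultipleOutputs<>(context);
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        try {
            String[] fields = value.toString().split(",");
            context.write(new Text(fields[0]), new LongWritable(Long.parseLong(fields[2])));
        } catch (RuntimeException e) {
            context.getCounter("DataQuality", "rejected-records").increment(1L);
            mos.write("rejects", NullWritable.get(), value);   // keep the raw record for audit
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        mos.close();
    }

    /** Call from the driver to register the side output for rejected records. */
    public static void register(Job job) {
        MultipleOutputs.addNamedOutput(job, "rejects", TextOutputFormat.class,
                NullWritable.class, Text.class);
    }
}
```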

With new compute fabrics emerging all the time, teams are sometimes too quick to junk their old ones in pursuit of better performance. However, it's often the case that you can achieve equal or greater performance gains just by optimizing code and data flows on your existing fabrics. That way, you can avoid expensive infrastructure upgrades unless they're truly necessary.
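
One classic example of such a code-level optimization, offered here as a generic sketch rather than anything specific from the original slide show, is adding a combiner so partial aggregation happens map-side and shuffle volume drops before anyone budgets for new hardware:

```java
// Reuse a sum reducer as a combiner: because summation is associative and
// commutative, partial sums computed on the map side shrink the data that has
// to cross the network during the shuffle.
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;

public class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        long sum = 0;
        for (LongWritable v : values) {
            sum += v.get();
        }
        context.write(key, new LongWritable(sum));
    }

    public static void configure(Job job) {
        job.setReducerClass(SumReducer.class);
        job.setCombinerClass(SumReducer.class);   // same class reused map-side
    }
}
```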

Credits: Chris Preimesberger. Thanks for letting us know this.
