What’s the Big Deal about BIG DATA? (Part 2 of 2)

Read part 1 of 2 here.

Chapter 4.     How Does That Change Things?

Section 1.       Hypothesis-driven versus discovery-driven method of scientific inquiry

As we saw in the used car example in part 1, until just a few years ago our typical approach to solving a problem was this: given a problem statement, start with an initial hypothesis based on some idea of what the solution might look like, decide what inputs we need to solve the problem, and then find ways to validate (or invalidate) the hypothesis using the limited inputs we decided to start with.

And while doing so, knowing full well that we could handle only limited quantities of data, we would keep narrowing the scope of the problem until it was “manageable”. We would also take only samples of the data, again to keep the situation from getting out of hand.

By the time we were done with this exercise we had, in many cases, turned the original problem from something we needed to solve into a mere reflection of what we thought the answer was supposed to look like, discarding whatever we had decided was unimportant. Finally, we would bring to bear, sometimes consciously but more often unconsciously, our own biases and prejudices, using this whole charade of an objective scientific approach merely to lend our deeply held preconceived notions a modicum of authenticity and credibility.

In the world of BIG DATA, we are learning that we need to think differently.

While we still start with an initial hypothesis, deciding what parameters or aspects of a problem to consider, we then depart from the old approach. We do not narrow the scope of the problem, nor do we sample the data. Instead, we gather everything we feel is relevant to the problem at hand and populate it with as much data as we can find: the more the merrier. We then ask our algorithm to devour every bit of data in the pile, looking for patterns and unique signatures associated with the problem we are addressing.
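To make the contrast concrete, here is a minimal, self-contained Python sketch of the discovery-driven mindset. All data and feature names are invented for illustration: rather than pre-selecting one predictor, we hand the algorithm every feature we have and let it rank them by how strongly each tracks the outcome.

```python
# Discovery-driven sketch: instead of testing one pre-chosen predictor,
# rank ALL available features by how strongly they track the outcome.
# Data and feature names are invented for illustration.

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

features = {
    "obvious_factor": [1, 2, 3, 4, 5, 6],   # what a hypothesis would pick
    "ignored_factor": [10, 8, 9, 5, 4, 2],  # the "distant low-pressure system"
    "random_noise":   [5, 1, 6, 2, 1, 4],
}
outcome = [2, 3, 3, 5, 6, 8]

# Rank every feature by |correlation| rather than pre-deciding which matter.
ranked = sorted(features, key=lambda f: abs(pearson(features[f], outcome)),
                reverse=True)
print(ranked)  # the feature we might have ignored tops the list
```

Note that in this toy dataset the feature a narrow hypothesis would have discarded turns out to carry the strongest signal, which is precisely the point of letting the data speak first.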

As a result, the solution that emerges may or may not reflect our original preconceived ideas. It might even be something we would never have imagined, running completely contrary to our intuition.

This shift from a hypothesis-driven to a discovery-driven approach is fundamental, and it will be some time before it becomes widely accepted.

Chapter 5.     What Is BIG DATA Analytics At Its Core?

Section 1.    Ability to triangulate disparate sources of data

How does BIG DATA make Analytics any different? Let’s look at the following two examples.

1.    Hurricane Sandy prediction differences between the US and European agencies

The initial American forecasts for Hurricane Sandy, a category 3 hurricane in October 2012, predicted [5] that the system would either fizzle over the Atlantic or move northeast, avoiding landfall. However, another forecast, this time from across the ocean, suggested a more ominous possibility: the hurricane would move westwards and pass over New Jersey. As it turned out, the Europeans were right.

Why was the European forecast better than the American one? It turns out it included additional predictors, including a low-pressure system thousands of miles away from the system developing in the Atlantic, that the Americans had initially ignored.

The quality of a prediction depends on the factors the analysis takes into consideration. The ability to use more than one set of data and look for their intersections is what makes BIG DATA Analytics that much more powerful.

2.    NYC Firefighters are safer thanks to BIG DATA Analytics

One of the major challenges firefighters in New York City face involves the fire hazards [6] associated with illegal conversions, where an apartment permitted for six people has 60 staying in it. The city has over 900,000 structures and receives roughly 25,000 complaints per year, with only 200 inspectors available to check them for violations.

The challenge was to find a better way for the city inspectors to prioritize which apartments to inspect. Using data from various city agencies, including the police, the fire department, the department of buildings, and 16 others, the number-crunching team at NYC came up with an algorithm that would eventually help them identify potential violations.
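The triangulation idea can be sketched in a few lines of Python. This is an illustration only, not the city's actual model: the agency feeds, field names, and weights below are all invented, and a real system would learn its weights from past inspection outcomes.

```python
# Triangulation sketch (illustrative only, not NYC's actual model):
# merge per-building signals from several hypothetical agency feeds and
# rank buildings so inspectors visit the riskiest first.

complaints = {"bldg-17": 4, "bldg-08": 1, "bldg-23": 2}          # e.g. 311 calls
tax_arrears = {"bldg-17": True, "bldg-08": True, "bldg-23": False}
past_violations = {"bldg-17": 1, "bldg-23": 3}                   # buildings dept.

def risk_score(bldg):
    # Hand-picked weights for illustration; a real model would learn them
    # from historical inspection outcomes.
    return (2.0 * complaints.get(bldg, 0)
            + 3.0 * (1 if tax_arrears.get(bldg) else 0)
            + 1.5 * past_violations.get(bldg, 0))

# Union of buildings seen by ANY agency: no single feed has the full picture.
buildings = set(complaints) | set(tax_arrears) | set(past_violations)
priority = sorted(buildings, key=risk_score, reverse=True)
print(priority)
```

The key design point is the union across feeds: a building that looks unremarkable in any single dataset can still rise to the top once its signals are combined.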

This would never have been possible in the pre-BIG DATA era. By bringing all these disparate data sources under one analysis, the city improved its inspection success ratio fivefold and cut firefighter exposure to injuries and fatalities by a factor of 15 to 17 compared with earlier numbers.

Chapter 6.     What Happens to The “Domain Experts” In This World Of BIG DATA?

What does this mean for the domain experts? Are they no longer required? Let’s find out.

The domain expert versus the data scientist debate: In a panel discussion [7] entitled “In data science, domain expertise is more important than machine learning skill,” six top data scientists from Silicon Valley and beyond debated the issue. Arguing in favor of the motion, one panelist said it was easier to learn statistics and machine learning than to acquire a lifetime of expertise and intuition. At that point, a woman in the audience got up and shared how she, a three-time winner of BIG DATA competitions, had won contests in domains as varied as “breast cancer, movie prediction, and sales performance” while knowing next to nothing about those subjects when she started.

Google translation linguists: In the example mentioned earlier, when Google was attempting to translate languages, the joke was that their prediction models improved when they “threw the linguists out of the room”.

Amazon book recommendations: Even at Amazon, the company that started the “people who bought this also bought…” revolution, a stage came when the mathematical models started doing a much better job than the humans. The company had to let go of the team that had originally generated these recommendations using its knowledge of books, and the models went on to outperform the experts to the point where over a third of Amazon’s revenue came from these recommendations. So much for domain expertise.
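At its simplest, the “people who bought this also bought…” idea reduces to counting item co-occurrences across orders. Here is a toy Python sketch with invented order data; it is in the spirit of item-to-item recommendation, not Amazon's actual algorithm.

```python
# Toy "people who bought this also bought" recommender: pure item
# co-occurrence counting over invented orders (not Amazon's real system).
from collections import Counter

orders = [
    {"big_data_book", "stats_primer"},
    {"big_data_book", "stats_primer", "ml_cookbook"},
    {"big_data_book", "ml_cookbook"},
    {"big_data_book", "ml_cookbook", "novel"},
    {"stats_primer", "novel"},
]

def also_bought(item, top_n=3):
    """Items most often appearing in the same order as `item`."""
    counts = Counter()
    for order in orders:
        if item in order:
            counts.update(order - {item})   # count every co-purchased item
    return [other for other, _ in counts.most_common(top_n)]

print(also_bought("big_data_book"))
```

Notice that no knowledge about books appears anywhere in the code; the recommendations emerge purely from purchase patterns, which is exactly why the domain experts lost this particular contest.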

Is this the end of the domain expert era? Does all this signal the demise of the domain expert?

I believe not. The domain experts won’t be going away, but they will have a different role to play. Instead of spending their time on what machines can do, and do better, their time would be well spent doing things only humans can. In a seminal 1960 paper, computer pioneer J.C.R. Licklider wrote that about 85% of his “thinking” time was spent getting into a position to think, pursuing activities that were mostly tedious; his choices of what to attempt and what not to attempt were determined, to an embarrassingly great extent, by considerations of clerical feasibility, not intellectual capability.

This will change.

For instance, while a GPS can and will give you exact minute-to-minute and mile-to-mile guidance on when to turn and how far to go, a human can look at the address and say, ‘Oh, I know where this place is; it’s right next to the clock tower that is visible across the city. Just look for it and you will reach your destination.’

And then there are other places where the domain expertise comes in.

Coming back to the NYC firefighters story, the old inspectors, the domain experts, found that they could build on the knowledge generated by the computers. One look at an apartment and they could say whether the mathematical model was going to be right or wrong. Fresh exterior remodeling and paint, for example, suggested an owner who took good care of the property and would not allow its misuse. Later, since such changes required a city permit, and therefore had a record somewhere, they too were included in the mathematical model, greatly improving its prediction accuracy.

In a medical study [9], one researcher recorded hundreds of conversations between physicians and their patients; about half of the physicians had never been sued, while the other half had been sued at least twice. Based on just these conversations, she was able to predict with striking accuracy which physicians were likely to have been sued. The surgeons who had never been sued spent more than three minutes longer with each patient than those who had been sued (18.3 minutes vs. 15 minutes). They were also more likely to make orienting comments, engage in active listening, and laugh and be funny.

The interesting part, however, comes next. A psychologist listened to these tapes. For each surgeon she picked two patient conversations, and from these she selected two ten-second clips of the doctor talking, giving her a total slice of 40 seconds per surgeon. Finally, she “content filtered” the slices, removing the high-frequency sounds that enable us to recognize individual words and leaving behind a kind of garble that preserves intonation, pitch, and rhythm while erasing the content. Judging these slices alone for qualities such as warmth, hostility, dominance, and anxiousness, she could predict with a very high rate of accuracy which surgeons got sued and which ones didn’t.
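In signal-processing terms, that “content filtering” step is a low-pass filter: it strips the fast variation that carries word identity while keeping the slow contour of intonation and rhythm. The sketch below demonstrates the idea on a synthetic signal (the actual study worked on recorded speech); the moving average is just the simplest possible low-pass filter.

```python
# "Content filtering" as low-pass filtering: a moving average strips the
# fast component (word-level detail) while keeping the slow contour
# (intonation, rhythm). The signal here is synthetic, not real speech.
import math

def moving_average(signal, window):
    """Smooth `signal` with a centered moving average of width `window`."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

n = 200
slow = [math.sin(2 * math.pi * t / n) for t in range(n)]             # "prosody"
fast = [0.5 * math.sin(2 * math.pi * 25 * t / n) for t in range(n)]  # "words"
mixed = [s + f for s, f in zip(slow, fast)]

smoothed = moving_average(mixed, window=9)

# Away from the edges, the fast component is almost gone while the slow
# contour survives nearly intact.
residual = max(abs(smoothed[i] - slow[i]) for i in range(20, 180))
print(residual < 0.1)
```

The filtered signal is the numerical analogue of the psychologist's garble: useless for recovering words, yet faithful to the overall shape of the delivery.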

Now, who would imagine that by listening to garbled sounds one could predict something that would otherwise have involved a host of data points, from the surgeon’s qualifications to the patient’s condition to countless other medical and non-medical issues?

These are some places where I believe the human mind will still reign supreme.

Chapter 7.     Hmmm…Now What?

So what does this finally boil down to? What’s the moral of the story and what’s the takeaway from all this?

Here’s what it means. You need to:

1.    Open yourself up to possibilities. Don’t let technological or human limitations prevent you from asking big questions and exploring what the data may be capable of telling you.

2.    Ditch the hypothesis model. Beyond the very initial hypothesis, be prepared to let it go and let the data speak for itself.

3.    Use the power of human expertise and intuition to validate the results rather than to develop them.

4.    Create a small incubator within your organization, initially with one or two people from the inside and a team of experts from the outside. Use this incubator to explore and discover new approaches and solutions to old problems.

5.    Start with a big business problem that has defied solution and that people have accepted as a “fait accompli”. Break it into small, byte-sized, more manageable problems (remember, you eat an elephant one bite at a time) and let your BIG DATA Analytics team get to work. Then be prepared to be surprised.

Chapter 8.     So Does This Mean You Will Live Happily Ever After?

In conclusion, does this mean we all get to live happily ever after? Yes, and no.

Yes, because we now have access to tools we never had before, opening up possibilities that were hard to pursue, or even fathom, earlier. However, there are a few caveats.

1.    Be prepared to be wrong. Remember this is an exploratory mission, and you are as likely to get it wrong as to get it right. With practice and over time, however, you will get good at it. The journey is worth the destination.

2.    Be passionate about the process and dispassionate about the results. Once you have mastered the process the outcome is guaranteed; until then, learn not to be emotionally invested in the results.

3.    Begin with the end in mind. Don’t get so passionate that you first build a hammer and then start looking for a nail [10].

In the end, you have a choice. BIG DATA Analytics is the future. The companies that use this approach will be the market leaders [11] in their respective industries, and the rest will be followers.


You can either be the rider that leads the parade or be the one to follow behind and clean up after.

Your call!


Works Cited:

1.    Dragland, Ase. “Big Data, for Better or Worse: 90% of World’s Data Generated Over Last Two Years.” May 22, 2013.

2.    Doubling every year: Gantz, John; Reinsel, David. “Extracting Value From Chaos.” IDC iView Study, June 2011.

3.    Sanders, Cathleen V. “A Fable.” The Math Forum.

4.    Mayer-Schonberger, Viktor. BIG DATA: A Revolution That Will Transform How We Live, Work, and Think. Boston, New York: Houghton Mifflin Harcourt, 2013.

5.    Roulstone, Ian; Norbury, John. “How Math Helped Forecast Hurricane Sandy.” Scientific American, July 25, 2013.

6.    NYC firefighters: Howard, Alex. “Predictive Data Analytics Is Saving Lives and Taxpayer Dollars in New York City.” June 26, 2012.

7.    Driscoll, Michael. “The Data Science Debate: Domain Expertise or Machine Learning.”

8.    Finn, Holly. “New Gumshoes Go Deep With Data.” October 22, 2013.

9.    Gladwell, Malcolm. Blink: The Power of Thinking Without Thinking. New York: Back Bay Books, Little, Brown, 2005.

10. Granville, Vince. “12 Predictive Analytics Screw-ups.” July 25, 2013.

11. Manyika, James; Chui, Michael; et al. “Big Data: The Next Frontier for Innovation, Competition, and Productivity.” McKinsey Global Institute Report, May 2011.
