When Big Data Breakthroughs Occur
"We’ve retired the big data hype cycle" - Betsy Burton, Gartner
This year is the first where Gartner has not included big data in any of their hype cycles. "I would not consider big data to be an emerging technology," says Burton. While this news will not affect the NASDAQ or how many artisan bagel shops there are in the SF Bay Area, it is an interesting indicator. Such a statement acknowledges the permanent nature of the big data challenge (volume, variety, velocity) and that there is an adequate number of tools and technologies with which to do real work. Yay.
At the same time, Gartner analyst Merv Adrian argues that only 15% of big data projects are in production. Clearly, there is still a long way to go before we reach the plateau of productivity, to use the Gartner parlance. So, if we use the number of use cases in production as our metric, we have a target for what it means to have a breakthrough in big data.
Of course, "in production" is not the most precise measure, but it is one of those I-know-it-when-I-see-it questions that can usually be answered with a yes, a no, or a kinda. This is a discussion that we have all the time with MapR customers, all of which became customers because their goal is to create production use cases. But what are the criteria for being "production-ready"? What are the milestones that must be passed to make those breakthroughs?
Assessing Big Data Maturity
These are some of the questions I answer in my new book: Architect's Guide to Implementing a Digital Transformation. I wrote it based on extensive research I have done on hundreds of MapR customers, a majority of which are running big data use cases in production. I created a useful table at the end called the Big Data Maturity Model for IT. It tracks a number of metrics as organizations progress through four phases of their digital transformations.
This is a preview of the maturity model. You can view the entire table here.
While the book doesn't specifically focus on breakthroughs, there is a clearly identifiable progression from early experimentation to a complete digital transformation. Roughly speaking, the breakthrough usually occurs after the first full production use cases are running. This may seem obvious, but being "in production" is not a single big bang. It is more an indication that all the key planets have aligned:
Dedicated Horsepower: Hardware is correctly sized and processes have been created to efficiently provision new nodes, or new clusters
Embrace Open Source: Architects, dev ops, and IT ops have a firm handle on the practicalities, opportunities, and potential pitfalls of open-source-powered application and deployment architectures
C-Suite Buy-in: Executive sponsorship and all key technical and business stakeholders are committed
Identify Data Assets: An inventory of all relevant data sources is well under way. Extra credit if you have begun to monetize your data.
Advanced Skill Sets: Some level of data engineering, data curation, statistics, and data science expertise has been introduced (identify "known unknowns")
and, most importantly
An Energized Team: Key leaders in development, IT ops, and relevant lines of business have recognized that the business cannot succeed, progress, or even continue to function without data-driven applications, analytics, and business operations.
Wherefore Production?
Ok, so I started off using “production-ready” as a criterion for a breakthrough in big data. I know that some of you are thinking of that 15%-in-production number. But why does that figure look so low?
Gartner analyst Nick Heudecker provided some explanation. He postulated that big data projects may be given a lower priority, may have a more uncertain tangible ROI, and may be folded into larger initiatives and therefore not called out separately.
This is a reasonable explanation for the survey numbers, but I think there is more to tell. While I can’t provide scientific numbers, the share of MapR customers running production use cases is well above half. Add to that the fact that the MapR customer retention rate is over 99%, and a more positive picture emerges. I attribute this to the fact that a majority of our R&D investment has been spent creating utility-grade reliability for the MapR Converged Data Platform.
The MapR Platform provides myriad high-availability features such as automatic failover of cluster nodes, point-in-time recovery using snapshots, rolling upgrades, disaster recovery through replication or mirroring, and HA protections for NFS, metadata, MapReduce jobs, and much else.
The MapR Converged Data Platform was designed from the outset as a zero-data-loss production runtime that can support massive scale of data and compute.
One of the most important lessons that I have learned in writing this book is that breakthroughs occur once there is a collective understanding and acceptance of big data as the new normal, and that understanding pervades software development, IT operations, and business operations teams. We have known dozens of companies that are self-actuated and have made amazing progress by evolving their dev and ops teams into data-savvy organizations. But these are exceptional cases.
Consider the message behind the Gartner numbers: for many customers, things just aren’t moving fast enough. Which leads me to one small piece of advice:
Pro Tip: Augment your big data team early.
So, a meme has been circulating over the past few weeks about the value of “learning from others’ mistakes.” For your own digital transformation, I suggest that you focus on learning from others’ successes. That means either hiring or renting talent, be it data engineers and data scientists from MapR Professional Services, MapR partners, or other consultancies that specialize in big data and the new analytics. The sooner your teams can flick on the data-driven light switch, the sooner you can achieve the big data breakthroughs that you want and need.
This post originally appeared on MapR.com.