Big data - big mess or big money?
Gaurav Garg
COO @ LetsLocalise | Creator of Multi-sided Ed-Tech Digital Platform | Transformational Leader | Programme Director | Ed-Tech Executive | Super Coach | I AM FUTURE READY | 074272 11111
Big data, algorithms, cloud, machine learning, Hadoop, NoSQL, data scientists, artificial intelligence, advanced analytics... the list goes on and on. Everywhere I go, everyone seems to be talking about these topics, but without much in-depth understanding of them. Questions abound, but very few answers are available. In this blog I will try to demystify some aspects of big data. How do all of these topics fit together? What do organizations need to do to catch the next wave of big opportunity? What are the challenges, and how can organizations overcome them? How must companies react to the deluge of data we all find ourselves in?
In this digital world, everything we do leaves a digital footprint, or creates data. We are creating data as never before: did you know that more than 90% of the world’s existing data was generated in the last 2 years? With all this data being generated, companies are under pressure to do something with it, and big data now sits at the top of the Gartner Hype Cycle, at the Peak of Inflated Expectations.
Figure 1. The Stages of Big Data Adoption, 2013 and 2014
In 2014, n = 302; In 2013, n = 720.
Source: Gartner (September 2014)
Figure 1 above suggests that more than 70% of organizations are dabbling in big data but only about 12% have actually deployed solutions. Why is the rate of success so low? Many organizations look at all the data being collected, get excited about the possibilities and try to solve something, anything, however aspirational, and hence struggle to achieve the desired outcome. To improve their success rate, I believe that organizations must change their approach to big data. They must start by identifying the business outcomes they are seeking and work backwards to the sources of data, the relevant vendors in the space, the infrastructure required and the skills needed to achieve those outcomes.
Once we define the business challenge as the starting point, we can lay out the internal and external data sources needed and architect this data; identify the skills required for collecting and analysing the structured and unstructured data; and then agree on the infrastructure needed to support the process in a sustainable and scalable manner. This approach ensures fast, tangible results as we grow skills, confidence, knowledge and scale in big data within the organization. Each of these steps, however, has challenges that need to be addressed adequately and strategically. So, let’s look at each of them individually.
Data is an asset: nothing new in that statement. However, not ALL data is an asset. An overwhelming amount of the data being collected today can be categorized as noise, which must be removed.
As we move from ad-hoc, offline decision making (Descriptive Analytics) to pervasive, real-time decision making (Prescriptive Analytics), I believe that companies will need to identify and remove noise, then prepare and enrich their structured internal data with new data sources. Data quality and completeness issues aside, existing internal data provides only skin-deep insights into customers. It must be augmented with customers’ behavioural interactions to create their personal DNA: a segment of one. On top of this foundation, you can add your research data, segmentations, derived attributes, propensity scores, prospect data, contact history, responses and so on.
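For the more hands-on reader, here is a minimal sketch of what "remove the noise, then augment internal data with behaviour" could look like in practice. Everything in it is an assumption made for illustration: the record fields (`customer_id`, `is_bot`, `type`), the noise rule and the naive `conversion_rate` attribute are hypothetical, not taken from any real system.

```python
from collections import defaultdict

# Hypothetical structured internal records (the "skin-deep" view of customers).
internal_customers = [
    {"customer_id": 1, "name": "Customer A", "segment": "retail"},
    {"customer_id": 2, "name": "Customer B", "segment": "retail"},
]

# Hypothetical behavioural events; some of them are noise.
events = [
    {"customer_id": 1, "type": "page_view", "is_bot": False},
    {"customer_id": 1, "type": "purchase", "is_bot": False},
    {"customer_id": 2, "type": "page_view", "is_bot": True},      # bot traffic
    {"customer_id": None, "type": "page_view", "is_bot": False},  # no customer id
    {"customer_id": 2, "type": "page_view", "is_bot": False},
]


def remove_noise(raw_events):
    """Keep only events that can be tied to a customer and are not bot traffic.

    This rule is an assumed example; real noise criteria depend on the business outcome.
    """
    return [e for e in raw_events if e["customer_id"] is not None and not e["is_bot"]]


def build_profiles(customers, raw_events):
    """Combine internal records with cleaned behavioural counts, one profile per customer."""
    counts = defaultdict(lambda: defaultdict(int))
    for event in remove_noise(raw_events):
        counts[event["customer_id"]][event["type"]] += 1

    profiles = {}
    for customer in customers:
        behaviour = dict(counts[customer["customer_id"]])
        views = behaviour.get("page_view", 0)
        purchases = behaviour.get("purchase", 0)
        # Derived attribute (illustrative only): purchases per page view.
        conversion_rate = purchases / views if views else 0.0
        profiles[customer["customer_id"]] = {
            **customer,
            "behaviour": behaviour,
            "conversion_rate": conversion_rate,
        }
    return profiles


profiles = build_profiles(internal_customers, events)
for customer_id, profile in profiles.items():
    print(customer_id, profile)
```

Each profile starts from the internal record and layers behaviour and a derived attribute on top, which is the "segment of one" idea in miniature. Research data, propensity scores or contact history would be added as further keys in the same way.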
The majority of organizations lack the skills and capabilities needed across the big data value chain. Clearly, the twin surge in processing power and in the rate of data generation has caught us all unawares, which has resulted in a skills shortage. The roles needed include Data Architects; Programmers to prepare data and write algorithms; Data Analysts to identify relevant data sources; and Data Scientists to draw insights from the data. For sustainable innovation, where multiple big data solutions are delivered frequently, organizations must build internal skills through focused training programs, while hiring external big data consultants to fill the skills gap and hasten the learning process.
Any time a technology is at the top of the Hype Cycle, it attracts a lot of vendors to cover its value chain. As such, the big data space has many, many vendors offering a plethora of niche solutions. Naturally, these vendors are chasing organizations for business and, in the process, confusing matters. The mismatch between organizations’ needs and understanding of big data and the vendors’ niche offerings muddies the waters. As stated above, my advice to organizations is to identify the business outcome they seek and engage with those vendors who can help them achieve it. A short delivery cycle on a business outcome, irrespective of success or failure, will build excitement within the organization.
In summary, big data is a mess at the moment, and it will soon slide down from the top of the Hype Cycle into the Trough of Disillusionment. This will result in vendor consolidation, and the establishment of standards and best practices will put it firmly on the plateau of value realization. I bet this will happen in the next 2 to 3 years, if not earlier, and organizations that are testing the big data waters today will benefit most from their experience at that point.
Lastly, in this era of disruptive business models, I see vendors aligning themselves along business outcomes in the same manner as Experian, a company that collects and provides credit ratings on consumers. New data intermediaries will collect all the relevant external data for a specific business outcome. User organizations will link into this external data and combine it with internal data on their customers to develop insights. For example, in the insurance sector, access to real-time data such as telematics has a significant impact on the assessment of risk, which is the foundation of insurance. I think that in the near future we will see businesses emerge that aggregate all this data and enable insurers to price risk on a real-time basis.