New Technologies (2 of 6): Big Data Analytics
Manoj Barve
India Head - BVMW (German Federal Association of SMEs) at BVMW - Bundesverband mittelständische Wirtschaft e.V.
Definitions:
Extremely large data sets that may be analyzed computationally to reveal patterns, trends and associations, especially relating to human behaviour and interactions. The term covers structured as well as unstructured data sets which traditional databases or software find difficult to process.
Big Data refers to technologies and initiatives that involve data that is too diverse, extremely fast-changing or massive for traditional technologies, skills and infrastructure to manage efficiently.
The first documented use of the term Big Data appeared in a 1997 paper by NASA scientists.
In a simplistic manner, we can summarize as follows:
Traditional data is Structured data – managed by Data Warehouse – value is created by standard Business Analytics tools and Reports.
Big Data is Structured + Unstructured Data – managed by distributed platforms such as Hadoop (with its distributed file system and MapReduce) or NoSQL databases such as MongoDB – value is created by usage of newer Big Data Applications.
What is new about Big Data?
Big Data analytics is providing us with enormous insights that were not available before. Big data is nothing fundamentally new. Information Technology (IT) has always dealt with data creation ==> data storage ==> data retrieval ==> data analysis. What’s new are the famous three V’s of Big Data – volume, velocity and variety. It has created a new branch in IT.
- Volume: Associated with the size of data.
- Velocity: Speed at which data is generated, captured and shared. The throughput of data.
- Variety: Big Data is more than rows and columns, and includes structured data (e.g. data from Excel tables, ERP, CRM, SCM, Point-of-Sale data) as well as unstructured data (e.g. text, hashtags, emails, photographs, Google Maps images, RFID-captured data, audio, video, mobile, social chatter, biometric data, machine-to-machine data and so on).
New V's of Big Data:
- Variability: Inconsistency of the data. (Challenging to manage)
- Veracity: The reliability or accuracy of the data. Unlike internally generated data, Big Data comes from sources outside our control, and its utility depends on its veracity.
- Value
How big is Big Data?
We have come a long way since the days of floppy disks when, reputedly, Bill Gates said: “640 KB of RAM ought to be enough for everybody”. The initial floppy disks used to be of 640 KB. They were followed by 1.44 MB diskettes. Today they might not be able to hold a single high-quality picture.
One exabyte is 10^18 bytes of data (one followed by 18 zeroes).
If a standard laptop hard disk (or a Seagate external hard disk) holds one terabyte today, one exabyte is equivalent to one million such laptops or external hard disks.
If a standard three-hour Bollywood movie is about 1.5 gigabytes in size, over 666 million movies need to be produced to make up one exabyte. Assuming we make about 150 Hindi and 750 regional language movies a year, one exabyte is equivalent to about 741,000 years of film production!
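The arithmetic behind these comparisons is easy to check. The short Python sketch below simply reproduces it, assuming decimal units (1 exabyte = 10^18 bytes) and the figures quoted above; the variable names are only illustrative.

```python
# Decimal (SI) units, as used in the text: 1 exabyte = 10**18 bytes.
GIGABYTE = 10**9
TERABYTE = 10**12
EXABYTE = 10**18

# How many one-terabyte disks make up one exabyte?
disks_per_exabyte = EXABYTE // TERABYTE

# How many 1.5 GB movies make up one exabyte?
movie_size_bytes = 1.5 * GIGABYTE
movies_per_exabyte = EXABYTE / movie_size_bytes

# About 150 Hindi plus 750 regional language movies a year (figures from the text).
movies_per_year = 150 + 750
years_of_production = movies_per_exabyte / movies_per_year

print(f"{disks_per_exabyte:,} one-terabyte disks")
print(f"{movies_per_exabyte / 1e6:.1f} million movies")
print(f"{years_of_production:,.0f} years of film production")
```

The result comes to roughly 740,700 years, which the text rounds to 741,000.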
In all of recorded time until 2003 we generated 5 exabytes of digital data. In 2011, we created the equivalent amount of data in two days; in 2014, in just 10 minutes!
What’s so special about Big Data?
Interpretation of Big Data can bring about insights which might not be immediately visible or which would be impossible to find using traditional methods. This process focuses on finding hidden threads, trends, or patterns which may be invisible to the naked eye. Sounds easy, right? Well, it requires new technologies and skills to analyze the flow of material and draw conclusions.
In the past decade companies like Google, Amazon, eBay, Yahoo and Facebook pioneered businesses built on monetizing the massive data volumes they gathered. Big Data is making a quantum leap possible. It is also forcing this leap upon us by challenging us to make use of the data currently available, to make new technologies available for handling the data, to create new hardware in order to improve efficiency, and to serve customers better than our competition does.
Some Big Data applications:
Consumer product companies / Retailers: Pricing, stocking, distribution centres
Retailers can track user web clicks to identify behavioural trends that improve campaigns, pricing and stocking.
Equipment manufacturers: Preventive/predictive maintenance, Offering AMCs (Annual Maintenance Contracts), Deciding on number and location of Service Centres
Utility companies: Patterns of electricity consumption, surges, peak demand
Utilities can capture household energy usage levels to predict outages and to incentivize more efficient energy consumption.
Customer satisfaction: 360 degree view of millions of customers (which also becomes a concern!)
Governments can detect and track the emergence of an epidemic
Swedish municipalities are identifying the areas where vandalism of public property takes place frequently, and the US police are identifying the sensitive areas with frequent crime, in order to concentrate resources in those areas.
The BJP used Big Data Analytics successfully in the 2014 elections which, hopefully, is changing the destiny of a nation.
Is there hype about Big Data?
To some extent – yes!
We often hear about insights from Big Data. The value of Big Data is in those insights, and not in the data itself. Insight is defined as “the capacity to gain an accurate and deep understanding of someone or something.” Big Data helps, but it does not necessarily, or always, lead to an accurate and deep understanding. Judgment and intuition continue to play a big role.
Here are some limitations of Big Data:
Using the Big Data
All these numbers of petabytes, exabytes and zettabytes, with a lot of zeroes in them, are not very relevant for us as an organisation. None of us is interested in ALL the data that gets generated on the internet. What is important from our point of view is: what information is relevant to us, and are we able to capture it? Even more important: what are we doing with the information that we already have? Are we already using the insights that the data provides? Are those insights leading to additional business, or better servicing of customers?
Thus, from a business perspective, the focus should be on Analytics rather than on Big Data. I would trust IT professionals with the handling of the technological challenges of Big Data. Let us not get lured by the data-size issues and the tools.
Finally, however comprehensive the Big Data is, it has to be matched with proper understanding and sound judgement. We saw the debacle of the Bihar elections, where all the expert pollsters erred, even beyond the statistical margin of error. NDTV claimed to have done the largest exit poll, covering 76,000 people from every constituency and every stratum of society. It predicted that the BJP would win precisely 125 seats. Armed with Big Data analysis possessing the aura of truth, it was overly confident that its polls were as good as reality. The rest is history!
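To put the "statistical margin of error" in perspective, here is a minimal sketch using the textbook formula for a proportion. It assumes a simple random sample and a 95% confidence level, neither of which is stated about the actual poll, and it applies to a vote share rather than to a seat count. The function name is only illustrative.

```python
import math

def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Half-width of a confidence interval for a proportion.

    Assumes simple random sampling; z=1.96 corresponds to 95% confidence,
    and proportion=0.5 gives the widest (most cautious) margin.
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

moe = margin_of_error(76_000)
print(f"Margin of error: {moe * 100:.2f} percentage points")
```

Under these assumptions the sampling error of a 76,000-person poll is well under one percentage point. A miss far larger than that points to problems that sample size cannot fix, such as who was reached and how they answered, which is exactly where judgement comes in.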
Human bias
Big Data Analytics works on algorithms and logic set by human beings. It is based on the assumption that “history repeats itself”, only we can analyse it in more detail today than in the past. This is a natural limitation and not unique to Big Data Analytics. Having said that, Big Data does provide pointers which, if analysed with an open mind and a natural curiosity, can lead to path-breaking insights.
The enormous challenge of variety
We are still far away from seamlessly integrating heterogeneous data and making sense of it: the emails, chats, pictures, tweets, audio bites, videos, machine-to-machine data and what not. It is going to take a while until we have an underlying “language” into which all this assorted data gets translated.
Collecting data indiscriminately
Intuition and judgement are equally important in collating the data. Interpretation of data is often an art. Just having data is one thing; being able to make sense of it and propose actionable intelligence is another. We are still catching up with those skills. Seeing the context, understanding the business environment, and taking cognizance of the linkages and dependencies between the different factors involved are of great importance.
What technologies are used to handle Big Data?
So far as hardware is concerned, hundreds or thousands of servers may be used to run massively parallel software on a distributed data management platform.
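The idea behind such platforms can be illustrated with a toy version of the map/reduce pattern. The sketch below runs on a single machine in plain Python and is not Hadoop code: the data is split into chunks (standing in for servers), each chunk is counted independently, and the partial results are merged. The function names are hypothetical.

```python
from collections import Counter
from functools import reduce

def map_chunk(lines):
    """Map step: count the words in one chunk (one server's share of the data)."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts

def reduce_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right

def word_count(lines, num_chunks=3):
    # Split the data into chunks; on a real cluster each chunk lives on a different server.
    chunks = [lines[i::num_chunks] for i in range(num_chunks)]
    # Here the chunks are processed one after another; a cluster would do this in parallel.
    partials = [map_chunk(chunk) for chunk in chunks]
    return reduce(reduce_counts, partials, Counter())

sample = ["big data is big", "data about data", "value from data"]
totals = word_count(sample)
print(totals.most_common(2))
```

Because each chunk is processed without reference to the others, adding more servers lets the same logic handle more data.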
Software License Models:
- Proprietary: Oracle, IBM, Teradata
- Open-source: MongoDB, Apache Hadoop
- Cloud Service: Google App Engine, Amazon Elastic MapReduce
We will hear these terms often, but they concern the handling of Big Data, and not the value to be derived from it.
Concerns:
Ever wondered why search engines or free e-mail programmes offer you holidays where you really thought of going, or books you really wish to buy, or the types of movies you like? Do you get almost as many “happy birthday” messages from commercial organisations as from real persons? Each of our digital interactions leaves a footprint, be it a credit card transaction, a cash withdrawal at an ATM, checking in at the airport or in a hotel, a harmless communication on Facebook, Twitter, Skype or WhatsApp, or simply ordering a pizza at Pizza Hut. New technologies make it possible to capture such data, store it, link it and use it in some manner, now or at any time in the future.
Thus, the biggest concern is about data confidentiality, data security and misuse of data.
Current Trends:
- So far as industrial companies are concerned, they have moved from general interest, through exploring theoretical possibilities, to focusing on solving real business problems
- More often than not, in multinationals, the applications are developed centrally and then rolled out in different locations. This helps standardise the tools and processes, but may end up ignoring local trends, especially in consumer products
- Challenges with quality, quantity, security and confidentiality still persist
- The focus is moving from volume, variety and velocity towards the value the business can extract from Big Data
Is Big Data going to replace everything?
The death of spreadsheets is exaggerated. We will still be using Excel and PowerPoint. But their usage will decline as data analytics software becomes easier to handle, more intelligent and intuitive, better at supporting data quality and data integrity, less dependent on in-house experts, and agile enough to support changing business needs.
As the famous quote attributed to Albert Einstein says - "Not everything that can be counted counts, and not everything that counts can be counted."
We must use analytics to improve our processes and results and to serve customers better, but it should not become an obsession within the organisation. Big Data is not going to replace judgement, intuition, courage and human intelligence any time soon. If anything, Big Data will support these very human traits.
(Sources: Wikipedia, SAS, Google Search, Concerto, McKinsey and others)