BIG DATA

What is BIG DATA? Introduction, Types, Characteristics & Example


In order to understand 'Big Data', you first need to know

What is Data?

Data refers to the quantities, characters, or symbols on which operations are performed by a computer, and which may be stored and transmitted in the form of electrical signals and recorded on magnetic, optical, or mechanical recording media.

What is Big Data?

Big Data is also data, but of an enormous size. The term describes a collection of data that is huge in volume and yet keeps growing exponentially with time. In short, such data is so large and complex that none of the traditional data management tools can store or process it efficiently.


Examples Of Big Data

Following are some examples of Big Data:


The New York Stock Exchange generates about one terabyte of new trade data per day.



Social Media

Statistics show that 500+ terabytes of new data are ingested into the databases of the social media site Facebook every single day. This data is mainly generated through photo and video uploads, message exchanges, comments, etc.




A single jet engine can generate 10+ terabytes of data in 30 minutes of flight time. With many thousands of flights per day, data generation reaches many petabytes.




Types Of Big Data

Big Data can be found in three forms:

  1. Structured
  2. Unstructured
  3. Semi-structured

Structured

Any data that can be stored, accessed and processed in the form of a fixed format is termed 'structured' data. Over time, computer science has had great success in developing techniques for working with such data (where the format is well known in advance) and deriving value out of it. However, we now foresee issues when the size of such data grows to a huge extent, with typical sizes in the range of multiple zettabytes.

Do you know? 10^21 bytes equal one zettabyte; in other words, one billion terabytes form a zettabyte.
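
As a quick sanity check of that conversion, here is a minimal Python sketch of the arithmetic:

# Rough unit arithmetic: how many terabytes make up one zettabyte?
BYTES_PER_TB = 10**12   # 1 terabyte (decimal definition)
BYTES_PER_ZB = 10**21   # 1 zettabyte

print(BYTES_PER_ZB // BYTES_PER_TB)   # 1000000000 -> one billion terabytes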

Looking at these figures one can easily understand why the name Big Data is given and imagine the challenges involved in its storage and processing.

Do you know? Data stored in a relational database management system is one example of 'structured' data.

Examples Of Structured Data

An 'Employee' table in a database is an example of structured data:

Employee_ID   Employee_Name     Gender   Department   Salary_In_lacs
2365          Rajesh Kulkarni   Male     Finance      650000
3398          Pratibha Joshi    Female   Admin        650000
7465          Shushil Roy       Male     Admin        500000
7500          Shubhojit Das     Male     Finance      500000
7699          Priya Sane        Female   Finance      550000
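
As an illustrative (not production) sketch, the same fixed-format records can be loaded and queried with Python's built-in sqlite3 module; the table and column names simply mirror the example above:

import sqlite3

# In-memory relational database holding the 'Employee' table shown above.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Employee (
    Employee_ID INTEGER PRIMARY KEY,
    Employee_Name TEXT,
    Gender TEXT,
    Department TEXT,
    Salary_In_lacs INTEGER)""")

rows = [
    (2365, "Rajesh Kulkarni", "Male", "Finance", 650000),
    (3398, "Pratibha Joshi", "Female", "Admin", 650000),
    (7465, "Shushil Roy", "Male", "Admin", 500000),
    (7500, "Shubhojit Das", "Male", "Finance", 500000),
    (7699, "Priya Sane", "Female", "Finance", 550000),
]
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?, ?)", rows)

# A fixed schema makes querying straightforward.
for row in conn.execute("SELECT Employee_Name, Salary_In_lacs FROM Employee WHERE Department = 'Finance'"):
    print(row)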

Unstructured

Any data with an unknown form or structure is classified as unstructured data. In addition to its huge size, unstructured data poses multiple challenges when it comes to processing it to derive value. A typical example of unstructured data is a heterogeneous data source containing a combination of simple text files, images, videos, etc. Nowadays organizations have a wealth of data available to them, but unfortunately they don't know how to derive value out of it, since this data is in its raw, unstructured form.

Examples Of Unstructured Data

The output returned by 'Google Search'


Semi-structured

Semi-structured data can contain both forms of data. Semi-structured data looks structured in form, but it is not actually defined by, for example, a table definition in a relational DBMS. An example of semi-structured data is data represented in an XML file.

Examples Of Semi-structured Data

Personal data stored in an XML file:

<rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>
<rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>
<rec><name>Satish Mane</name><sex>Male</sex><age>29</age></rec>
<rec><name>Subrato Roy</name><sex>Male</sex><age>26</age></rec>
<rec><name>Jeremiah J.</name><sex>Male</sex><age>35</age></rec>
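
Such records have no rigid table schema, yet they can still be processed programmatically. Below is a minimal Python sketch using the standard xml.etree.ElementTree module; it assumes a couple of the records above are wrapped in a single root element so the document is well-formed:

import xml.etree.ElementTree as ET

# Two of the <rec> records above, wrapped in a root element to make well-formed XML.
xml_data = """<people>
  <rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>
  <rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>
</people>"""

root = ET.fromstring(xml_data)
for rec in root.findall("rec"):
    # Tags act like loosely defined fields -- no fixed table schema is required.
    print(rec.findtext("name"), rec.findtext("sex"), rec.findtext("age"))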

Data Growth over the years




Please note that web application data, which is unstructured, consists of log files, transaction history files, etc. OLTP systems, by contrast, are built to work with structured data, where data is stored in relations (tables).

Characteristics Of Big Data

(i) Volume – The name Big Data itself relates to a size which is enormous. The size of data plays a very crucial role in determining its value. Whether a particular data set can actually be considered Big Data or not depends on its volume. Hence, 'Volume' is one characteristic which needs to be considered while dealing with Big Data.

(ii) Variety – The next aspect of Big Data is its variety.

Variety refers to heterogeneous sources and the nature of data, both structured and unstructured. In earlier days, spreadsheets and databases were the only sources of data considered by most applications. Nowadays, data in the form of emails, photos, videos, monitoring devices, PDFs, audio, etc. is also considered in analysis applications. This variety of unstructured data poses certain issues for storing, mining and analyzing data.

(iii) Velocity – The term 'velocity' refers to the speed at which data is generated. How fast the data is generated and processed to meet demands determines the real potential in the data.

Big Data velocity deals with the speed at which data flows in from sources like business processes, application logs, networks, social media sites, sensors, mobile devices, etc. The flow of data is massive and continuous.

(iv) Variability – This refers to the inconsistency which can be shown by the data at times, thus hampering the process of being able to handle and manage the data effectively.

Benefits of Big Data Processing

The ability to process Big Data brings multiple benefits, such as:

  • Businesses can utilize outside intelligence while making decisions

Access to social data from search engines and sites like Facebook and Twitter is enabling organizations to fine-tune their business strategies.

  • Improved customer service

Traditional customer feedback systems are getting replaced by new systems designed with Big Data technologies. In these new systems, Big Data and natural language processing technologies are being used to read and evaluate consumer responses.

  • Early identification of risk to the product/services, if any
  • Better operational efficiency

Big Data technologies can be used for creating a staging area or landing zone for new data before identifying what data should be moved to the data warehouse. In addition, such integration of Big Data technologies and the data warehouse helps an organization offload infrequently accessed data.

Q) How much data is stored by various big tech companies in a day?

  1. GOOGLE:-


A data center normally holds petabytes to exabytes of data. Google currently processes over 20 petabytes of data per day through an average of 100,000 MapReduce jobs spread across its massive computing clusters.
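
MapReduce itself is a simple programming model: a map step emits key-value pairs and a reduce step aggregates the values per key. The toy Python sketch below illustrates the idea with a word count over two tiny documents; it is only a single-machine illustration, not Google's actual implementation:

from collections import defaultdict

def map_phase(document):
    # Map: emit (word, 1) for every word in the input split.
    for word in document.split():
        yield (word, 1)

def reduce_phase(pairs):
    # Reduce: sum the counts emitted for each key.
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return counts

documents = ["big data big insights", "data beats opinion"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
print(dict(reduce_phase(pairs)))   # {'big': 2, 'data': 2, ...}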

How much data does Google handle?

This is one of those questions whose answer can never be fully accurate. On a lighter note, it is a bit like asking which came first, the hen or the egg.

A typical PC holds about 1 TB of storage and a smartphone about 64 GB, and newer PCs and smartphones keep shipping with even more. We all turn to Google to answer almost any kind of question, so it is natural to wonder just how much data Google handles to answer them all.

Google now processes over 40,000 search queries every second on average, which translates to over 3.5 billion searches per day and 1.2 trillion searches per year worldwide.

2. YOUTUBE:-


We exist in a content-hungry society. Every second person is creating and posting videos online, whether it's for fun or for profit, and we are all devouring that content. Over 1 billion hours of YouTube video are watched globally per day.

Whether you’re watching because your favourite vlogger dropped a new video, or you’re stuck on the side of the road learning how to change a tire — those videos are costing you data. Read on to find out just how much data YouTube uses.

How much data YouTube will use depends on the quality of your video playback. Watching a YouTube video at the standard 480p uses around 260MB per hour, while Full HD viewing can chew through 1.65GB. 4K video playback on YouTube will use as much as 2.7GB of data every hour.

That means that we as a global community use around 440,000 Terabytes of data on YouTube every day. And that doesn’t even include uploading videos.
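
That daily figure can be roughly sanity-checked with simple arithmetic. The sketch below assumes a blended average of about 0.44 GB per viewing hour across all playback qualities (an assumption for illustration, not a published number):

# Rough sanity check of the daily YouTube data estimate.
hours_watched_per_day = 1_000_000_000      # ~1 billion hours per day
avg_gb_per_hour = 0.44                     # assumption: blended average across 480p/HD/4K

gb_per_day = hours_watched_per_day * avg_gb_per_hour
tb_per_day = gb_per_day / 1_000
print(f"{tb_per_day:,.0f} TB per day")     # ~440,000 TB per day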

3. FACEBOOK:-


Facebook has revealed some big stats on big data to reporters at its HQ, including that its system processes 2.5 billion pieces of content and 500+ terabytes of data each day. It pulls in 2.7 billion Like actions and 300 million photos per day, and it scans roughly 105 terabytes of data each half hour. It also gave the first details on its new “Project Prism”.

VP of Engineering Jay Parikh explained why this is so important to Facebook: “Big data really is about having insights and making an impact on your business. If you aren’t taking advantage of the data you’re collecting, then you just have a pile of data, you don’t have big data.” By processing data within minutes, Facebook can roll out new products, understand user reactions, and modify designs in near real-time.

4. FLIPKART:-


Flipkart gets 10 terabytes of user data each day from browsing, searching, buying or not buying, as well as behavior and location. This jumps to 50 terabytes on Big Billion Day sales days. There’s also order data, shipping data, and other forms of data captured by different systems.

Sub-problems under Big Data include:

  1. Volume
  2. Velocity
  3. and so on

Now, after reading the above, the question is: how do these companies handle such big data with high velocity and high efficiency?

They use the concept of a distributed storage cluster, which can be implemented with Hadoop technology. With this technology, they turn big data into an advantage.
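
The core idea behind such a distributed storage cluster (HDFS being the best-known example) is to split a large file into fixed-size blocks and replicate each block across several nodes. The following Python sketch is purely conceptual; the node names are made up and this is not Hadoop's real placement policy:

import itertools

BLOCK_SIZE = 128 * 1024 * 1024   # 128 MB, the default HDFS block size
REPLICATION = 3                  # each block is stored on 3 different nodes
NODES = ["node1", "node2", "node3", "node4", "node5"]

def place_blocks(file_size_bytes):
    """Return a mapping of block index -> list of nodes holding a replica."""
    num_blocks = -(-file_size_bytes // BLOCK_SIZE)   # ceiling division
    node_cycle = itertools.cycle(NODES)
    placement = {}
    for block in range(num_blocks):
        placement[block] = [next(node_cycle) for _ in range(REPLICATION)]
    return placement

# A 1 GB file is split into 8 blocks, each replicated on 3 nodes.
print(place_blocks(1024 * 1024 * 1024))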

The ubiquity of data

While some people have thrown in the towel early, deciding that big data's potential can only really be exploited by massive corporations with access to billions in funding, perhaps the greatest aspect of big data is its ubiquity throughout the market and its availability to everyone, from Walmart to the local mom-and-pop store.

Big data's massive impact on the economy, so big that some experts predict it will have a $15 trillion economic impact in just 15 years, is largely driven by the fact that it is universally available to large corporations and consumers alike. Nonetheless, tech giants like Google and Amazon are often the birthplaces of the latest big data innovations, so how exactly are these companies taking the numbers and transforming them into usable data?

Companies like Google, which log data for billions of searches each day, can analyze that information over the long term to detect useful trends and learn about their users. Google's algorithms make great use of big data, for instance, when trying to determine what you're searching for after you've entered only a few characters into the search bar.
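
Conceptually, prefix-based query suggestion can be as simple as counting how often past queries were issued and returning the most frequent ones matching the typed prefix. Here is a toy Python sketch with an invented query log (real systems are far more sophisticated):

from collections import Counter

# Imaginary query log; in reality this would be billions of logged searches.
query_log = ["big data", "big data examples", "big basket", "big data hadoop", "big data"]
query_counts = Counter(query_log)

def suggest(prefix, k=3):
    # Return the k most frequent past queries that start with the typed prefix.
    matches = {q: c for q, c in query_counts.items() if q.startswith(prefix)}
    return sorted(matches, key=matches.get, reverse=True)[:k]

print(suggest("big d"))   # ['big data', 'big data examples', 'big data hadoop']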

Other companies, like Amazon, are more ambitious with how they use big data to get to know their customers. Amazon's marketplace is teeming with suggested products, largely because the firm has harnessed big data to determine which products people in a certain demographic are likely to purchase, and markets those products specifically to them.

Amazon isn't the only company getting to know its users, however. Netflix relies on the data it collects from its customers to determine which genres of programs are likely to be viewed more than others, and uses that information when deciding which pilots to fund and which to pull. The company's masterful exploitation of data also allows it to determine which shows a user may like ahead of time, so that, like Amazon, it can recommend similar options or programs your friends have recently viewed.
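
A common building block behind such recommendations is item co-occurrence: if many users who bought or watched X also consumed Y, then Y becomes a candidate suggestion for anyone interested in X. A minimal Python sketch with made-up purchase histories:

from collections import Counter
from itertools import combinations

# Hypothetical purchase (or viewing) histories, one set of items per user.
histories = [
    {"laptop", "mouse", "keyboard"},
    {"laptop", "mouse"},
    {"laptop", "monitor"},
]

# Count how often each pair of items appears together.
co_occurrence = Counter()
for items in histories:
    for a, b in combinations(sorted(items), 2):
        co_occurrence[(a, b)] += 1
        co_occurrence[(b, a)] += 1

def recommend(item, k=2):
    # Rank other items by how often they co-occur with the given item.
    scores = {b: c for (a, b), c in co_occurrence.items() if a == item}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend("laptop"))   # ['mouse', ...]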

As these tech giants have come to realize the goldmine they possess in their customers' data, they have wised up and made ample investments so that they can make use of it. In one of its largest European acquisitions, Google purchased DeepMind, an artificial intelligence startup focused on producing AI that can sort through massive amounts of information to find the valuable tidbits.

Big data’s potential doesn’t only belong to the tech giants, however.

Top Big Data Companies To Watch Out For

1. Amazon

The online retail giant has access to a massive amount of data on its customers: names, addresses, payments and search histories are all filed away in its data bank.

While this information is obviously put to use in advertising algorithms, Amazon also uses the information to improve customer relations, an area that many big data users overlook.

The next time you contact the Amazon help desk with a query, don’t be surprised when the employee on the other end already has most of the pertinent information about you on hand. This allows for a faster, more efficient customer service experience that doesn’t include having to spell out your name three times.

2. American Express

The American Express Company is using big data to analyse and predict consumer behaviour.

By looking at historical transactions and incorporating more than 100 variables, the company employs sophisticated predictive models in place of traditional business intelligence-based hindsight reporting.

This allows a more accurate forecast of potential churn and customer loyalty. In fact, American Express has claimed that, in their Australian market, they are able to predict 24% of accounts that will close within four months.
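
Churn prediction of this kind is typically framed as a classification problem over historical account features. The sketch below trains a logistic regression on fabricated data purely to show the shape of such a model; it assumes scikit-learn and NumPy are installed and is not American Express's actual approach:

from sklearn.linear_model import LogisticRegression
import numpy as np

# Made-up features per account: [monthly_spend, months_active, late_payments]
X = np.array([
    [1200, 36, 0],
    [  80,  3, 2],
    [ 950, 24, 1],
    [  60,  2, 3],
])
y = np.array([0, 1, 0, 1])   # 1 = account closed within four months

model = LogisticRegression().fit(X, y)

# Probability that a new account will churn.
new_account = np.array([[100, 4, 2]])
print(model.predict_proba(new_account)[0][1])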

3. BDO

National accounting and audit firm BDO puts big data analytics to use in identifying risk and fraud during audits.

Where, in the past, finding the source of a discrepancy would involve numerous interviews and hours of manpower, consulting internal data first allows for a significantly narrowed field and streamlined process.

In one case, BDO Consulting Director Kirstie Tiernan noted, they were able to cut a list of thousands of vendors down to a dozen and, from there, review data individually for inconsistencies. A specific source was identified relatively quickly.
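
One simple way to narrow a long vendor list is to flag statistical outliers, for example vendors whose average invoice amount deviates sharply from the rest. The toy Python sketch below uses invented numbers and is not BDO's actual method:

import statistics

# Hypothetical average invoice amount per vendor.
vendor_averages = {
    "Vendor A": 1020, "Vendor B": 980, "Vendor C": 1010,
    "Vendor D": 995,  "Vendor E": 4800,   # suspiciously high
}

values = list(vendor_averages.values())
mean = statistics.mean(values)
stdev = statistics.stdev(values)

# Flag vendors more than 1.5 standard deviations from the mean for manual review.
flagged = [v for v, amount in vendor_averages.items() if abs(amount - mean) > 1.5 * stdev]
print(flagged)   # ['Vendor E']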

4. Capital One

Marketing is one of the most common uses for big data, and Capital One is at the top of the game, utilising big data management to help ensure the success of all customer offerings.

Through analysis of the demographics and spending habits of customers, Capital One determines the optimal times to present various offers to clients, thus increasing the conversion rates from their communications.

Not only does this result in better uptake but marketing strategies become far more targeted and relevant, therefore improving budget allocation.
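
At its simplest, finding the 'optimal time' to present an offer means grouping historical offer responses by when they were sent and comparing conversion rates. A hedged pandas sketch with fabricated data (assumes pandas is installed):

import pandas as pd

# Fabricated log of offers: the hour they were sent and whether the customer converted.
offers = pd.DataFrame({
    "hour_sent": [9, 9, 13, 13, 13, 19, 19, 19, 19],
    "converted": [0, 1,  0,  0,  1,  1,  1,  0,  1],
})

# Conversion rate per send hour; the best hour is a candidate for future campaigns.
rates = offers.groupby("hour_sent")["converted"].mean()
print(rates.sort_values(ascending=False))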

Summary

  • Big Data is defined as data that is huge in size. It is a term used to describe a collection of data that is huge in volume and yet keeps growing exponentially with time.
  • Examples of Big Data generation include stock exchanges, social media sites, jet engines, etc.
  • Big Data can be 1) Structured, 2) Unstructured, or 3) Semi-structured.
  • Volume, Variety, Velocity, and Variability are a few characteristics of Big Data.
  • Improved customer service, better operational efficiency, and better decision making are a few advantages of Big Data.
  • Big tech companies such as Google, YouTube, Facebook, and Flipkart handle anywhere from tens of terabytes to hundreds of petabytes of data every single day.

Thank you

