BIG DATA

  • What exactly is Big Data?

Big data refers to the large, diverse sets of information that grow at ever-increasing rates. It encompasses the volume of information, the velocity or speed at which it is created and collected, and the variety or scope of the data points being covered.

  • What is the problem with Big Data?

The problem with big data is that it grows constantly, and organizations often fail to capture the opportunities in it and extract actionable information. Companies frequently do not recognize where they need to allocate their resources, and as a result they do not make the most of the information they hold.

  • The four V's of Big Data:

Velocity - Velocity refers to the speed at which data is generated. High-velocity data is produced at such a pace that it requires distinct (distributed) processing techniques. Examples of high-velocity data are Twitter messages and Facebook posts.

Variety - Big data comes in various forms: structured (e.g. data in CSV format), semi-structured (e.g. data in doc or XML files) and unstructured (audio, images, video).

Volume - The main characteristic that makes data "big" is its sheer volume. It makes no sense to think in terms of small storage units, because the total amount of information is growing exponentially every year. Companies like Facebook generate about 4 petabytes of data per day.

Veracity - Veracity refers to the trustworthiness of the data. Can a manager rely on the data being representative? Every good manager knows that there are inherent discrepancies in all collected data.
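To get a feel for that volume figure, here is a small back-of-the-envelope calculation. It only takes the 4 petabytes per day quoted above as given; the decimal unit conversion and the 4 TB disk size are assumptions chosen for illustration, not figures from any company.

```python
PB_PER_DAY = 4        # figure quoted in the text above
TB_PER_PB = 1000      # assumption: decimal units (1 PB = 1000 TB)
DISK_SIZE_TB = 4      # assumption: a hypothetical 4 TB commodity disk


def terabytes_per_day(pb_per_day=PB_PER_DAY):
    """Convert the daily volume from petabytes to terabytes."""
    return pb_per_day * TB_PER_PB


def disks_filled_per_day(pb_per_day=PB_PER_DAY, disk_size_tb=DISK_SIZE_TB):
    """How many disks of the assumed size one day of data would fill."""
    return terabytes_per_day(pb_per_day) / disk_size_tb


def petabytes_per_year(pb_per_day=PB_PER_DAY, days=365):
    """Total volume over a year if the daily rate stayed constant."""
    return pb_per_day * days


print(terabytes_per_day(), "TB per day")
print(disks_filled_per_day(), "disks per day")
print(petabytes_per_year(), "PB per year")
```

Under these assumptions a single day of data would fill a thousand such disks, which is why no single machine can hold it.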

  • What is the solution?

Distributed Storage:

Distributed computing, combined with data management and the principle of parallel processing, makes it possible to acquire and analyze intelligence from big data, which is what makes Big Data analytics a reality. Different aspects of the distributed computing paradigm address different challenges involved in analyzing big data.

To implement this concept we need a product, and that product is Hadoop. In Hadoop we set up a master-slave relationship between machines. This group of machines is a cluster, and the whole setup is known as a Hadoop cluster.
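The split, process in parallel, and combine idea behind this can be sketched in a few lines of Python. This is only an illustration on a single machine, not Hadoop itself: threads stand in for the slave nodes, and the function names (`split_into_chunks`, `count_words`, `distributed_word_count`) are made up for this example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(chunk):
    """Worker task: count the words in one chunk of lines."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def split_into_chunks(lines, n_chunks):
    """Master task: divide the input into roughly equal chunks."""
    size = max(1, -(-len(lines) // n_chunks))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def distributed_word_count(lines, n_workers=3):
    """Split the data, count each chunk in parallel, then merge the results."""
    chunks = split_into_chunks(lines, n_workers)
    total = Counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for partial in pool.map(count_words, chunks):
            total.update(partial)
    return total


sample_lines = [
    "big data needs distributed storage",
    "distributed computing processes big data",
    "hadoop stores big data on a cluster",
]
word_counts = distributed_word_count(sample_lines)
print(word_counts.most_common(3))
```

In a real cluster the chunks would live on different machines and the work would be sent to where the data is stored, but the master's role of splitting the work and merging the partial results follows the same pattern.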

  • What is Hadoop?

Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware (that is, inexpensive, widely available standard machines rather than specialized high-end servers). It provides massive storage for any kind of data, enormous processing power and the ability to handle virtually limitless concurrent tasks or jobs.

  • What is a Hadoop cluster?

A Hadoop cluster is a special type of computational cluster designed specifically for storing and analyzing huge amounts of unstructured data in a distributed computing environment. Such clusters run Hadoop's open source distributed processing software on low-cost commodity computers.

*NETFLIX CASE STUDY

Data can be a powerful resource if used properly, but it can also be a swamp of jumbled, unintelligible information. From relatively humble beginnings as a DVD-by-mail service, Netflix has grown into one of the most influential media streaming services in the world. The company was one of the first to see the potential of streaming technology and began to transition to a subscription video-on-demand model in 2007. Since this transition, annual revenue has grown from $1.36 billion to around $15.8 billion in just ten years. The number of Netflix subscribers has followed a similar trend, growing from less than 22 million in 2011 to nearly 150 million in 2019. The service has become so popular that an estimated 37 percent of the world's internet users use Netflix.

So data is at the heart of business in today's world: the more data you have, the bigger your business can become.

Netflix has a sea of users, and each user generates hundreds of ratings per day based on what they watch, search for and add to their watch-list. This data ultimately becomes part of big data. Netflix stores all of this information and, using key machine learning algorithms, builds a pattern that indicates the viewer's taste. This pattern may never match another viewer's, because everyone's taste is unique.

Based on the ratings, Netflix categorizes its media and suggests to the viewer what the recommendation system thinks they might like to watch next. Netflix will know everything.

Netflix will know when a person stops watching it. They have all of their algorithms and will know that this person watched five minutes of a show and then stopped. They can tell by the behavior and the time of day that they are going to come back to it, based on their history.
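The idea of building a taste pattern from ratings and using it to suggest what to watch next can be sketched with a very simple technique: find the viewer whose ratings are most similar and suggest the titles they liked. This is a toy illustration, not Netflix's actual algorithm. The viewers, titles, ratings and the `min_rating` threshold are all invented for the example.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity of two rating dicts; unrated titles count as 0."""
    if not a or not b:
        return 0.0
    shared = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = sqrt(sum(r ** 2 for r in a.values()))
    norm_b = sqrt(sum(r ** 2 for r in b.values()))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, min_rating=4):
    """Suggest titles that the most similar viewer rated highly
    and that `user` has not rated yet."""
    mine = ratings[user]
    others = [name for name in ratings if name != user]
    if not others:
        return []
    nearest = max(others, key=lambda name: cosine_similarity(mine, ratings[name]))
    return sorted(
        title
        for title, rating in ratings[nearest].items()
        if rating >= min_rating and title not in mine
    )


# Invented sample data: viewer -> {title: rating from 1 to 5}
ratings = {
    "asha": {"Show A": 5, "Show B": 4, "Show C": 1},
    "ben": {"Show A": 5, "Show B": 5, "Show D": 4},
    "cara": {"Show C": 5, "Show E": 4},
}
print(recommend("asha", ratings))
```

Here "ben" has the taste closest to "asha", so the title he liked that she has not seen yet is the one suggested. A production system would use far richer signals, such as the viewing times and stop points mentioned above.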

Big data plays a critical role not just in how Netflix operates but also in presenting it with new opportunities to grow. New technologies often bring their fair share of issues, but Netflix has been tackling those issues head-on, consistently taking community input. By open-sourcing several of its libraries and frameworks, Netflix aims to improve not just itself but other companies as well. In the end, it would be incorrect to say that Netflix makes all its decisions based on big data insights, as it still relies on human input from a lot of people.

Using big data, Netflix saves $1 billion per year on customer retention.

!!!THAT'S ALL!!!

THANKS FOR READING......!!





