How MNCs Use Big Data to Manage and Manipulate Thousands of Terabytes of Data at High Speed

# What is Big Data?

According to research, about 4.13 billion people were active internet users last year. Studies estimate that roughly 2.5 quintillion bytes of data are generated every day, and in the coming years this is projected to reach around 463 exabytes per day. Data volumes are growing daily, and it has become difficult to manage such large data sets with traditional methods. This is where Big Data technology plays a major role in managing huge, complex data.

Big Data is a term that refers to the collection and management of large, complex data sets that are difficult to process using traditional methods.

# The Four V's of Big Data

1) Volume

Volume refers to the size of data. In every sector, storing data has become hard; organizations need a way to store all this data and stay ready for the data still to come. With a Hadoop cluster, we can use low-cost commodity hardware for storage.

2) Velocity

Velocity refers to the speed at which data must be stored, retrieved, and processed. Performing I/O on a small data set is easy, but it becomes hard when huge volumes arrive in short time intervals. For this we use distributed storage and processing, which spreads the work across many machines for faster I/O. Hadoop plays a major role in such distributed systems.
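The gain from spreading I/O across machines can be sketched in plain Python. This is a toy simulation, not Hadoop itself: `fetch_block` and the 100 ms delay are made-up stand-ins for reading one data block from one storage node.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_block(block_id):
    """Stand-in for reading one data block from one storage node."""
    time.sleep(0.1)  # pretend each read takes 100 ms
    return f"block-{block_id}"

start = time.perf_counter()
# Fetch four blocks concurrently, as if from four different nodes
with ThreadPoolExecutor(max_workers=4) as pool:
    blocks = list(pool.map(fetch_block, range(4)))
elapsed = time.perf_counter() - start

print(blocks)   # all four blocks retrieved
print(elapsed)  # roughly 0.1 s in parallel, vs about 0.4 s one after another
```

The four reads overlap, so the total wall-clock time is close to the time of a single read rather than the sum of all four.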

3) Variety

Variety refers to the distinct forms data can take, for example data that arrives in an organised or an unorganised manner. The three main types are a) structured, b) semi-structured, and c) unstructured.
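The three types can be illustrated with small Python samples (the field names and text here are invented for illustration):

```python
import csv
import io
import json

# Structured: rows with a fixed schema, e.g. a CSV table
structured = io.StringIO("user_id,country\n1,IN\n2,US\n")
rows = list(csv.DictReader(structured))

# Semi-structured: self-describing but flexible, e.g. JSON
semi_structured = json.loads('{"user_id": 1, "tags": ["hadoop", "bigdata"]}')

# Unstructured: free text with no schema at all
unstructured = "Just shared my first post about Hadoop clusters!"

print(rows[0]["country"])          # field access via the fixed schema
print(semi_structured["tags"][0])  # field access via keys, shape may vary
print(len(unstructured.split()))   # only generic text processing applies
```

Structured data fits neatly into tables, semi-structured data carries its own labels but not a rigid schema, and unstructured data has to be interpreted by more general techniques.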

4) Veracity

Veracity concerns trust in the data: we have lots of data arriving at high speed, but we cannot always tell whether it comes from an authorised source or whether it is correct.

# What is Hadoop?


Hadoop is an open-source framework developed by Apache. It works with both structured and unstructured data, and it helps store and process data in a distributed fashion by clustering multiple computers together.

Hadoop organizes machines into a cluster. A cluster contains at least one master node and at least one slave node. The master node receives data, breaks it into blocks, and distributes the blocks across multiple slave nodes, which then process their blocks in parallel. This makes processing fast and data management easy.

# How Big Data and Hadoop play a major role in MNCs like Facebook


“Facebook runs the world’s largest Hadoop cluster,” says Jay Parikh, Vice President of Infrastructure Engineering at Facebook.

Facebook has two main clusters. The first is an 1100-node cluster with 8800 CPU cores and 12 petabytes of storage. The second is a 300-node cluster with 2400 CPU cores and 3 petabytes of storage.

All these nodes process data at high speed and generate results quickly. Let's understand this with an example: when someone shares a post on Facebook, the data flows through these nodes and is stored there. Then multiple processing steps run, such as counting how many likes the post receives and classifying what kind of post it is (technology-related, politics-related, and so on). Machine-learning models also run to infer what topics the user is interested in based on their posts. All of this happens in a short time, perhaps a few seconds or a minute.
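The per-post processing described above can be sketched as a toy Python example. Everything here is an illustrative assumption, not Facebook's actual pipeline: the post fields, the keyword lists, and the naive keyword-matching "classifier" are all made up, standing in for the real machine-learning models.

```python
# Hypothetical posts; the field names are invented for illustration
posts = [
    {"id": 1, "text": "New GPU benchmarks are out", "likes": [101, 102, 103]},
    {"id": 2, "text": "Election results announced today", "likes": [104]},
]

# Naive keyword lists standing in for a real topic classifier
TOPIC_KEYWORDS = {
    "technology": {"gpu", "benchmarks", "hadoop", "cluster"},
    "politics": {"election", "results", "vote"},
}

def process(post):
    """One node's work: count likes and tag the post with a rough topic."""
    words = {w.lower().strip(".,!") for w in post["text"].split()}
    topic = next(
        (name for name, kws in TOPIC_KEYWORDS.items() if words & kws),
        "other",
    )
    return {"id": post["id"], "like_count": len(post["likes"]), "topic": topic}

# Each post can be processed independently, so this loop could be
# spread across many nodes exactly like the word-count example
results = [process(p) for p in posts]
print(results[0])  # {'id': 1, 'like_count': 3, 'topic': 'technology'}
```

Because each post is processed independently, the same function can run on thousands of nodes at once, which is what lets a cluster of this size produce results within seconds.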


