How Facebook Stores, Manipulates, and Manages Big Data


Facebook data center near Chicago
Facebook generates 4 petabytes of data per day; that's about 4 million gigabytes [2020].

Here's how it stores, manages, and manipulates big data.

Storing such a huge volume of data was a serious problem for traditional data storage systems. Earlier, a single data request at Facebook took about 10 I/O operations and required multiple traversals. In 2010, they implemented a RAID-6 storage service that needed only a single I/O operation per data request.
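The article does not say how a request gets down to a single I/O operation. One common way to achieve it is to append all objects to one large file and keep a small index of byte offsets in memory, so the only disk access is the read itself. The following is a minimal sketch under that assumption; the class name `NeedleStore`, its methods, and the sample keys are hypothetical and are not Facebook's actual code.

```python
import io


class NeedleStore:
    """Toy append-only blob store: one large file plus an in-memory index."""

    def __init__(self):
        self._file = io.BytesIO()  # stands in for one large on-disk file
        self._index = {}           # key -> (offset, size), held in memory
        self.reads = 0             # counts simulated read operations

    def put(self, key, data):
        # Always append at the end of the file and remember where it landed.
        offset = self._file.seek(0, io.SEEK_END)
        self._file.write(data)
        self._index[key] = (offset, len(data))

    def get(self, key):
        offset, size = self._index[key]  # memory lookup, no disk access
        self._file.seek(offset)
        self.reads += 1                  # the single read for this request
        return self._file.read(size)


store = NeedleStore()
store.put("photo-1", b"first image bytes")
store.put("photo-2", b"second image bytes")
data = store.get("photo-2")
```

Because the index lives in memory, the number of reads grows by exactly one per request, no matter how many objects the file holds.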

But later they realized that keeping all photos in expensive, fast storage was wasteful: most of that performance went unused while still demanding power and cooling. So instead of focusing on high speed, they targeted higher efficiency and redesigned their storage system.

The new design had a warm Haystack tier for images that are still being viewed and an offline cold storage facility for those that are old or have stopped getting views. They focused on long-term durability too: if a photo regains popularity, it can be pulled out of cold storage until its popularity fades again. This storage design was highly energy efficient, requiring no generators, no UPS, and no redundant power supply, and it keeps the data on cheap, low-cost storage. Facebook is also looking at using optical discs as part of its cold storage, since they are persistent, long-lasting, and dense.
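The warm/cold split described above boils down to a placement rule driven by how recently a photo was viewed. Here is a minimal sketch of such a rule; the 90-day threshold, the `Photo` class, and the `retier` function are illustrative assumptions, not details from Facebook's system.

```python
from dataclasses import dataclass

COLD_AFTER_DAYS = 90  # assumed threshold, not a figure from the article


@dataclass
class Photo:
    photo_id: str
    days_since_last_view: int
    tier: str = "warm"


def retier(photo):
    """Place a photo in the warm or cold tier based on its last view."""
    if photo.days_since_last_view >= COLD_AFTER_DAYS:
        photo.tier = "cold"
    else:
        photo.tier = "warm"
    return photo.tier


old = Photo("p1", days_since_last_view=400)
fresh = Photo("p2", days_since_last_view=2)
old_tier = retier(old)      # unviewed for over a year -> cold
fresh_tier = retier(fresh)  # viewed two days ago -> warm

# The old photo becomes popular again and is pulled back out of cold storage.
old.days_since_last_view = 0
revived_tier = retier(old)
```

A real system would likely weigh more signals than a single age threshold, but the same idea applies: placement follows access patterns, not upload order.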

But to make room for the coming billions of users, Facebook needs new ideas and technology. This is where the Disaggregated Rack came in. Instead of relying on servers that each bundle compute, memory, and huge HDD storage, a distributed file system came in very handy.

Instead of using large computers to store petabytes of data, racks of servers are built and the data is distributed across them in a cluster topology, with one master (the name node) connected to thousands of slaves (the data nodes). Hadoop is an Apache open-source framework for storing large data sets on such clusters: one master can load and unload huge amounts of data across thousands of slaves in parallel, which is fast and easy to maintain.
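To make the master/slave layout concrete, here is a toy, in-process sketch: a file is cut into fixed-size blocks, each block is copied to more than one data node, and the name node's role is reduced to a table that records where each block lives. The class `MiniCluster`, the 4-byte block size, and the replication factor of 2 are assumptions chosen for the demo; real Hadoop clusters use far larger blocks and network transfers.

```python
BLOCK_SIZE = 4   # bytes per block; tiny on purpose for the demo (assumption)
REPLICATION = 2  # copies kept of each block (assumption)


class MiniCluster:
    """Toy model of one name node (metadata) and several data nodes (blocks)."""

    def __init__(self, data_node_names):
        # Data nodes: node name -> {block_id: bytes}
        self.data_nodes = {name: {} for name in data_node_names}
        # Name node metadata: filename -> [(block_id, [node names])]
        self.block_map = {}
        self._next = 0  # round-robin cursor for choosing target nodes

    def write(self, filename, data):
        names = list(self.data_nodes)
        placements = []
        for i in range(0, len(data), BLOCK_SIZE):
            block_id = f"{filename}#{i // BLOCK_SIZE}"
            targets = [names[(self._next + r) % len(names)]
                       for r in range(REPLICATION)]
            self._next += 1
            for node in targets:
                self.data_nodes[node][block_id] = data[i:i + BLOCK_SIZE]
            placements.append((block_id, targets))
        self.block_map[filename] = placements

    def read(self, filename):
        parts = []
        for block_id, targets in self.block_map[filename]:
            # Use the first replica that still holds the block.
            node = next(n for n in targets if block_id in self.data_nodes[n])
            parts.append(self.data_nodes[node][block_id])
        return b"".join(parts)


cluster = MiniCluster(["dn1", "dn2", "dn3"])
cluster.write("log.txt", b"hello hadoop")
before_failure = cluster.read("log.txt")

cluster.data_nodes["dn1"].clear()  # simulate losing one data node
after_failure = cluster.read("log.txt")
```

Because every block has a second copy on a different node, the file can still be reassembled after one data node loses its contents.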

At 4 petabytes per day (about 4 million gigabytes), all of that data is stored in what is known as Hive, which contains about 300 petabytes of data.

Hive is an open-source, petabyte-scale data warehousing framework based on Hadoop that was developed by the Data Infrastructure Team at Facebook.




