How does Facebook manage its huge data?
1). Where does Facebook store this data?
------- Obviously, Facebook stores its data on servers in data centers. A group of servers is known as a rack, and a single server can store petabytes of data.
To prevent these racks from overheating, Facebook uses a seven-room rooftop cooling system that relies on natural air.
To maintain a round-the-clock power supply, it uses approximately 40 generators, each producing 3 megawatts of electricity.
2). Which technologies does Facebook use to manage its data?
-------- Hadoop
“Facebook runs the world’s largest Hadoop cluster,” says Jay Parikh, Vice President of Infrastructure Engineering at Facebook.
This cluster spans more than 4,000 machines and stores hundreds of petabytes of data.
Facebook Messenger is built on Apache HBase, a database layered on top of Hadoop, whose architecture supports the enormous number of messages exchanged every single day.
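To make this concrete, here is a minimal sketch of a message write using the standard Apache HBase Java client. The table name ("messages"), column family ("m"), and row-key scheme are placeholders invented for illustration, not Facebook's actual Messenger schema.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class MessageStore {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(TableName.valueOf("messages"))) {
                // Row key combines user id and timestamp, so one user's
                // messages sort together on disk (placeholder scheme).
                Put put = new Put(Bytes.toBytes("user42#20240101T120000"));
                put.addColumn(Bytes.toBytes("m"), Bytes.toBytes("body"),
                              Bytes.toBytes("Hello!"));
                table.put(put);
            }
        }
    }

Keying rows by user and timestamp keeps one user's messages physically adjacent, which suits the sorted, column-family storage model that HBase provides.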
---------- Scuba
With a huge amount of unstructured data arriving every day, Facebook realized it needed a platform to speed up the analysis work. That’s when it developed Scuba, which helps developers dive into the massive data sets in close to real time.
---------- Cassandra
Traditional data storage started lagging behind when Facebook's search team took on the Inbox Search problem: letting users search across all of their messages.
The challenge called for a new storage solution, so Prashant Malik and Avinash Lakshman started developing Cassandra.
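Cassandra was later open-sourced and is now queried through CQL. As a hedged sketch of the kind of term-to-message index Inbox Search needed, here is an example using the DataStax Java driver; the keyspace, table, and column names are hypothetical, and the CQL interface shown postdates Facebook's original internal version.

    import com.datastax.oss.driver.api.core.CqlSession;

    public class InboxIndex {
        public static void main(String[] args) {
            // Connects to a Cassandra node on localhost by default;
            // all names below are illustrative, not Facebook's schema.
            try (CqlSession session = CqlSession.builder().build()) {
                session.execute(
                    "CREATE KEYSPACE IF NOT EXISTS inbox WITH replication = "
                  + "{'class': 'SimpleStrategy', 'replication_factor': 1}");
                session.execute(
                    "CREATE TABLE IF NOT EXISTS inbox.search_index ("
                  + "user_id text, term text, message_id timeuuid, "
                  + "PRIMARY KEY ((user_id, term), message_id))");
                // Index one message under a search term it contains.
                session.execute(
                    "INSERT INTO inbox.search_index (user_id, term, message_id) "
                  + "VALUES ('user42', 'hello', now())");
            }
        }
    }

Partitioning by (user_id, term) means a search for one term in one user's inbox touches a single partition, which is how Cassandra keeps such lookups fast at scale.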
---------- Hive
After Yahoo implemented Hadoop for its search engine, Facebook thought about empowering its data scientists, whose workloads had outgrown the Oracle data warehouse, by giving them SQL-like access to the much larger volumes of data stored in Hadoop. Hence, Hive came into existence.
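Hive exposes data stored in Hadoop through HiveQL, a SQL-like query language. Here is a minimal sketch of how a data scientist might run an aggregate query through the standard Hive JDBC driver; the server address, table, and column names are placeholders.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveQuery {
        public static void main(String[] args) throws Exception {
            // Register the Hive JDBC driver and connect to HiveServer2.
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            String url = "jdbc:hive2://localhost:10000/default"; // placeholder host
            try (Connection conn = DriverManager.getConnection(url, "hive", "");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(
                     "SELECT country, COUNT(*) AS users "
                   + "FROM user_profiles GROUP BY country")) {
                while (rs.next()) {
                    System.out.println(rs.getString("country") + ": "
                        + rs.getLong("users"));
                }
            }
        }
    }

Behind the scenes, Hive compiles such a query into jobs that run on the Hadoop cluster, so analysts never have to write low-level MapReduce code by hand.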
---------- Prism
When Facebook initially implemented Hadoop, it was not designed to run across multiple data centers, and that’s when Facebook’s team felt the need to develop Prism.
Prism is a platform that exposes many namespaces instead of the single one Hadoop provides, which in turn makes it possible to carve out many logical clusters.
The system can now expand to as many servers as needed, regardless of how many data centers they span.
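Prism itself is internal to Facebook and was never open-sourced. As a rough open-source analogy, HDFS federation likewise splits a Hadoop filesystem into multiple independent namespaces; the sketch below, with placeholder NameNode hostnames, shows a client addressing two such namespaces through the standard Hadoop FileSystem API.

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class MultiNamespace {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Two independent namespaces, each served by its own NameNode.
            // The hostnames are placeholders for illustration.
            FileSystem logs  = FileSystem.get(URI.create("hdfs://nn-logs:8020"), conf);
            FileSystem adhoc = FileSystem.get(URI.create("hdfs://nn-adhoc:8020"), conf);
            logs.mkdirs(new Path("/daily"));
            adhoc.mkdirs(new Path("/scratch"));
            // Each namespace has its own directory tree, so the two
            // logical clusters can grow and fail independently.
        }
    }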
Note: Hadoop stores such huge amounts of data using a master-slave topology: files are split into blocks, a master (the NameNode) keeps track of where each block lives, and slaves (the DataNodes) hold the blocks themselves.
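One way to see this master-slave split in practice is to ask the NameNode where the blocks of a file live. The sketch below uses the standard Hadoop FileSystem API; the file path is a placeholder.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockReport {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path file = new Path("/data/events.log"); // placeholder path
            FileStatus status = fs.getFileStatus(file);
            // The NameNode answers with the offset, length, and the
            // DataNodes (slaves) that hold a replica of each block.
            BlockLocation[] blocks =
                fs.getFileBlockLocations(status, 0, status.getLen());
            for (BlockLocation block : blocks) {
                System.out.printf("offset=%d length=%d hosts=%s%n",
                    block.getOffset(), block.getLength(),
                    String.join(",", block.getHosts()));
            }
        }
    }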