Distributed storage cluster and Hadoop
Rupal Singh
Software Engineer @PhonePe | Former SDE Intern'22 @Amazon | Expert at Codeforces | 5★ at CodeChef | AIR 48 in Google Code Jam To I/O For Women'22 | ICPC Kanpur Regionalist 2022 | PhonePe Tech Scholar'22
Earlier, most companies stored their data in an RDBMS, which becomes difficult to scale as data volumes grow. Hadoop emerged as a solution: its storage layer is designed for data that is written once and read many times.
Distributed Storage
The exponential growth of data volumes demands new storage technology. Distributed storage systems were introduced to manage this data. In a distributed storage system, data is divided into blocks, and the blocks are stored on different virtual or physical machines. Distributed storage systems such as Hadoop are based on a master-slave topology.
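As a rough illustration of the idea above, the following Python sketch cuts a byte string into fixed-size blocks and assigns each block to a machine. The function names, the machine names, the tiny block size, and the round-robin placement policy are all assumptions made for this example; real systems use much larger blocks and smarter placement.

```python
from itertools import cycle


def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut data into consecutive blocks of at most block_size bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(blocks: list[bytes], machines: list[str]) -> dict[int, str]:
    """Assign each block index to a machine in round-robin order.

    Round-robin is an assumed policy, chosen only to keep the sketch simple.
    """
    if not machines:
        raise ValueError("at least one machine is required")
    return {index: machine
            for index, machine in zip(range(len(blocks)), cycle(machines))}


data = b"abcdefghij"
blocks = split_into_blocks(data, block_size=4)
placement = place_blocks(blocks, ["machine-1", "machine-2"])

for index, machine in placement.items():
    print(f"block {index} ({blocks[index]!r}) -> {machine}")
```

Joining the blocks back together in index order gives the original data, which is why the system only needs to remember which machine holds which block.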
Hadoop
Hadoop is software that lets you build a master-slave topology in a distributed storage system. Hadoop is written in Java, so before installing Hadoop we first need to install the JDK (Java Development Kit). Hadoop is an open-source project of the Apache Software Foundation.
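Before installing Hadoop, it can help to confirm that Java is already present on the machine. The sketch below is one possible check, not an official installer step: it looks at the JAVA_HOME environment variable and then searches PATH for a `java` executable. The helper name `find_java` is hypothetical.

```python
import os
import shutil


def find_java(env=None, which=shutil.which):
    """Return a location hint for Java, or None if nothing is found.

    Checks the JAVA_HOME variable first, then looks for `java` on PATH.
    The `env` and `which` parameters exist only to make the helper testable.
    """
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    if java_home:
        return java_home
    return which("java")


location = find_java()
if location:
    print(f"Java found at: {location}")
else:
    print("Java not found; install a JDK before installing Hadoop.")
```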
Hadoop Cluster
Two terms used most frequently when talking about a Hadoop cluster are:
- Cluster: A collection of nodes.
- Node: A process running on a virtual or physical machine.
In a Hadoop cluster, one system acts as the master and the other systems associated with it act as slaves. In Hadoop, the master node is called the NameNode and a slave node is called a DataNode. Each node added to the cluster adds a corresponding amount of storage capacity and throughput. The NameNode and the DataNodes together make up HDFS (Hadoop Distributed File System), the file system through which the master coordinates the slaves.
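The division of work between the two kinds of node can be sketched with a toy model in Python. This is not the Hadoop API: the class and method names, the block ids, and the round-robin placement are assumptions for illustration. In real HDFS the client exchanges block data with the DataNodes directly; here the NameNode object drives both steps only to keep the sketch short.

```python
class DataNode:
    """Slave node: stores the actual block contents."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes

    def store(self, block_id, data):
        self.blocks[block_id] = data

    def fetch(self, block_id):
        return self.blocks[block_id]


class NameNode:
    """Master node: keeps only metadata about where each block lives."""

    def __init__(self, data_nodes):
        self.data_nodes = list(data_nodes)
        self.block_map = {}  # filename -> list of (block_id, DataNode)

    def write(self, filename, data, block_size):
        locations = []
        for n, start in enumerate(range(0, len(data), block_size)):
            block_id = f"{filename}#{n}"
            # Assumed policy: spread blocks over DataNodes in round-robin order.
            node = self.data_nodes[n % len(self.data_nodes)]
            node.store(block_id, data[start:start + block_size])
            locations.append((block_id, node))
        self.block_map[filename] = locations

    def read(self, filename):
        # Look up the block locations, then reassemble the blocks in order.
        return b"".join(node.fetch(block_id)
                        for block_id, node in self.block_map[filename])


nodes = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]
name_node = NameNode(nodes)
name_node.write("log.txt", b"hello hadoop!", block_size=4)

for node in nodes:
    print(node.name, sorted(node.blocks))
print(name_node.read("log.txt"))
```

The NameNode never holds file contents in this model, only the map from file name to block locations, which mirrors the master and slave roles described above.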
Advantages of a Hadoop Cluster
- Scalable
- Cost efficient
- Flexible
- Fast
- Resilient to Failure
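One way a cluster can be resilient to failure is by keeping more than one copy of each block on different nodes. The sketch below assumes that approach and shows which blocks stay readable when nodes fail. The function names, node names, and the simple placement rule are hypothetical and are not taken from Hadoop itself.

```python
def replicate(block_ids, nodes, replicas):
    """Map each block to `replicas` distinct nodes, chosen in round-robin order."""
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(i + k) % len(nodes)] for k in range(replicas)]
        for i, block in enumerate(block_ids)
    }


def readable_blocks(placement, failed):
    """Return the blocks that still have at least one copy on a live node."""
    return {block for block, holders in placement.items()
            if any(node not in failed for node in holders)}


nodes = ["dn1", "dn2", "dn3"]
placement = replicate(["b0", "b1", "b2"], nodes, replicas=2)
survivors = readable_blocks(placement, failed={"dn2"})

print(placement)
print(sorted(survivors))
```

With two copies per block, losing any single node in this example leaves every block readable; losing two nodes can lose a block whose copies were both on the failed nodes.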