How to practically contribute a specific amount of storage to the NameNode in a Hadoop cluster
Anushka Visapure
Problem: In a Hadoop cluster, how do we contribute a limited/specific amount of storage as a slave (DataNode) to the cluster?
Solution:
What is Hadoop? Hadoop is a tool used to manage big data, and it is built on top of Java. Like Hadoop, there are many other tools for managing big data, such as Apache Spark, Google BigQuery, etc.
What is Hadoop Architecture? A basic Hadoop architecture has one NameNode, which acts as the master node, one client from which we upload or read data, and "n" DataNodes. The NameNode stores only metadata about all the DataNodes in the cluster; each DataNode shares some of its storage with the NameNode. The NameNode shares these details with the client, and the client then does whatever it needs to do, either uploading data to the DataNodes or reading files from them.
We did this practical as a team, with 1 master and 1 slave.
STEP 1: Configure the configuration files on the NameNode
Here we have configured the core-site.xml file with the hdfs protocol and an IP address, because internally Hadoop uses the HDFS protocol to transfer files. In this configuration we have allowed all IPs on port 9001.
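A minimal sketch of what this core-site.xml can look like, assuming we bind to all IPs (0.0.0.0) on port 9001; fs.default.name is the classic Hadoop 1.x property key:

<configuration>
    <property>
        <!-- HDFS endpoint of the cluster; 0.0.0.0 accepts connections from all IPs -->
        <name>fs.default.name</name>
        <value>hdfs://0.0.0.0:9001</value>
    </property>
</configuration>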
We have also configured the hdfs-site.xml file, where we created the directory in which all the metadata about the DataNodes is saved. After configuring these files successfully, start the NameNode service.
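A sketch of the matching hdfs-site.xml, assuming the metadata directory is named /name_node (the directory name here is only an example), followed by the command that starts the NameNode service:

<configuration>
    <property>
        <!-- directory where the NameNode keeps its metadata -->
        <name>dfs.name.dir</name>
        <value>/name_node</value>
    </property>
</configuration>

hadoop-daemon.sh start namenode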
Here you can check that the NameNode service is up and running successfully, and you can check the NameNode report, i.e. how many DataNodes are connected to the NameNode, by using the command:
" hadoop dfsadmin -report "
STEP 2: Configure the same configuration files on the DataNodes
This is the DataNode's core-site.xml file, where we configure the NameNode's IP address. Every DataNode we have in this cluster has to be configured in the same way.
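A sketch of the DataNode's core-site.xml, assuming the NameNode's IP is 192.168.1.10 (a placeholder; replace it with the real NameNode IP) and the same port 9001:

<configuration>
    <property>
        <!-- points this DataNode at the NameNode -->
        <name>fs.default.name</name>
        <value>hdfs://192.168.1.10:9001</value>
    </property>
</configuration>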
Here we have created the directory in which the DataNode stores the data about the files it holds. The DataNode also sends a heartbeat to the NameNode every three seconds. The same has to be configured on every DataNode in the cluster.
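A sketch of the DataNode's hdfs-site.xml, assuming the data directory is /data_node, i.e. the same directory we will later mount the EBS partition on:

<configuration>
    <property>
        <!-- directory whose storage this DataNode contributes to the cluster -->
        <name>dfs.data.dir</name>
        <value>/data_node</value>
    </property>
</configuration>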
After successfully configuring the configuration files on the DataNode, start the DataNode service and check whether the DataNode process has started successfully or not.
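For example, with the Hadoop 1.x daemon script on the PATH, the service can be started and verified like this:

hadoop-daemon.sh start datanode
jps    (the DataNode process should appear in this list)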
STEP 3: After the DataNode has started successfully, create an EBS volume of the required size and attach it to the DataNode instance.
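This can be done from the AWS web console, or as a rough sketch with the AWS CLI (the size, availability zone, volume ID, instance ID, and device name below are all placeholders):

aws ec2 create-volume --size 1 --availability-zone ap-south-1a --volume-type gp2
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 --instance-id i-0123456789abcdef0 --device /dev/xvdf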
After that, we have to check whether the EBS volume has been attached or not by using the command: " fdisk -l "
STEP 4: After attaching the EBS volume, we have to create a partition on it by using the command: " fdisk device_name "
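For example, assuming the attached volume shows up as /dev/xvdf in fdisk -l, an interactive fdisk session looks roughly like this:

fdisk /dev/xvdf
   n    (new partition)
   p    (primary)
   1    (partition number; accept the default first and last sectors)
   w    (write the partition table and exit)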
STEP 5: After creating the partition, we have to format it by using the command: " mkfs.ext3 device_name "
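Continuing with the assumed device name from above, the newly created partition is formatted like this:

mkfs.ext3 /dev/xvdf1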
STEP 6: Now we have to mount this partition on the directory that the DataNode shares with the NameNode; in my case the directory name is /data_node.
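A sketch of the mount step, assuming the same device name and the /data_node directory configured earlier:

mkdir -p /data_node
mount /dev/xvdf1 /data_node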
After that, we have to check whether the partition has been mounted or not by using the command: " df -l ", and to see how much storage we have given to the directory, we use the command: " lsblk "
After completing all of the above steps successfully, we can check on the NameNode whether the DataNode has connected and shared the given storage successfully or not by using the command: " hadoop dfsadmin -report "
And here we can see that the DataNode shares only as much storage as we have given it.
THANK YOU FOR READING!
DevOps, Cloud & Performance Engineer| DevOps Engineer
4 年Well done Anushka Visapure ?
Java|| Python||Linux and Networking||Hadoop ||Ansible || Kubernetes|| Jenkins|| AWS ||Docker||DSA
4 年????Anushka Visapure
DevOps @Forescout ?? | Google Developer Expert | AWS | DevOps | 3X GCP | 1X Azure | 1X Terraform | Ansible | Kubernetes | SRE | Platform | Jenkins | Tech Blogger ??
4 年Nice work Anushka Visapure ??
MTS 1 @Cohesity | Ex-Veritas | Kubernetes | Docker | Golang | Python
4 年Great work Anushka Visapure ?????