Hadoop: Contributing specified storage on Data Node to the Cluster
Ankit Kumar
Platform Engineer @ Brevo | Kubernetes | Python | Linux | Cloud | RHCE | RHCSA
Many times while creating a Hadoop cluster, a situation arises where we don't want to contribute the entire storage available on a Data Node. So, today we'll see how to contribute only a specified amount of space on the Data Node to the Name Node, i.e. to the cluster. We'll achieve this with the help of partitions.
Prerequisites: For this demonstration, I've already configured a Name Node and a Data Node on AWS EC2 instances. I have also created a 1GB EBS volume and attached it to the Data Node.
You can see in the image below that the EBS volume (i.e. /dev/xvdf) isn't mounted yet.
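If you'd like to check this from the terminal as well, lsblk and df -h show the attached devices and their mount points. The device name /dev/xvdf is from my setup and may differ on yours.
cmd# lsblk
cmd# df -h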
Let's say that we only want to contribute 512MB of the attached EBS volume to the cluster. So, we'll create a partition of that size and contribute its space to the cluster.
Step 1: Creating Partition
cmd# fdisk /dev/xvdf
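Inside the fdisk prompt, a typical sequence to create a 512MB primary partition looks roughly like this (the exact prompts may vary slightly with your fdisk version):
n        <- create a new partition
p        <- make it a primary partition
1        <- partition number
(Enter)  <- accept the default first sector
+512M    <- last sector, i.e. the size of the partition
w        <- write the partition table and exit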
Now you can see that the new partition has appeared as /dev/xvdf1.
Step 2: Formatting the partition
cmd# mkfs.ext4 /dev/xvdf1
We'll have to format the partition with the above command so that we can store data on it.
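To double-check that the filesystem was created, you can inspect the partition; the output (UUID, etc.) will of course differ on your system.
cmd# blkid /dev/xvdf1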
Step 3: Mounting the partition
cmd# mkdir /dn1
cmd# mount /dev/xvdf1 /dn1
Now, we'll have to mount the partition to a directory that we'll be dedicating to the cluster.
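You can also verify the mount from the terminal; df -h should list /dev/xvdf1 with /dn1 as its mount point and roughly 512MB of capacity (a little less, due to filesystem overhead).
cmd# df -h /dn1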
Now you can see that the partition has been mounted at /dn1 and is ready for use. I had already configured hdfs-site.xml on the Data Node with /dn1 as the data directory.
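For reference, the relevant part of hdfs-site.xml on the Data Node would look roughly like this. I'm using the dfs.data.dir property from Hadoop 1.x here; on Hadoop 2.x/3.x the equivalent property is dfs.datanode.data.dir.
<configuration>
    <property>
        <name>dfs.data.dir</name>
        <value>/dn1</value>
    </property>
</configuration>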
So, now we'll start the Data Node.
cmd# hadoop-daemon.sh start datanode
You can confirm whether the Data Node has started by using the jps command.
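For example (the process IDs will differ on your machine), a DataNode entry in the jps output confirms the daemon is running:
cmd# jps
1234 DataNode
1300 Jps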
cmd# hadoop dfsadmin -report
Finally, by using the above command you can see from the Data Node's configured capacity that only about 512MB (the size of the partition) is being contributed to the cluster, so no more than the allocated space is being used.
Thanks...
Hope you enjoyed it...
See you again...!!!