HDFS Clustering Through Docker in CentOS
This guide shows how to deploy the Hadoop Distributed File System (HDFS) with Docker. We first build a Docker image, then use that image to create one NameNode (master) and three DataNodes (slaves).
Docker Image Generation
Run the commands below to create a CentOS 7 container in which to install HDFS, then open a shell inside it.
docker run -d -t --privileged --network host --name hdfs centos:7 /sbin/init
docker exec -it hdfs bash
Create a file named install_hdfs.sh with the contents below.
# Install Java and Needed Packages
yum update -y
yum install wget -y
yum install vim -y
yum install openssh-server openssh-clients openssh-askpass -y
yum install java-1.8.0-openjdk-devel.x86_64 -y
# Generate keys so the nodes can communicate without a password prompt
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh-keygen -f /etc/ssh/ssh_host_rsa_key -t rsa -N ""
ssh-keygen -f /etc/ssh/ssh_host_ecdsa_key -t ecdsa -N ""
ssh-keygen -f /etc/ssh/ssh_host_ed25519_key -t ed25519 -N ""
# Make a Directory where hadoop will be located
mkdir /hadoop_home
cd /hadoop_home
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz
tar -xvzf hadoop-2.7.7.tar.gz
# Add these environment variables to the ~/.bashrc file
echo "export JAVA_HOME=\$(readlink -f /usr/bin/javac | xargs dirname | xargs dirname)" >> ~/.bashrc
echo "export HADOOP_HOME=/hadoop_home/hadoop-2.7.7" >> ~/.bashrc
echo "export HADOOP_CONFIG_HOME=\$HADOOP_HOME/etc/hadoop" >> ~/.bashrc
echo "export PATH=\$PATH:\$HADOOP_HOME/bin" >> ~/.bashrc
echo "export PATH=\$PATH:\$HADOOP_HOME/sbin" >> ~/.bashrc
# Update Map Reduce Setting File
cp /hadoop_home/hadoop-2.7.7/etc/hadoop/mapred-site.xml.template /hadoop_home/hadoop-2.7.7/etc/hadoop/mapred-site.xml
sed -i '/<\/configuration>/i \
<\/property>' /hadoop_home/hadoop-2.7.7/etc/hadoop/mapred-site.xml
# Update HDFS Setting File
sed -i '/<\/configuration>/i \
<\/property>' /hadoop_home/hadoop-2.7.7/etc/hadoop/hdfs-site.xml
# Make directories for the NameNode, the DataNodes, and temporary files
mkdir /hadoop_home/temp
mkdir /hadoop_home/namenode_dir
mkdir /hadoop_home/datanode_dir
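The two sed commands in the script insert text just before the closing </configuration> tag, but the property entries they insert are not shown. The sketch below illustrates the same technique on a temporary file. It assumes GNU sed (as shipped with CentOS 7). The helper name add_property is hypothetical, and the property names and values (dfs.replication, dfs.namenode.name.dir, dfs.datanode.data.dir) are standard Hadoop 2.x keys picked as assumptions that match the directories created above. They are not necessarily the settings this guide originally used.

```shell
#!/usr/bin/env bash
# Hypothetical helper: insert one <property> entry before </configuration>.
# Usage: add_property FILE NAME VALUE
add_property() {
  local file=$1 name=$2 value=$3
  # GNU sed one-line form of the "i" (insert) command.
  # Values must not contain backslashes or newlines.
  sed -i "/<\/configuration>/i <property><name>${name}</name><value>${value}</value></property>" "$file"
}

# Demonstration on a throwaway file instead of the real hdfs-site.xml.
conf=$(mktemp)
printf '%s\n' '<configuration>' '</configuration>' > "$conf"

# Assumed values: three replicas, and the directories created by the script.
add_property "$conf" dfs.replication 3
add_property "$conf" dfs.namenode.name.dir /hadoop_home/namenode_dir
add_property "$conf" dfs.datanode.data.dir /hadoop_home/datanode_dir

cat "$conf"
```

Each call lands immediately before </configuration>, so the entries appear in the order they were added and the file stays well-formed.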
Run the commands below to execute the script and then format the NameNode.
chmod +x install_hdfs.sh && ./install_hdfs.sh
source ~/.bashrc && hadoop namenode -format
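The source ~/.bashrc step only works if the script wrote the variable references to the file unexpanded, which is why the dollar signs in the echo lines need a backslash. Here is a small sketch of that behaviour, using a temporary file as a stand-in for ~/.bashrc. The names rc, DEMO_HOME, and DEMO_CONF are hypothetical.

```shell
#!/usr/bin/env bash
rc=$(mktemp)   # stand-in for ~/.bashrc

# No variable reference here, so no escaping is needed.
echo "export DEMO_HOME=/hadoop_home/hadoop-2.7.7" >> "$rc"

# Escaped "\$": the literal text $DEMO_HOME is written to the file
# and is expanded later, when the file is sourced.
echo "export DEMO_CONF=\$DEMO_HOME/etc/hadoop" >> "$rc"

cat "$rc"

# Sourcing the file performs the deferred expansion.
. "$rc"
echo "$DEMO_CONF"
```

Without the backslash, the shell would expand the variable while running echo, when it is still unset, and an empty value would be written to the file.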
Exit the container and commit it as an image
docker commit hdfs centos:hdfs
Set Up Your Cluster
Once the container has been committed as an image, create the cluster: one NameNode container (nn) and three DataNode containers (dn1, dn2, dn3).
sudo docker run -it -h nn --restart always --privileged=true --tmpfs /run --name nn -p 50070:50070 centos:hdfs
sudo docker run -it -h dn1 --restart always --privileged=true --tmpfs /run --name dn1 --link nn:nn centos:hdfs
sudo docker run -it -h dn2 --restart always --privileged=true --tmpfs /run --name dn2 --link nn:nn centos:hdfs
sudo docker run -it -h dn3 --restart always --privileged=true --tmpfs /run --name dn3 --link nn:nn centos:hdfs
Next, look up the IP addresses of the containers created above, then add the DataNode addresses to the /etc/hosts file in the NameNode container.
docker inspect nn | grep IPAddress \
; docker inspect dn1 | grep IPAddress \
; docker inspect dn2 | grep IPAddress \
; docker inspect dn3 | grep IPAddress
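Inside the NameNode container, add one entry per DataNode, replacing each placeholder with the address reported for that container.
echo "<dn1 IP address> dn1" >> /etc/hosts
echo "<dn2 IP address> dn2" >> /etc/hosts
echo "<dn3 IP address> dn3" >> /etc/hosts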
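The lookup and the hosts entries can also be generated in one pass from the Docker host. The sketch below relies on the --format option of docker inspect, which takes a Go template. The helper names container_ip, hosts_line, and print_datanode_hosts are hypothetical, and the sketch assumes each container is attached to a single network (the default bridge).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first IP address Docker reports for a container.
container_ip() {
  docker inspect --format '{{range .NetworkSettings.Networks}}{{.IPAddress}} {{end}}' "$1" \
    | awk '{print $1}'
}

# Hypothetical helper: format one /etc/hosts entry as "IP<TAB>hostname".
hosts_line() {
  printf '%s\t%s\n' "$1" "$2"
}

# Print the entries for the three DataNodes; nothing is written to disk here.
print_datanode_hosts() {
  local name
  for name in dn1 dn2 dn3; do
    hosts_line "$(container_ip "$name")" "$name"
  done
}
```

One possible way to apply the result, run from the Docker host, is `print_datanode_hosts | docker exec -i nn tee -a /etc/hosts`, which appends the generated lines to the file inside the NameNode container.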
Also add the DataNodes to the slaves file, which is located at $HADOOP_CONFIG_HOME/slaves.
echo "dn1" >> $HADOOP_CONFIG_HOME/slaves
echo "dn2" >> $HADOOP_CONFIG_HOME/slaves
echo "dn3" >> $HADOOP_CONFIG_HOME/slaves
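Because >> appends unconditionally, running these three commands a second time would list every DataNode twice. A minimal sketch of an idempotent variant follows. The helper name add_slave is hypothetical, and the demonstration writes to a temporary file rather than to $HADOOP_CONFIG_HOME/slaves.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a hostname to a slaves file only if it is not listed yet.
add_slave() {
  local file=$1 host=$2
  # -x matches the whole line, -F treats the hostname as a fixed string.
  grep -qxF "$host" "$file" 2>/dev/null || echo "$host" >> "$file"
}

slaves=$(mktemp)   # stand-in for $HADOOP_CONFIG_HOME/slaves
for host in dn1 dn2 dn3 dn1; do   # dn1 is repeated on purpose
  add_slave "$slaves" "$host"
done

cat "$slaves"
```

It is also worth checking the existing contents of the slaves file before appending, since any hostname already listed there will be started as a DataNode as well.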
After completing all the steps above, you can start Hadoop with the following command.
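The command is not shown here. The sketch below is an assumption: start-dfs.sh is one of the scripts in the sbin directory of Hadoop 2.7.7, and that directory was added to PATH earlier, so this is one plausible way to start HDFS from the NameNode container. The wrapper name start_hdfs is hypothetical.

```shell
#!/usr/bin/env bash
# Assumption: start-dfs.sh is the intended start command.
start_hdfs() {
  if ! command -v start-dfs.sh >/dev/null 2>&1; then
    echo "start-dfs.sh not found on PATH; run 'source ~/.bashrc' first" >&2
    return 1
  fi
  # Starts the HDFS daemons: the NameNode, the SecondaryNameNode,
  # and a DataNode on every host listed in the slaves file.
  start-dfs.sh
  # Lists the DataNodes that have registered with the NameNode.
  hdfs dfsadmin -report
}
```

Calling start_hdfs inside the nn container should end with a report that lists dn1, dn2, and dn3 if the cluster came up as intended. The NameNode web UI is published on port 50070 by the docker run command above.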
Thanks for reading this guide. If you like this kind of content, don't forget to follow me for more. Thank you :D