"Getting Started with Hadoop on Ubuntu: Installation Made Easy"
Ladli Bhagat
Business Analyst at AGR Knowledge Pvt Ltd | Market Research | Data Analysis | Excel | Python | Power BI
"Unlocking the Power of Big Data: Hadoop Installation on Ubuntu"
In the age of data, the ability to process and analyze vast amounts of information is a game-changer. Whether you're a data scientist, a developer, or simply curious about the potential of big data, Hadoop is a name you've probably heard. It's the open-source framework that powers some of the world's most data-intensive applications, and it can be a crucial tool in your arsenal. But where do you start? How do you bring Hadoop into your world?
Today, I'm here to guide you through the process of installing Hadoop on an Ubuntu system, step by step. By the end of this journey, you'll have Hadoop up and running on your Ubuntu machine, and you'll be well on your way to unlocking the power of big data.
Why Hadoop on Ubuntu?
Ubuntu is a popular and user-friendly Linux distribution, making it an excellent choice for Hadoop installation. The combination of Ubuntu's ease of use and Hadoop's data processing capabilities is a winning formula. With this setup, you'll have the foundation to explore large datasets, build data-driven applications, and derive valuable insights from your data.
Step 1: Install Java Development Kit
The journey begins with installing Java 8, a version Hadoop fully supports (Hadoop 3.3 can also run on Java 11). It's a straightforward process: just run the following command in your terminal:
sudo apt update && sudo apt install openjdk-8-jdk
Step 2: Verify the Java Version
Make sure your Java installation was successful by running:
java -version
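The exact build details will differ from machine to machine, but the output should report a 1.8.x version, something like:
openjdk version "1.8.0_392"
OpenJDK Runtime Environment (build 1.8.0_392-...)
OpenJDK 64-Bit Server VM (build 25.392-..., mixed mode)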
Step 3: Install SSH
For secure communication within your Hadoop cluster, SSH is essential. It ensures data integrity and confidentiality while facilitating efficient distributed data processing. Install SSH with:
sudo apt install ssh
Step 4: Create the Hadoop User
All Hadoop components will run under a dedicated user account you create for Apache Hadoop, which keeps Hadoop's files and processes separate from the rest of the system. Create the user and set a password with:
sudo adduser hadoop
Step 5: Switch User
Switch to the newly created Hadoop user:
su - hadoop
Step 6: Configure SSH for Password-less Access
To streamline your workflow, configure password-less SSH access for the Hadoop user. First, generate an RSA key pair (press Enter at the prompts to accept the default location and an empty passphrase):
ssh-keygen -t rsa
Append the generated public key to the authorized keys file and set the proper permissions:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 640 ~/.ssh/authorized_keys
Step 7: SSH to the Localhost
Verify the password-less setup by SSH-ing into localhost; the first connection adds the host's key to your known_hosts file:
ssh localhost
Step 8: Switch User
If you exited the SSH session or opened a new terminal, switch back to the Hadoop user:
su - hadoop
Step 9: Install Hadoop
Download Hadoop 3.3.6 and extract it.
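You can fetch the release tarball from the Apache archive (assuming this mirror path still hosts the 3.3.6 release; adjust the URL if it has moved):
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
Then unpack the archive: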
tar -xvzf hadoop-3.3.6.tar.gz
Renaming the extracted folder to drop the version information is optional; if you skip it, adjust the paths in the remaining configuration accordingly.
mv hadoop-3.3.6 hadoop
Step 10: Configure Environment Variables
Now, configure the Hadoop and Java environment variables on your system by editing ~/.bashrc:
nano ~/.bashrc
Append the following lines to the file:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
Load the new configuration into the current shell session:
source ~/.bashrc
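To confirm that the updated PATH works, you can ask Hadoop to report its version:
hadoop version
The first line of the output should read Hadoop 3.3.6.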
You also need to set JAVA_HOME in the hadoop-env.sh file. Open it in a text editor:
nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh
Search for the line beginning with "export JAVA_HOME" and set it as follows:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
Step 11: Configuring Hadoop
Create the Namenode and Datanode directories in the Hadoop user's home directory, update core-site.xml with your system hostname, set the directory paths in hdfs-site.xml, and configure mapred-site.xml and yarn-site.xml so that MapReduce runs on YARN.
cd hadoop/
mkdir -p ~/hadoopdata/hdfs/{namenode,datanode}
Next, edit the core-site.xml file and set the default file system URI (use your system hostname, or localhost for a single-node setup):
nano $HADOOP_HOME/etc/hadoop/core-site.xml
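A minimal single-node example, assuming everything runs on localhost with the commonly used port 9000 (substitute your system hostname if you have one configured):
<configuration>
   <property>
      <name>fs.defaultFS</name>
      <value>hdfs://localhost:9000</value>
   </property>
</configuration>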
Then edit hdfs-site.xml and point HDFS at the Namenode and Datanode directories you just created:
nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml
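A minimal example, assuming the directory layout from the mkdir command above and a replication factor of 1 (appropriate for a single-node setup):
<configuration>
   <property>
      <name>dfs.replication</name>
      <value>1</value>
   </property>
   <property>
      <name>dfs.namenode.name.dir</name>
      <value>file:///home/hadoop/hadoopdata/hdfs/namenode</value>
   </property>
   <property>
      <name>dfs.datanode.data.dir</name>
      <value>file:///home/hadoop/hadoopdata/hdfs/datanode</value>
   </property>
</configuration>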
After that, edit mapred-site.xml and set the MapReduce framework:
nano $HADOOP_HOME/etc/hadoop/mapred-site.xml
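A minimal example that tells MapReduce jobs to run on YARN:
<configuration>
   <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
   </property>
</configuration>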
Finally, edit yarn-site.xml and configure the NodeManager's auxiliary services:
nano $HADOOP_HOME/etc/hadoop/yarn-site.xml
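A minimal example that enables the shuffle service MapReduce needs:
<configuration>
   <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
   </property>
</configuration>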
Step 12: Start Hadoop Cluster
Before starting the Hadoop cluster, format the Namenode:
hdfs namenode -format
Once the Namenode directory is successfully formatted, you can start the Hadoop cluster.
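The standard start scripts ship in $HADOOP_HOME/sbin, which the ~/.bashrc changes above put on your PATH:
start-dfs.sh
start-yarn.sh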
You can check the status of all Hadoop daemons using:
jps
If everything started correctly, the output should list processes such as NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager.
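With the daemons up, you can also explore the cluster through Hadoop's web interfaces; the defaults for Hadoop 3.x are:
http://localhost:9870 (the NameNode UI, including the HDFS file browser)
http://localhost:8088 (the YARN ResourceManager UI)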
To stop all the services, you can run the matching stop scripts:
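stop-yarn.sh
stop-dfs.sh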
You've now completed the installation of Hadoop on your Ubuntu system. This journey is just the beginning of your exploration of big data. With Hadoop, you're equipped to tackle data challenges, derive insights, and make data-driven decisions.
Overview:
The installation of Hadoop on an Ubuntu system is a multi-step process that begins with foundational tasks: installing the JDK, creating a dedicated user, configuring password-less SSH, and downloading and unpacking the Hadoop package. The configuration work that follows covers environment variables, the core XML configuration files, and the HDFS directories, before formatting the Namenode and starting the cluster. The journey culminates in verifying that the daemons are running, exploring the file system via the web interface, and confirming that the cluster operates smoothly for effective data processing.
#Hadoop #BigData #Ubuntu #DataAnalytics #OpenSource
I hope this article helps you kickstart your big data journey with Hadoop on Ubuntu!