Hadoop Cluster using Ansible
Big Data :-
In today's fast-growing world, we need both speed and data to predict and manage outcomes. Large companies like Facebook and Google generate and manage huge amounts of data. Managing data at this scale requires tools that are easy to use and easy to implement, and that is where Hadoop comes into play.
Hadoop :-
Hadoop is open-source software for reliable, scalable, distributed computing. The Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. That is all we need to know for now.
Ansible :-
Ansible is an open-source software provisioning, configuration management, and application-deployment tool enabling infrastructure as code. It runs on many Unix-like systems, and can configure both Unix-like systems as well as Microsoft Windows. It includes its own declarative language to describe system configuration.
Let's follow these steps to configure Hadoop using Ansible.
STEP 1 :- (To use Ansible for the configuration we need two systems)
We need one OS for the namenode (here the namenode is also the system where Ansible is configured, and from which we run operations on the other systems) and a second OS for the target node (the target node is the system on which we want to perform the operations; here it is the datanode).
Datanode ip :- (192.168.43.186)
Namenode ip :- (192.168.43.41)
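The two addresses above can be recorded in an Ansible inventory so that a playbook can address each machine by role. The following is only a minimal sketch in Ansible's YAML inventory format; the group names `namenode` and `datanode` and the file name `inventory.yml` are assumptions for illustration and do not come from the original setup.

```yaml
# inventory.yml (hypothetical file name)
# Group names are assumptions; the IP addresses are the ones listed above.
all:
  children:
    namenode:
      hosts:
        192.168.43.41:
    datanode:
      hosts:
        192.168.43.186:
```

An inventory like this can be passed to Ansible with the `-i` option, or set as the default inventory in the Ansible configuration file.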
STEP 2 :- (To configure Hadoop we need to install its dependencies, i.e. the JDK and Apache Hadoop)
To set up the JDK and Hadoop, we first need to install the software on both systems (i.e. the namenode and the datanode, or target node). The tasks that install the software go into the playbook. I am using Red Hat commands, which is why I add a condition so that the tasks run only on the specified OS.
To complete the configuration through Ansible, we write an Ansible playbook in a ".yml" file. Here is the code for that:
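The original playbook is not reproduced in the text, so the following is a minimal sketch of what such a playbook might look like, not the author's exact file. The package file names in `jdk_rpm` and `hadoop_rpm`, the `/root/` destination, and `hosts: all` are hypothetical placeholders; substitute the packages you actually downloaded. The `when` condition restricts each task to Red Hat family systems, and `ignore_errors` lets the play continue when a package is already installed.

```yaml
# hadoop.yml: a sketch under stated assumptions, not a definitive implementation
- name: Install the JDK and Hadoop on the namenode and datanode
  hosts: all            # assumption: the inventory lists both machines
  become: true
  vars:
    jdk_rpm: jdk.rpm        # hypothetical package file name
    hadoop_rpm: hadoop.rpm  # hypothetical package file name
  tasks:
    - name: Copy the JDK package to the target
      ansible.builtin.copy:
        src: "{{ jdk_rpm }}"
        dest: "/root/{{ jdk_rpm }}"
      when: ansible_os_family == "RedHat"

    - name: Copy the Hadoop package to the target
      ansible.builtin.copy:
        src: "{{ hadoop_rpm }}"
        dest: "/root/{{ hadoop_rpm }}"
      when: ansible_os_family == "RedHat"

    - name: Install the JDK
      ansible.builtin.command:
        cmd: "rpm -i /root/{{ jdk_rpm }}"
      when: ansible_os_family == "RedHat"
      # rpm -i exits with an error if the package is already installed
      ignore_errors: true

    - name: Install Hadoop
      ansible.builtin.command:
        cmd: "rpm -i /root/{{ hadoop_rpm }}"
      when: ansible_os_family == "RedHat"
      ignore_errors: true
```

The `ansible_os_family` fact is collected automatically because fact gathering is enabled by default. Using `ignore_errors` is the simplest way to get past an "already installed" failure, but it also hides genuine installation errors, so treat it as a convenience for a demo rather than a robust check.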
To run the playbook, we need to save it in a .yml file. I have saved mine as hadoop.yml.
Now let's run it with the ansible-playbook command and see how it goes.
In the run above, Hadoop and the JDK had already been installed by an earlier run, so the install tasks would naturally fail with an error. I used ignore_errors so that the playbook moves on to the next task instead of stopping.
In the end, everything ran successfully and we are good to go.
Thank you.