DevBox on EC2 Virtual Machine: All-in-One Hadoop Ecosystem Implementation on the Web
(Hadoop + DevBox)

Running Big Data setups such as Hadoop, Spark, Hive, and HBase in virtual machines is challenging and tedious. You need to know a lot just to complete a simple task. Why deal with all of that when the DevBox setup lets you dive straight in? It makes life much easier when working with Big Data tools. You can use it for academic or personal projects, and if you are a curious techie who loves to explore new things, DevBox is for you.


Dive into the world of Big Data with Data Fusion - DevBox!

This all-in-one Docker environment brings everything you need for your journey: Java, Python, Spark, Hadoop, Spark SQL, Hive, HBase, MySQL, Sqoop, Kafka, Flume, and more. It also comes with Jupyter extensions in Code Server for a seamless experience. Take your Big Data projects to the next level with DevBox.

DevBox was created by Nethaji Nirmal. A special shout-out to @Nethaji N.


DevBox and the Problem It Addresses:

As I mentioned, we will deploy DevBox to an AWS EC2 Linux machine and use it for our Hadoop work. The obvious question is why we are using DevBox in the first place. We could have used any available VM (virtual machine), such as the Oracle VM or the Cloudera VM, or just used Docker locally to experiment.

Yes, we could have, but there is a catch. So what is the catch?

The catch is resources: storage and, above all, RAM. These setups need a lot of RAM even for basic operations. You need at least 16 GB of RAM to keep a steady workflow with these VMs.

If your machine has less than 16 GB, you will run into resource problems and experience frequent lag and hangs. Docker has the same issue: running these Big Data services locally through Docker becomes a complete mess (especially on Windows with less than 16 GB of RAM). Your system heats up, on top of the storage issues.

Also, Hadoop is a legacy system where every execution is shell-based; there is nothing visual. So why set up all of these heavy tools locally, only to eat up your PC's storage and cause frequent breakdowns? To solve this, we have DevBox.

DevBox gives you all the Hadoop-related services and their ecosystem in one package, together with Java, Python, Kafka, Spark (which can run on top of HDFS, Hadoop's distributed file system), and MySQL.

DevBox combines the idea of a codespace with Docker to spin up the Hadoop system on the web, giving you its full power for your projects. There is no need to install any other package: just type the commands in the terminal and DevBox takes care of the rest.

From a project perspective the experience is hassle-free, because everything is pre-installed and you only have to execute commands in the terminal. If you get stuck at some point, go to the official documentation of that particular service. DevBox uses the documented commands; nothing here is third-party.


P.S.: If you get stuck anywhere, please DM me and I will help you.


Implementation of DevBox on EC2:


Before I go into this, a quick reminder: to run DevBox on EC2 we will use a larger Linux machine, an Ubuntu instance of type t4g.xlarge (4 vCPUs, 16 GiB of RAM). It is billed per hour, but the cost is small: roughly Rs 100 to 200 if you delete the instance once you are done. You can also stop the instance instead, but if you never delete your instances the bill can easily reach Rs 1000 to 2000 or more.

That usually does not happen, because we delete everything after the code has run successfully. If you delete the instance promptly, the total will be around Rs 70 to 80 at most. So keep that in mind.

It is better to pay Rs 70 to 80, or even less, than to run Docker and VMs at the expense of your PC or laptop's storage and other configuration issues. That is completely up to you. Here I am giving you an alternative to the old way of learning Hadoop.
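
If you want a rough feel for the bill before you start, here is a minimal sketch of the arithmetic: hours of use times the hourly rate times the exchange rate. The function name estimate_cost_inr is made up for this example, and the numbers in the sample call are placeholders, not real AWS prices; look up the current hourly rate for your region and plug in your own values.

```shell
# Rough EC2 cost estimate in rupees.
# $1: hours the instance runs
# $2: hourly rate in USD (placeholder; check the AWS pricing page for your region)
# $3: rupees per USD (placeholder exchange rate)
estimate_cost_inr() {
  awk -v h="$1" -v r="$2" -v x="$3" 'BEGIN { printf "%.2f\n", h * r * x }'
}

# Sample call with made-up inputs: 5 hours, 0.15 USD per hour, 85 Rs per USD.
estimate_cost_inr 5 0.15 85
```

The estimate covers compute time only; storage and data transfer are billed separately.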


Step-by-Step Execution:

1) First, log in to your AWS console with your root account. I specifically ask you to use the root account (an IAM user with full EC2 permissions also works); otherwise you may hit permission or access-denied errors and be unable to create the EC2 instance in the first place.

2) Go to EC2 and create an instance with the following configuration:

Give the instance an appropriate name and select the Ubuntu OS image.

(Creating the instance: giving a name and selecting the OS)

3) Then go to Architecture and select 64-bit (Arm).

Now select the instance type: t4g.xlarge

(Selecting the architecture and instance type)


4) Then go to Network settings and allow everything:

a) Allow SSH traffic from: default, Anywhere (0.0.0.0/0)

b) Allow HTTPS traffic from the internet

c) Allow HTTP traffic from the internet

(The network settings)


5) For Configure storage, choose 1 x 30 GiB, which falls under the free tier, and leave everything else at its default.

(Configuring the storage)


6) After submitting, the EC2 instance will be created.

(Instance created)


7) Now go to Instances; you will see that the instance has been created and is running.

(Sorry, in this screenshot I had terminated the instance instead of stopping it; I will update it.)

(EC2 instance in running mode)


8) Click the instance ID of the instance we created and press the Connect button at the top right.

(Connect to launch the VM)


9) After pressing Connect you will land on the connection page. Keep everything at its default and press Connect once more. This takes you to the Ubuntu Linux terminal, from which we carry on with the DevBox setup.

(Final connect to the Linux VM and access to the terminal for the DevBox setup)


10) Congratulations! You are now at the Linux terminal. Next we will set up DevBox in the cloud. Feel proud that you have made it this far.

Keep it up. You are just a few steps away, so don't lose focus.

(Linux terminal)


11) Now we run the commands for the DevBox setup, the Docker setup, and the other configuration.

First type the command:

sudo apt-get update -y

(On a yum-based image such as Amazon Linux the equivalent is sudo yum update -y, but on our Ubuntu machine use apt-get.)

Wait for the command to finish and for the terminal prompt to return before typing the next command.


12) Now run the next command:

sudo apt-get install docker.io -y

(On a yum-based image: sudo yum install -y docker. On Ubuntu, use the first command.)

13) Check the Docker installation: docker --version

14) Then run the next commands, one by one:

  • sudo service docker start (start the Docker service)
  • sudo usermod -aG docker $USER (add the user to the docker group)

15) The next step:

Exit and reconnect so that the group change takes effect: close the terminal, go back to the instance ID, and connect to the terminal again before continuing.

Execute these steps one by one, in order.
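
After reconnecting, you can confirm that the group change from step 14 actually took effect. This is a small sketch, not part of the original DevBox instructions; has_group is a helper name I made up, and it only inspects the group list that id -nG prints for the current session.

```shell
# Succeeds if the space-separated group list in $1 contains the group named in $2.
has_group() {
  case " $1 " in
    *" $2 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# id -nG prints the names of all groups of the current session.
if has_group "$(id -nG)" docker; then
  echo "docker group is active: docker commands should work without sudo"
else
  echo "docker group not active yet: reconnect to the instance and check again"
fi
```

If the second message appears, the session was most likely opened before usermod ran, so a fresh connection is needed.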

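16) On to the next commands, which set up docker-compose. Note that the chmod below only works once the docker-compose binary already exists at /usr/local/bin/docker-compose. The step that downloads it there is not shown in this article; the DevBox GitHub repository linked at the end is the place to look for it.

  • sudo chmod +x /usr/local/bin/docker-compose (apply executable permissions to the binary)
  • docker-compose --version (check the installation)

Execute them in order, one at a time.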
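
The chmod in step 16 needs a docker-compose binary to already be in /usr/local/bin, and the download command is not part of this article. The sketch below shows one way it could be fetched. It is an assumption on my side, not the author's exact command: it assumes the standalone binaries on the Docker Compose GitHub releases page are named docker-compose-<os>-<arch> in lowercase, and compose_url is a made-up helper name. Check the releases page if the download fails.

```shell
# Build the download URL for the standalone docker-compose binary.
# Assumption: release assets are named docker-compose-<os>-<arch> in lowercase,
# for example docker-compose-linux-aarch64 on an Arm Ubuntu instance.
# $1: output of `uname -s`, $2: output of `uname -m`
compose_url() {
  os=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  printf 'https://github.com/docker/compose/releases/latest/download/docker-compose-%s-%s\n' "$os" "$2"
}

# Print the URL for the machine you are on.
compose_url "$(uname -s)" "$(uname -m)"

# To actually download it on the instance, run this yourself:
#   sudo curl -L "$(compose_url "$(uname -s)" "$(uname -m)")" -o /usr/local/bin/docker-compose
```

On Ubuntu, installing the docker-compose package with apt-get is another option, although it places the binary in a different location than the one used in step 16.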

17) Now the Docker pull command for DevBox.

Command:

docker pull mentornirmal/devbox-coder:latest

18) Now the next command, to pull MySQL:

Command: docker pull mentornirmal/mysql:8.0

Then run the next command:

Make a folder in the current directory: mkdir <folder name> (for me it is: mkdir sam), and check that it was created with the 'ls' command.

19) Next, execute the command: nano docker-compose.yml

Paste the following into the editor (use Shift + Insert to paste):

version: "3.8"

services:
  coder:
    image: mentornirmal/devbox-coder:latest
    ports:
      - "8080:8080"    # Port mapping for code-server
      - "9870:50070"   # HDFS NameNode
      - "9864:50010"   # HDFS DataNode
      - "8042:8042"    # YARN NodeManager
      - "8088:8032"    # YARN ResourceManager
    volumes:
      - /home/ec2-user/nirmal:/home/coder/project
    depends_on:
      - mysql
  mysql:
    image: mentornirmal/mysql:8.0
    environment:
      MYSQL_ROOT_PASSWORD: rootpassword
      MYSQL_USER: coder
      MYSQL_PASSWORD: coderpassword
    ports:
      - "3306:3306"
    volumes:
      - mysql_data:/var/lib/mysql
    restart: always

volumes:
  mysql_data:

In the volumes part, edit this line:

- /home/ec2-user/nirmal:/home/coder/project

Replace ec2-user with ubuntu,

and replace nirmal with the folder you created with mkdir <folder name> (for me it is sam).

Move around the file with the cursor (arrow) keys.

To save the file, press Ctrl + X, then type Y to confirm and press Enter, and you are good to go.
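
If you would rather not edit the path by hand in nano, the same change can be made with sed once the file is saved. This is only a sketch: set_project_volume is a helper name I made up, and it assumes the file still contains the original /home/ec2-user/nirmal path exactly as pasted.

```shell
# Rewrite the host side of the project volume in a compose file.
# $1: compose file
# $2: Linux user on the instance (ubuntu on an Ubuntu image)
# $3: folder created earlier with mkdir (sam in this article)
set_project_volume() {
  sed "s#/home/ec2-user/nirmal:#/home/$2/$3:#" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Example, run in the folder that holds docker-compose.yml:
#   set_project_volume docker-compose.yml ubuntu sam
#   grep /home/coder/project docker-compose.yml
```

The grep at the end prints the edited line so you can check it reads /home/ubuntu/sam:/home/coder/project (with your own folder name).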


20) For the final part of the execution, we bring the containers up.

Command: docker-compose up -d

N.B.:

In case the containers are not running properly, bring them down and then up again:

First: docker-compose down

Then: docker-compose up -d


(Docker container for DevBox is running)

Now the most difficult part is done. It is time for the real action: DevBox on the web!

From here, go to the instance ID -> Security -> Security groups -> Edit inbound rules -> add port 8080 for DevBox -> back to the instance ID -> open the instance's public address -> edit the URL to https://13.60.4.114:8080 .

You will get your own public IP address.

(Security groups)

In Edit inbound rules, add port 8080, set its source to 0.0.0.0/0 (open access), and save.

Why 0.0.0.0/0? Because a source of 0.0.0.0/0 allows inbound access to the port from any IPv4 address.

(Edit inbound rules)


Then remove the "s" from https://. The 's' stands for secure, but our URL is not served over HTTPS, so you cannot access DevBox on the web without removing it.
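
To avoid typos in the address, here is a tiny sketch that builds the plain-HTTP URL from your instance's public IP. devbox_url is a made-up helper name, and 13.60.4.114 is just the example address from this article; use your own.

```shell
# Build the plain-HTTP address for code-server on port 8080.
# $1: the instance's public IPv4 address
devbox_url() {
  printf 'http://%s:8080\n' "$1"
}

devbox_url 13.60.4.114

# Optional reachability check from your own machine (prints the HTTP status code):
#   curl -s -o /dev/null -w '%{http_code}\n' "$(devbox_url 13.60.4.114)"
```

If the curl check hangs or fails, the usual suspects are the inbound rule for port 8080 or containers that are not up yet.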


(After adding port 8080, DevBox opened on the web)

Then, voilà! You have DevBox running, folks.

When prompted, choose to trust the authors, and there you have it.


(The DevBox)

Now the fun part:

Type jps at the command prompt and see what you get.

This lists the services provided by DevBox, all of them running.

What is jps, and what is its role?

jps stands for Java Virtual Machine Process Status Tool.

It is a command-line utility that lists the Java processes running on your system, along with their process IDs and other relevant information.

Run jps in the terminal to get the names of the started daemons.
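
Reading the jps list by eye works, but a small filter can tell you which daemons are missing. This is a sketch under assumptions: missing_daemons is a made-up helper, and the daemon names in the example are the standard HDFS and YARN ones; the exact set that DevBox starts may differ, so adjust the list to what you see.

```shell
# Print every expected daemon name that does not appear in jps output.
# Reads jps output on stdin (one "<pid> <name>" pair per line);
# the expected names are passed as arguments.
missing_daemons() {
  awk -v want="$*" '
    { seen[$2] = 1 }
    END {
      n = split(want, names, " ")
      for (i = 1; i <= n; i++)
        if (!(names[i] in seen)) print names[i]
    }'
}

# Example, run inside DevBox:
#   jps | missing_daemons NameNode DataNode ResourceManager NodeManager
```

Names are compared exactly, so SecondaryNameNode is not mistaken for NameNode. No output means every expected daemon was found.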


Now a small thing to add here.

You can get Python's interactive mode by running python in the terminal.

We have to run one more command before we can run our own scripts in DevBox: we need permission to create, edit, and run scripts in the project folder.

The command is: sudo chown -R coder:coder /home/coder/project

This lets you create and work with files.

It will then ask for a password.

Password = codeserver (enter codeserver as the password) and things will go smoothly.

(Activating DevBox so your own files can run)

Additionally, check these commands in the terminal:

  • Check the version of Java by executing java -version in the terminal
  • Check the version of Hadoop by executing hadoop version in the terminal
  • Check the version of Sqoop by executing sqoop version in the terminal
  • Check the version of HBase by executing hbase version in the terminal
  • Check the version of Hive by executing hive --version in the terminal
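
The version checks above can also be run in one go. The sketch below wraps each command in a made-up helper, check_tool, that only reports whether the command ran successfully; it hides the version text itself, so run the command directly when you want the details.

```shell
# Run a command quietly and report whether it succeeded.
# $1: label to print, remaining arguments: the command to run
check_tool() {
  label="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "OK: $label"
  else
    echo "MISSING: $label"
  fi
}

# Example, run inside DevBox:
#   check_tool java   java -version
#   check_tool hadoop hadoop version
#   check_tool sqoop  sqoop version
#   check_tool hbase  hbase version
#   check_tool hive   hive --version
```

A MISSING line means the command was not found or exited with an error, which is worth checking by hand before starting a project.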

GitHub link for DevBox:

https://github.com/nethajinirmal13/DataFusion-DevBox

Go and check it out, then give me feedback: comment on how far you got and whether you succeeded. If you get stuck anywhere, remember to reach out to me and follow the instructions.

If you get stuck: before reaching out to me, stop the EC2 instance or delete it completely to be safe. Otherwise you will be charged by AWS.


Thank you!

