DevBox on an EC2 Virtual Machine: An All-in-One Hadoop Ecosystem on the Web
Soumya Sankar Panda
Data Engineer | AWS | Snowflake | Hadoop | Spark | Docker | Kubernetes | Terraform
Running Big Data setups like Hadoop, Spark, Hive, and HBase in virtual machines is quite challenging and hectic. You need to be aware of a lot of things just to complete a simple task. Why deal with all of that when you can dive straight in with a DevBox setup? It makes your life a lot easier when using Big Data tools. You can use it in your academic or personal projects, and if you are a curious techie who loves to explore new things, then DevBox is for you.
Dive into the world of Big Data with Data Fusion - DevBox!
This all-in-one Docker environment brings everything you need for your journey: Java, Python, Spark, Hadoop, Spark SQL, Hive, HBase, MySQL, Sqoop, Kafka, Flume, and more. Plus, it's packed with Jupyter extensions in Code Server for a seamless experience. Take your Big Data projects to the next level with DevBox.
This DevBox was created by Nethaji Nirmal. A special shout-out to @Nethaji N.
DevBox and the issue it addresses:
As I mentioned, we will deploy DevBox to an AWS EC2 Linux machine and use it for our Hadoop work. But the obvious question is why we are using DevBox in the first place. We could have used any readily available VM (virtual machine), such as an Oracle VM or the Cloudera VM, or just used Docker locally to experiment.
Yes! WE COULD HAVE USED THOSE, BUT... there is a catch. So what is the catch?
The catch is resources: storage and, above all, memory (RAM). These VMs need a lot of RAM even for basic operations. You need a minimum of 16 GB of RAM, plus enough free disk space, to maintain a steady flow of work with them.
If your configuration has less than 16 GB, you will run into resource problems and experience frequent system lag and hangs. Docker has the same issue. Running these Big Data services through Docker on your own machine will be a complete mess (especially if you are a Windows user with less than 16 GB of RAM). It will heat up your system, on top of the storage issues.
Besides, Hadoop is a legacy system and every execution is shell based; there is nothing visual. So why set up all these big tools locally when it will eventually cost you your PC's storage and cause frequent breakdowns? To solve this issue, we have DevBox.
DevBox gives you an all-in-one package of the Hadoop services and their ecosystem, along with Java, Python, Kafka, Spark (which can run on top of HDFS, Hadoop's file system), and MySQL.
DevBox works a bit like a codespace: it uses Docker to spin up the whole Hadoop system and exposes it on the web, giving you the full power to do projects at will. There is no need to install any other package; just type the commands in the terminal and DevBox takes care of the rest.
From a project perspective you get a hassle-free experience because everything is pre-installed. You just execute commands in the terminal, and if you get stuck at some point, you can go to the official documentation of that particular service. DevBox uses the documented commands; nothing here is third party.
P.S.: If you get stuck anywhere, please DM me and I will assist you.
Implementation of DevBox on EC2:
Before I go into this, a quick reminder: to run DevBox on EC2 we will use a larger Linux machine, namely an Ubuntu instance of type t4g.xlarge, chosen for its RAM. It is billed per hour, but the cost is minimal: around 100 to 200 Rs, provided you delete the instances you created when you are done. You can also just stop the instances, but if you leave them undeleted it will easily cost a minimum of 1000 to 2000 Rs.
That usually does not happen, because we will delete everything after successfully running our code. If you delete all the instances right away, the maximum will be about 70 to 80 Rs. So keep that in mind.
It is better to pay 70 to 80 Rs, or even less, than to use Docker and VMs at the cost of your PC or laptop's storage and other configuration issues. That is completely up to you. Here I am giving you an alternative to the old way of learning Hadoop.
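As a rough sanity check on these figures, the hourly billing can be worked out in the shell. This is only a sketch: the default hourly rate (0.1344 USD, roughly the on-demand price of a t4g.xlarge in some regions) and the exchange rate (83 Rs per USD) are assumptions and not values from this article, so check the AWS pricing page for your region. Storage and data transfer charges are ignored, and estimate_cost is a made-up helper name.

```shell
#!/usr/bin/env bash
# estimate_cost HOURS [USD_PER_HOUR] [INR_PER_USD]
# Prints the approximate instance cost in rupees.
# Both default rates are assumptions for illustration only.
estimate_cost() {
  local hours="$1"
  local usd_per_hour="${2:-0.1344}"
  local inr_per_usd="${3:-83}"
  awk -v h="$hours" -v r="$usd_per_hour" -v x="$inr_per_usd" \
    'BEGIN { printf "%.2f\n", h * r * x }'
}

estimate_cost 6     # a 6-hour learning session
estimate_cost 720   # an instance left running for 30 days
```

Passing your own rates as the second and third arguments gives an estimate for your region and current exchange rate.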
Step-by-Step Execution:
1) First, log in to your AWS console with your root account. I am specifically asking you to use the root account because otherwise you could face permission or access-denied errors, which would prevent you from creating the EC2 instance in the first place.
2) Go to EC2 and create an instance with the following configuration:
Give the instance an appropriate name and select the Ubuntu OS image.
3) Then, for the architecture, select 64-bit (Arm).
Now select the instance type: t4g.xlarge
4) Then go to Network settings and allow everything:
a) Allow SSH traffic from: default - Anywhere 0.0.0.0/0
b) Allow HTTPS traffic from the internet
c) Allow HTTP traffic from the internet
5) Under Configure storage, choose 1 x 30 GiB, which falls under the free tier, and leave everything else at the defaults.
6) On submitting, the EC2 instance will be created.
7) Now go to Instances, and you will be able to see that our instance has been created and is running.
(Sorry, here I terminated the instance instead of stopping it; I will update this.)
8) Click the instance ID of the instance we created and press the Connect button at the top right.
9) After pressing Connect you will land on the connection page. Keep everything at the defaults on this page and finish by hitting Connect. It will land you in the Ubuntu Linux terminal. From there we move on to the DevBox setup.
10) Congratulations! You are now in the Linux terminal. Next we will set up DevBox on the cloud. Feel proud that you made it this far.
Keep it up. You are just a few steps away; don't lose focus.
11) Now we run the commands for Docker, DevBox, and the other configuration.
First, type the command:
sudo apt-get update -y
(On a yum-based image the equivalent would be sudo yum update -y, but on our Ubuntu instance use apt-get.)
Wait for the command to finish and for the terminal prompt to return before entering the next command.
12) Now run the next command:
sudo apt-get install docker.io -y
(On a yum-based image you could try sudo yum install -y docker, but here go with the apt-get command.)
13) Check the Docker installation: docker --version
14) Then run the next commands.
Execute them one by one.
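The exact commands for this step are not reproduced in the text. A minimal sketch of what usually belongs here on Ubuntu, assuming the goal is to start Docker and let your login user run it without sudo (which would explain why the next step asks you to reconnect), is shown below. The function name enable_docker_for_user is hypothetical, and the commands are an assumption rather than the author's exact steps.

```shell
#!/usr/bin/env bash
# Assumed post-install steps; not taken from the original article.
enable_docker_for_user() {
  local user="${1:-$USER}"
  sudo systemctl enable --now docker   # start Docker now and on every boot
  sudo usermod -aG docker "$user"      # add the user to the docker group
}

# Run it once, then log out and back in so the group change applies:
#   enable_docker_for_user
```

Group membership is only read at login, so docker commands without sudo will keep failing with a permission error until you reconnect.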
15) The next step is:
Exit and reconnect for the changes to take effect. Close the terminal, go back to the instance ID, and connect to the terminal again to continue.
Execute these steps one by one, in order.
16) On to the next commands:
Execute them in order, one at a time.
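The commands for this step are also not reproduced in the text. Since docker-compose is used in step 20 but its installation is never shown, one of the omitted steps presumably installs it. A hedged sketch, assuming the version packaged by Ubuntu is acceptable (ensure_docker_compose is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# Assumed step: install docker-compose from the Ubuntu repositories
# only if it is not already on the PATH.
ensure_docker_compose() {
  if command -v docker-compose >/dev/null 2>&1; then
    echo "docker-compose is already installed"
  else
    sudo apt-get install -y docker-compose
  fi
}

# Run it, then confirm the version:
#   ensure_docker_compose
#   docker-compose --version
```

If the packaged version turns out to be too old for the compose file used later, follow the official Docker Compose installation instructions instead.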
17) Now the Docker pull command for DevBox.
Command:
docker pull mentornirmal/devbox-coder:latest
18) Now on to the next command, to pull MySQL:
Command: docker pull mentornirmal/mysql:8.0
Then go on to the next command:
Make a folder in the current directory: mkdir <folder name> (for me it is: mkdir sam) and check that it was created with the 'ls' command.
19) Next, execute the command: nano docker-compose.yml
Paste the following into the file.
To paste, use Shift + Insert:
version: "3.8"
services:
  coder:
    image: mentornirmal/devbox-coder:latest
    ports:
      - "8080:8080"    # Port mapping for code-server
      - "9870:50070"   # HDFS NameNode
      - "9864:50010"   # HDFS DataNode
      - "8042:8042"    # YARN NodeManager
      - "8088:8032"    # YARN ResourceManager
    volumes:
      - /home/ec2-user/nirmal:/home/coder/project
    depends_on:
      - mysql
  mysql:
    image: mentornirmal/mysql:8.0
    environment:
      MYSQL_ROOT_PASSWORD: rootpassword
      MYSQL_USER: coder
      MYSQL_PASSWORD: coderpassword
    ports:
      - "3306:3306"
    volumes:
      - mysql_data:/var/lib/mysql
    restart: always
volumes:
  mysql_data:
In the volumes section, edit the following line:
- /home/ec2-user/nirmal:/home/coder/project
Replace ec2-user with ubuntu,
and replace nirmal with the folder you created with mkdir (for me it is sam).
Move around the file with the cursor keys to make these edits.
To save the file, press Ctrl + X, then type Y to confirm and press Enter, and you are good to go.
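The same path edit can be made without an editor. The sketch below rewrites the host side of the volume mapping with sed; the folder name sam is just the example used in this article, and rewrite_volume_path is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# rewrite_volume_path USER FOLDER: reads compose text on stdin and replaces
# the sample host path /home/ec2-user/nirmal with /home/USER/FOLDER.
rewrite_volume_path() {
  local user="$1" folder="$2"
  sed "s#/home/ec2-user/nirmal#/home/${user}/${folder}#"
}

# Demonstration on the single line that needs to change:
echo '      - /home/ec2-user/nirmal:/home/coder/project' \
  | rewrite_volume_path ubuntu sam

# To apply it to the real file, write a new copy and review it first:
#   rewrite_volume_path ubuntu sam < docker-compose.yml > docker-compose.new.yml
```

Writing to a separate file keeps the original intact until you have checked that the volume line now points at your own folder.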
20) For the final part, we have to bring up the containers.
Command: docker-compose up -d
N.B.:
In case your containers are not running properly, bring them down and then up again.
Let me show you:
First: docker-compose down
Then: docker-compose up -d
Now the most difficult part is done. It is time for the real action: DevBox on the web!
From here, go to the instance ID ----> Security ----> Security Groups ----> Edit inbound rules ----> add port 8080 for DevBox ----> back to the instance ID ----> launch the instance's public URL ----> edit the URL to https://13.60.4.114:8080 .
You will get your own public IP address.
In Edit inbound rules, add port 8080, set its source to the open-access range 0.0.0.0/0, and save.
Why 0.0.0.0/0? Because as the source of an inbound rule, 0.0.0.0/0 allows incoming traffic on that port from any IPv4 address.
Then remove the "s" from https://. The "s" stands for secure, but our URL is not served over a secure (TLS) connection, so you cannot access DevBox on the web without removing it.
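To avoid typing the address by hand, the URL can be assembled from the instance's public IP. A small sketch, where devbox_url is a hypothetical helper, 8080 is the code-server port from the compose file, and the IP is the example address from this article (yours will differ):

```shell
#!/usr/bin/env bash
# devbox_url IP [PORT]: prints the plain-HTTP address of the DevBox UI.
devbox_url() {
  local ip="$1" port="${2:-8080}"
  printf 'http://%s:%s\n' "$ip" "$port"
}

devbox_url 13.60.4.114

# Optional reachability check from your own machine, if curl is installed:
#   curl -I "$(devbox_url 13.60.4.114)"
```

If the curl check hangs or is refused, the usual suspects are a missing inbound rule for port 8080 or containers that are not running.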
Then, voilà! You have DevBox running, folks.
Accept the trust prompt (choose to trust the authors) and there you have it.
Now the fun part:
Type jps at the command prompt and see what you get.
It lists the services DevBox provides for you, all running.
What is jps, and what is its role?
jps stands for Java Virtual Machine Process Status Tool.
It is a command-line utility that lists the running Java processes on your system, along with their process IDs and other relevant information.
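One way to confirm that the core Hadoop daemons are up is to compare the jps output against a list of expected names. The sketch below assumes the four daemons that match the ports in the compose file (NameNode, DataNode, ResourceManager, NodeManager); the set that DevBox actually starts may differ, and missing_daemons is a hypothetical helper.

```shell
#!/usr/bin/env bash
# missing_daemons: reads jps output on stdin ("<pid> <name>" per line)
# and prints each expected daemon name that does not appear in it.
missing_daemons() {
  local output
  output="$(cat)"
  for name in NameNode DataNode ResourceManager NodeManager; do
    # -w matches whole words, so "NameNode" does not match "SecondaryNameNode"
    printf '%s\n' "$output" | grep -qw "$name" || echo "$name"
  done
}

# Usage inside the DevBox terminal:
#   jps | missing_daemons
```

No output means every expected daemon was found; any name that is printed is a process worth investigating.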
Now a small thing to add here.
You can get Python's interactive mode by running python in the terminal.
We have to run one more command before we can run our scripts in DevBox. It gives us permission to create, edit, and run scripts right here.
The command is: sudo chown -R coder:coder /home/coder/project
This lets you create and play with files.
Then it will ask for a password.
Password = codeserver (enter codeserver as the password) and things will go smoothly.
Additionally, check these commands in the terminal:
GitHub link for DevBox:
Go and check it out, then give me feedback and leave a comment on how far you got and whether you were successful. If you get stuck anywhere, just remember to reach out to me and follow the instructions.
If you get stuck: before reaching out to me, stop the EC2 instance or delete it completely to be safe. Otherwise you will be charged by AWS.
Thank You!!!