Deploy a Python Dashboard on AWS
Sreejith Munthikodu
Senior Data Scientist and Data Architect at BC Public Service
Recently, I created an interactive COVID-19 dashboard in Python using Plotly Dash. In this article I share the steps I followed to get the app running on an AWS EC2 instance. I also scheduled the EC2 instance to fetch up-to-date data from the data source.
Data Source: CSSE @ Johns Hopkins University
The app
Plotly Dash is an open-source framework for building enterprise-ready analytic web apps without writing JavaScript. It lets data analysts and data scientists publish their dashboards and data analytics products without having to worry about the complex tasks involved in developing dynamic web apps. Dash supports both Python and R. If you want to learn Dash quickly, please refer to the tutorial in the official documentation.
I used Dash to build the COVID-19 dashboard described in this article. Data for the app comes from the popular COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. The data was cleaned in Python using pandas and NumPy. Plots were created using Plotly Express. This is what the final dashboard looks like.
Deployment
Create an EC2 Instance
The app was created for learning purposes, so I wanted to deploy it on a free service. Since I am within the AWS Free Tier period, I decided to go with AWS and used only resources that are eligible for the free tier. I assume you already have an AWS account set up.
- Create an EC2 t2.micro instance as the server for this web app. From the AWS Management Console, under Services, click EC2.
- Click Instances and then Launch Instance
- Select Ubuntu Server 18.04 LTS as the Amazon Machine Image (AMI)
- Select the instance type t2.micro, which is free tier eligible. Click Review and Launch, then Launch.
- To connect to the instance securely, create a new key pair and download the private key file. Keep this file safe, since anyone who has it can connect to the EC2 instance.
- Launch Instance
- If you go back to the EC2 service, under Instances, you will find the new instance running. This will be the server for our app.
Configure Inbound Rules
- It is safer to restrict SSH access to the EC2 instance to your own IP address. Click on the running instance. Under Description -> Security Groups, click launch-wizard-1.
- Click Inbound Rules -> Edit Inbound Rules.
- For port 22, select My IP as the source. This ensures that only your IP address can connect to the EC2 instance remotely.
- We need to open port 80 so that users can access the app over the web. Click Add Rule and enter 80 under Port Range. Select Anywhere under Source. Do the same for port 8050, or for whichever port you plan to run the app on.
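The three inbound rules above can also be written down as data. The following is a minimal sketch, assuming you would apply them with boto3's `authorize_security_group_ingress` instead of the console. The helper names `build_ingress_rules` and `apply_rules` and the example IP address are placeholders of mine, not part of the original project.

```python
from typing import Dict, List


def build_ingress_rules(my_ip: str, app_port: int = 8050) -> List[Dict]:
    """Return inbound rules in the IpPermissions shape that boto3 expects.

    SSH (22) is limited to a single address; HTTP (80) and the app port
    are open to everyone, matching the console steps above.
    """
    def rule(port: int, cidr: str, note: str) -> Dict:
        return {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": cidr, "Description": note}],
        }

    return [
        rule(22, f"{my_ip}/32", "SSH from my IP only"),
        rule(80, "0.0.0.0/0", "HTTP from anywhere"),
        rule(app_port, "0.0.0.0/0", "Dash app port"),
    ]


def apply_rules(group_id: str, rules: List[Dict]) -> None:
    """Send the rules to AWS. Requires boto3 and configured credentials."""
    import boto3  # imported here so the rule builder works without boto3

    ec2 = boto3.client("ec2")
    ec2.authorize_security_group_ingress(GroupId=group_id, IpPermissions=rules)


# Hypothetical usage; 203.0.113.7 is a documentation-only example address.
rules = build_ingress_rules("203.0.113.7")
print([r["FromPort"] for r in rules])
```

Note that this call only adds rules. If the security group already has an SSH rule open to everyone, that rule would still need to be removed separately for the restriction to take effect.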
Copy the Project to AWS S3 Bucket
- Install the AWS CLI, which is used to interact with AWS services from the command line. Follow the instructions here.
- Follow the instructions here to get an AWS access key.
- Configure the AWS CLI by typing `aws configure` from your command line.
- Provide the Access Key ID and Secret Access Key you obtained in step 2, along with your default region name. You may leave the default output format as is.
- Now create an AWS S3 Bucket to store the project by typing `aws s3 mb s3://bucket-name`.
- Copy the files from your project to the S3 bucket by typing `aws s3 cp <your directory path> s3://<your bucket name> --recursive`.
aws configure
aws s3 mb s3://bucket-name
aws s3 cp <your directory path> s3://<your bucket name> --recursive
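If you would rather do the recursive copy from Python, the sketch below walks the project directory and uploads each file under a key equal to its relative path. It assumes a boto3 S3 client is passed in, and uses only its `upload_file(Filename, Bucket, Key)` method. The function names `list_uploads` and `upload_directory` are my own and do not come from the project.

```python
import os
from typing import List, Tuple


def list_uploads(local_dir: str) -> List[Tuple[str, str]]:
    """Pair every file under local_dir with the S3 key it would get.

    Keys are paths relative to local_dir, written with forward slashes.
    """
    pairs = []
    for root, _dirs, files in os.walk(local_dir):
        for name in files:
            path = os.path.join(root, name)
            key = os.path.relpath(path, local_dir).replace(os.sep, "/")
            pairs.append((path, key))
    # Sort by key so the upload order is predictable.
    return sorted(pairs, key=lambda pair: pair[1])


def upload_directory(s3_client, local_dir: str, bucket: str) -> int:
    """Upload every file and return how many files were sent."""
    pairs = list_uploads(local_dir)
    for path, key in pairs:
        s3_client.upload_file(path, bucket, key)
    return len(pairs)
```

With boto3 installed and credentials configured, something like `upload_directory(boto3.client("s3"), "<your directory path>", "<your bucket name>")` should behave much like the `aws s3 cp ... --recursive` command above.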
Connect to the EC2 Instance
- Right-click the running EC2 instance in the AWS Management Console and click Connect.
- Follow the instructions to connect to the EC2 instance remotely from your command line. On Ubuntu, running the example command shown there connects you to the instance.
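The connection step usually comes down to two things: making the private key readable only by you, and running `ssh` with that key. Here is a small sketch of both, assuming the default `ubuntu` user of Ubuntu AMIs. The function name `ssh_command` and the key file name in the usage note are hypothetical.

```python
import os
import stat
from typing import List


def ssh_command(key_path: str, public_dns: str, user: str = "ubuntu") -> List[str]:
    """Restrict the key file to owner-read-only and build the ssh command.

    ssh rejects private keys that other users can read, so the key is
    set to mode 400 before the command is returned.
    """
    os.chmod(key_path, stat.S_IRUSR)  # same effect as `chmod 400 <key file>`
    return ["ssh", "-i", key_path, f"{user}@{public_dns}"]
```

You could then run it with `subprocess.run(ssh_command("my-key.pem", "<public DNS of your instance>"))`, replacing both arguments with your own values.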
Install Dependencies
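- Once connected to the EC2 instance, you need to install the dependencies for the Python app. The project root directory has a requirements.txt file, so run the install from that directory after the project has been copied to the instance (see the next section).
- Install pip3 and the required dependencies using the commands below.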
sudo apt-get update
sudo apt-get -y install python3-pip
pip3 install -r requirements.txt
Run the app on EC2
- Follow the instructions here to enable S3 access from the EC2 instance.
- Copy the project directory from AWS S3 to the EC2 instance.
- Change to the project directory and run the app. Use `screen` to start a detached terminal for the app so that you can close the connection to the EC2 instance without killing the app. The app should now be running on localhost.
aws s3 sync s3://source-bucket-name <local directory path>
cd <Project Directory>
python3 app.py
If you are not planning to attach the web app to a domain name, you need to tell the Dash web server to listen on 0.0.0.0 instead of localhost. This ensures that the app can be reached from anywhere at http://<EC2 IP>:<PORT>. You can do this by editing the app.py script.
app.run(host='0.0.0.0', port=8050)
You will now be able to access the interactive web app at http://<EC2 IP>:<PORT>.
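If you want to keep the same app.py for local development and for EC2, one option is to read the host and port from environment variables. This is only a sketch: `APP_HOST` and `APP_PORT` are variable names I made up for the example, and `server_settings` is a hypothetical helper, not part of Dash.

```python
import os
from typing import Mapping, Tuple


def server_settings(env: Mapping[str, str] = os.environ) -> Tuple[str, int]:
    """Read host and port from the environment, with deployment defaults.

    APP_HOST and APP_PORT are hypothetical names chosen for this sketch.
    0.0.0.0 makes the server listen on all network interfaces.
    """
    host = env.get("APP_HOST", "0.0.0.0")
    port = int(env.get("APP_PORT", "8050"))
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return host, port


# In app.py this could replace the hard-coded values:
#     host, port = server_settings()
#     app.run(host=host, port=port)
```

Whichever port you choose has to match the one opened in the inbound rules of the security group.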
Enable Automatic Data Update
The data source for this web app is the well-known COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. The source is updated once a day with summary data from around the world, so the EC2 instance needs to download the data into the project directory every day and restart the web app. First, I cloned the source repo to the EC2 instance. Then I wrote a bash script that pulls the latest data from the source repository, copies it to the project directory, and restarts the web app. Finally, I used crontab to run this script every day at around 00:00 UTC. The bash script I used is given below.
#!/bin/bash

# Change directory to the Johns Hopkins GitHub repo
cd <Path to source repo in your EC2 instance>

# Update repo
sudo git pull

# Remove old data from current project
cd <Project root directory>
sudo rm -rf data/csse_covid_19_time_series*

# Copy updated data to project directory
sudo cp -a <Path to csse_covid_19_time_series on source repository> <Path to data directory in project directory>
sleep 3s

# Kill the dash app
sudo killall screen

# Restart the dash app
sudo screen -d -m python3 <Path to app.py>
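The remove-and-copy step in the middle of the script can also be done in Python with the standard library. The sketch below covers only that step, not the `git pull` or the restart. The function name `refresh_data` is mine, and unlike the bash script it checks that the source folder exists before deleting the old copy.

```python
import shutil
from pathlib import Path


def refresh_data(source_folder: str, data_dir: str) -> Path:
    """Replace the project's copy of a data folder with a fresh one.

    Mirrors the `rm -rf` and `cp -a` steps: the old folder is deleted
    first so that files dropped upstream do not linger in the project.
    """
    source = Path(source_folder)
    target = Path(data_dir) / source.name
    if not source.is_dir():
        # Fail before deleting anything if the source is missing.
        raise FileNotFoundError(f"missing source folder: {source}")
    if target.exists():
        shutil.rmtree(target)
    shutil.copytree(source, target)
    return target
```

Called with the path to csse_covid_19_time_series in the source repository and the path to the project's data directory, it leaves a fresh copy of the folder inside the data directory.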
You may schedule the script to run several times a day around 00:00 UTC, since the data source is usually updated around this time.
crontab -e
15 00 * * * sudo bash <Path to the bash script>
15 01 * * * sudo bash <Path to the bash script>
15 02 * * * sudo bash <Path to the bash script>
15 03 * * * sudo bash <Path to the bash script>
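Each crontab line has five time fields (minute, hour, day of month, month, day of week) followed by the command. As a small illustration, the sketch below generates the four entries above from a list of hours. The function name `cron_entries` is hypothetical, and I am assuming the instance clock is set to UTC, since cron uses the system's local time.

```python
from typing import Iterable, List


def cron_entries(script_path: str, hours: Iterable[int] = (0, 1, 2, 3),
                 minute: int = 15) -> List[str]:
    """Build one crontab line per hour: minute hour day month weekday command."""
    entries = []
    for hour in hours:
        if not (0 <= hour <= 23 and 0 <= minute <= 59):
            raise ValueError(f"invalid time {hour}:{minute}")
        # "* * *" means every day of the month, every month, every weekday.
        entries.append(f"{minute:02d} {hour:02d} * * * sudo bash {script_path}")
    return entries


for line in cron_entries("<Path to the bash script>"):
    print(line)
```

The printed lines can be pasted into the editor opened by `crontab -e`.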
I registered a domain name using Amazon Route 53. Requests to the new domain name are routed to the app using nginx. I used the answer here to configure nginx.
Disclaimer
This is a project I did to learn how to build a Dash app and deploy it on AWS. It may not be the best approach for this application. I take no responsibility if the steps above compromise the security of your AWS account. I also assume the reader is within the AWS Free Tier period; keeping the EC2 instance running beyond the free tier limits may incur charges.