AWS Case Study 2 - Pandemic Stats App - 100% serverless

Here is my recipe for "baking" a 100% pure serverless AWS web app.

You'll need:

  • 12 AWS services,
  • a dose of patience to set them up and integrate them,
  • basic programming skills to code the functions that handle the data processing,
  • and a budget of roughly 0.6 EUR (yes, just 60 cents) per month to cover the operating costs.

Ready to start cooking?

Let's discuss "The Requirements"

Before we take the deep dive and start designing our Pandemic Stats Web App, let's begin by defining the requirements that our app and its architecture should fulfill.

And since we are in the year 2021, let's make these requirements as demanding as possible:

Our Pandemic Stats Web App should:

  1. provide a chart overview of the total number of confirmed COVID-19 cases in the EU member states,
  2. be highly available and fault tolerant (with an SLA guaranteeing at least 99.9% uptime),
  3. run on a backbone that delivers ultra-low latency to end users no matter where in the world they are,
  4. be 100% autonomous (it should update itself with the latest COVID-19 data and shouldn't require any manual maintenance or supervision),
  5. automatically scale up and down with spikes in traffic volume,
  6. not require us to provision or manage servers to keep it running,
  7. not generate operating costs higher than 1 EUR/month (yes, in these COVID-19 times everybody has a tight budget, and so do we).

Pretty tough requirements, aren't they?

Even though this may look like a mission impossible (mainly because of the "<1 EUR operating costs" rule), it is achievable - and I'm going to show you how.

Let's design "The Architecture"

Based on the requirements above, we don't really have many architectural choices available.

Frankly, we have just one choice - a serverless architecture.

And since we are all fans of AWS (I hope so :), we will host our solution on the AWS cloud platform.

The AWS architecture I designed may look quite complex at first sight.

However, if we want to fulfill all of the requirements, get the most from AWS and pay the least, this doesn't leave too many alternatives.

Serverless architecture - Pandemic Stats

Let's take a look "Under The Hood"

The solution I designed is powered by 2 Lambda functions and 1 CloudFront distribution.

The 1st Lambda function - FetchProcessAndStoreCOVID19Data

It does these 2 things:

a) It executes an SQL query via Amazon Athena and fetches the total number of confirmed COVID-19 cases in the EU member states. Athena retrieves this data from the COVID-19 data lake, which is hosted in a public Amazon S3 bucket.

Do you remember my previous article where I wrote about the AWS Public Datasets? The COVID-19 Data Lake is one of these free datasets. It is a centralized repository of up-to-date datasets related to the spread and characteristics of the novel coronavirus (SARS-CoV-2) and its associated illness (COVID-19).
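Step (a) could be sketched in Python roughly as follows. This is only an illustration under assumptions: the database, table and column names in the query and the S3 output location are placeholders, not the values used by the real function. The Athena client is passed in as a parameter (in a real Lambda it would typically be `boto3.client("athena")`).

```python
import time

# Hypothetical query: the database, table and column names are placeholders.
QUERY = """
SELECT country, SUM(confirmed) AS total_confirmed
FROM covid19.example_cases_table
GROUP BY country
ORDER BY total_confirmed DESC
"""


def run_query(athena, sql, output_location, poll_seconds=1.0):
    """Start an Athena query, wait until it finishes, return the raw result."""
    started = athena.start_query_execution(
        QueryString=sql,
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state == "SUCCEEDED":
            return athena.get_query_results(QueryExecutionId=query_id)
        if state in ("FAILED", "CANCELLED"):
            raise RuntimeError(f"Athena query ended in state {state}")
        time.sleep(poll_seconds)  # still QUEUED or RUNNING


def parse_rows(result):
    """Turn an Athena result set into a list of dicts keyed by column name."""
    rows = result["ResultSet"]["Rows"]
    # For SELECT queries the first row holds the column names.
    header = [cell.get("VarCharValue", "") for cell in rows[0]["Data"]]
    return [
        dict(zip(header, (cell.get("VarCharValue") for cell in row["Data"])))
        for row in rows[1:]
    ]
```

Athena returns every value as a string, so the totals still need to be converted to integers before they are used in the page.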

b) After the Lambda function gets the data from Athena, it parses the query result set and pregenerates the HTML content of the webpage that users see when they visit the PandemicStats.Cloud web app. This HTML code is then stored in an Amazon DynamoDB table.
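A minimal sketch of step (b), again in Python. The page layout, the `page_id` key and the `html` attribute name are assumptions made for illustration; the real page certainly looks different. The `table` argument is expected to behave like a boto3 DynamoDB `Table` resource.

```python
from html import escape


def render_page(rows):
    """Pregenerate the HTML page from (country, total_cases) pairs."""
    body = "\n".join(
        f"<tr><td>{escape(country)}</td><td>{int(total):,}</td></tr>"
        for country, total in rows
    )
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
        "<title>Pandemic Stats</title></head><body>\n"
        "<table>\n<tr><th>Country</th><th>Confirmed cases</th></tr>\n"
        f"{body}\n</table>\n</body></html>"
    )


def store_page(table, html, page_id="index"):
    """Store the page under a fixed key so the reader function can find it."""
    # "page_id" and "html" are hypothetical attribute names.
    table.put_item(Item={"page_id": page_id, "html": html})
```

Because the page is rebuilt as a whole and written under one fixed key, each run simply overwrites the previous version.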

As the total number of COVID-19 cases changes every day, we need to make sure our webpage always displays the current figures. That's why this Lambda function has to run regularly. For this purpose, we use Amazon CloudWatch, which allows us to set up a scheduled rule that triggers the Lambda function automatically at a fixed interval (twice a day in our case).
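The scheduled rule is normally created once in the AWS console, but the same setup could be scripted. The sketch below assumes a boto3 `events` client is passed in; the rule name, target ID and the 12-hour rate are illustrative choices, not the author's actual configuration.

```python
def schedule_lambda(events, rule_name, lambda_arn, expression="rate(12 hours)"):
    """Create or update a scheduled rule and point it at the Lambda function."""
    rule = events.put_rule(
        Name=rule_name,
        ScheduleExpression=expression,  # "rate(12 hours)" means twice a day
        State="ENABLED",
    )
    events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "1", "Arn": lambda_arn}],
    )
    return rule["RuleArn"]
```

The Lambda function additionally needs a resource-based permission that allows the events service to invoke it; the console adds this automatically when the trigger is created there.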

The 2nd Lambda function - GetWebpageData

Together with Amazon API Gateway, it acts like a web server that serves the webpage content stored in the Amazon DynamoDB table to the internet browser.

This database table holds the HTML content with the COVID-19 data that our 1st Lambda function pregenerated, as well as the fancy graphics that we display on our webpage.
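A handler for this second function could look roughly like the sketch below. It assumes API Gateway is configured with Lambda proxy integration (so the function returns `statusCode`, `headers` and `body`), and it reuses the hypothetical `page_id` and `html` attribute names. The `Cache-Control` value is also an assumption; serving the graphics is left out.

```python
def make_handler(table, page_id="index"):
    """Build a Lambda handler that serves the stored page from DynamoDB."""

    def handler(event, context):
        item = table.get_item(Key={"page_id": page_id}).get("Item")
        if item is None:
            # The first function has not stored a page yet.
            return {
                "statusCode": 404,
                "headers": {"Content-Type": "text/plain"},
                "body": "Page not generated yet",
            }
        return {
            "statusCode": 200,
            "headers": {
                "Content-Type": "text/html; charset=utf-8",
                # Hypothetical 12-hour lifetime for downstream caches.
                "Cache-Control": "max-age=43200",
            },
            "body": item["html"],
        }

    return handler
```

In a deployed function, the module would create the table object once (for example via `boto3.resource("dynamodb").Table(...)`) and expose `handler = make_handler(table)`.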

The role of Amazon CloudFront

There are 3 reasons why we need to use Amazon CloudFront.

1. The first reason is that all Amazon API Gateway REST endpoints (through which we let internet users access our webpage) are designed by Amazon to accept HTTPS requests only.

If we didn't use CloudFront, visitors of our Pandemic Stats web app would always have to type its full https:// URL into their browsers.

And you know how it works these days: most internet users (myself included) skip the protocol part of the URL and type just the hostname.

Without CloudFront, no page would be displayed in such a case and users would think our webpage doesn't work at all.

As we don't want this to happen, we need to make sure both HTTP and HTTPS URLs work. With CloudFront, it is fairly easy to configure a rule that redirects HTTP requests to HTTPS.
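In CloudFront this rule is the viewer protocol policy of a cache behavior. Expressed as a fragment of a distribution config (shown here as a Python dict, as it would be passed to boto3), it might look like this. The origin ID is a made-up placeholder and all other required settings are omitted.

```python
# Fragment of a CloudFront distribution config, not a complete one.
default_cache_behavior = {
    "TargetOriginId": "api-gateway-origin",  # hypothetical origin ID
    # Plain HTTP requests are answered with a redirect to the HTTPS URL.
    "ViewerProtocolPolicy": "redirect-to-https",
}
```

The other allowed values are `allow-all` and `https-only`; only `redirect-to-https` gives visitors who type a bare hostname a working page over HTTPS.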

2. The second reason why we need Amazon CloudFront is to ensure that visitors always have lightning-fast access to our webpage, no matter where in the world they are.

To ensure this, Amazon CloudFront uses a global network of 225+ Points of Presence (215+ edge locations and 12 regional mid-tier caches) in 88 cities across 45 countries.

In other words, Amazon CloudFront caches our webpage at its network nodes around the world and, depending on the user's location, delivers the webpage from the nearest CloudFront node.

3. The third reason is that it saves us money: thanks to its caching capabilities, only 1-2 API Gateway REST API requests (and executions of the 2nd Lambda function) are made per day, no matter how many visits our webpage gets.

Let's discuss "The Costs"

How is it possible that the operating costs of the PandemicStats.Cloud webpage are just around 60 cents?

It is possible because most of the AWS services that we use offer a Free Tier, in which you are not billed unless your usage goes over defined limits.

  • AWS Lambda comes with a free tier that never expires and allows us to make 1 million free requests per month and consume up to 3.2 million seconds of compute time per month at no charge - thanks to CloudFront caching we will hardly make more than a few hundred requests each month.
  • Amazon CloudWatch also comes with a free tier that never expires, which includes 10 custom metrics, 10 alarms and 1 million API requests - defining 1 rule that gets triggered twice a day incurs no costs.
  • Amazon DynamoDB, within its never-expiring free tier, offers 25 GB of storage and enough capacity for 200 million requests per month at no additional cost - storing 60 kB of data is next to nothing.
  • Amazon CloudFront also has a free tier, which covers 50 GB of data transfer per month. Even though this free tier is valid only for the first 12 months, our webpage is only 60 kB in size, so we would need a very high number of daily visitors to pay more than a few additional cents.
  • Amazon API Gateway, similarly to CloudFront, offers a free tier for just 1 year (1 million API requests for free), but since we use CloudFront for caching, the few requests made each day would add up to a negligible number of cents anyway.
  • Amazon Athena doesn't have a free tier, but since we run just a few queries per day, each of which makes Athena scan a few tens of MBs of data, the resulting costs round to 0 (1 TB of scanned data costs $5).

The one AWS service that does charge us some money each month is Amazon Route 53. Our monthly operating costs of roughly 51 cents come from exactly this - we use it as our Domain Name System (DNS) service to host the PandemicStats.Cloud domain zone.
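The arithmetic behind these figures can be checked with a few lines of Python. The Athena price comes from the text above; the 50 MB per query, the 2 queries per day and the 0.50 USD monthly fee for a Route 53 hosted zone are assumptions used for illustration, and 1 TB is approximated as 1,000,000 MB.

```python
ATHENA_USD_PER_TB = 5.0            # price quoted in the text
ROUTE53_ZONE_USD_PER_MONTH = 0.50  # assumption: fee for one hosted zone


def athena_monthly_cost(mb_scanned_per_query, queries_per_day, days=30):
    """Estimate the monthly Athena bill from the amount of data scanned."""
    tb_scanned = mb_scanned_per_query * queries_per_day * days / 1_000_000
    return tb_scanned * ATHENA_USD_PER_TB


# Assumed workload: 2 queries per day, each scanning about 50 MB.
athena = athena_monthly_cost(mb_scanned_per_query=50, queries_per_day=2)
total = athena + ROUTE53_ZONE_USD_PER_MONTH
print(f"Athena: {athena:.4f} USD, total: {total:.3f} USD per month")
```

Under these assumptions Athena contributes about one and a half cents, so the hosted zone fee dominates the bill.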

Frequently asked questions

1. What about the price of the PandemicStats.Cloud domain name itself? Surely its registration and yearly renewal are not free.

Yes, you are right. However, I intentionally didn't include the price of the domain in the operating costs of 60 cents, because no matter where you run your website, you always have to pay for the domain name.

Amazon charges around 25 EUR per year for a .cloud domain. If you pick another domain, you may get even lower pricing.

2. When I look at the architecture diagram, I can see only 10 AWS services. Where are the other 2, and are they also needed?

Besides those 10 services, we need 2 more. We need them only initially, to help us configure CloudFront and Lambda.

  1. AWS Identity and Access Management (IAM) - allows you to create Amazon DynamoDB and Amazon Athena security policies and attach them to the Lambda functions so that the functions can access both services.
  2. AWS Certificate Manager (ACM) - generates the wildcard SSL certificate that Amazon CloudFront needs to serve the website content over a secured HTTPS connection.

3. There is a graphical COVID-19 chart displayed on the PandemicStats.Cloud website. How is this technically possible, who/what draws it? Is it another AWS service you haven't mentioned yet?

No, it has nothing to do with AWS this time.

The chart with the total number of COVID-19 cases is rendered on the client side using JavaScript and Google Charts, which is a free product of Google.

For more information, visit: https://developers.google.com/chart
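Since the page is pregenerated by a Lambda function, the chart code can be generated together with it. The Python sketch below builds the script tags that load Google Charts and draw a column chart in the browser. The chart type, title and element ID are assumptions; the real page may use different ones.

```python
import json


def chart_script(rows, element_id="chart"):
    """Build the <script> tags that draw a column chart on the client side."""
    data = [["Country", "Confirmed cases"]] + [[c, int(n)] for c, n in rows]
    return f"""<script src="https://www.gstatic.com/charts/loader.js"></script>
<script>
  google.charts.load('current', {{packages: ['corechart']}});
  google.charts.setOnLoadCallback(function () {{
    var data = google.visualization.arrayToDataTable({json.dumps(data)});
    var chart = new google.visualization.ColumnChart(
      document.getElementById({json.dumps(element_id)}));
    chart.draw(data, {{title: 'Confirmed COVID-19 cases'}});
  }});
</script>"""
```

The returned snippet goes into the page next to an empty element with the matching ID (for example `<div id="chart"></div>`), and the browser does the drawing.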

4. Showing just one COVID-19 metric on the webpage is not really a big deal. What's the real value of the PandemicStats.Cloud website?

Please bear in mind that the PandemicStats.Cloud website was built to demonstrate the power of serverless architecture, the possibilities of integrating 12 AWS services with each other, and the cost efficiency of such an architecture. It is mostly a proof of concept.

Nevertheless, anyone interested can extend the Lambda functions and offer more features to the website's users. The possibilities are endless, as the COVID-19 data lake is a vast data source.

5. In your previous AWS articles, you published an entire step-by-step tutorial covering all the necessary settings and AWS integrations. Why are they not included in this article?

I originally wanted to, but as 12 AWS services are involved, this article would be ~1 km long and difficult for anyone to read through.

However, those of you who want to try these things yourselves are welcome to get in touch with me via my LinkedIn profile, and I will gladly help you set things up.

If I get similar questions from more of you, I will take it as a signal and write another article (or several) in which I explain things in more detail.


More articles by Rastislav Skultety, MBA
