AWS Case Study 7 - Multi Region Website Monitoring


My very first AWS case study, which I published last year, was about building a simple website monitoring service using 3 AWS serverless web services.

This service had one drawback, though: in some cases it could raise false alarms. That was because the website's availability was verified by an AWS Lambda function located in just ONE region of the world.

And as you know, the internet consists of millions of networks and network nodes, and a network connectivity failure somewhere between our monitoring node and the target website does not necessarily mean that the website itself is down.

AWS regions (as of Oct 11th 2021) - https://aws.amazon.com/about-aws/global-infrastructure/

Advanced (Multi Region) Website monitoring

So what do we need to do to get rid of potential false alarms and to make our monitoring service more reliable?

We need to upgrade the monitoring service's architecture so that it can verify the website from multiple regions of the world in parallel.

Only if the verifications from all of these regions fail will we assume that the target website is really down.

Let's do it!

High level architecture and its 3 layers

To build a multi-region monitoring solution, we are going to need the following 9 AWS services:

Image: the 9 AWS services used in this solution

These services will be used across the following 3 layers:

  1. business layer - will check the target website from 5 regions of the world at regular intervals and store the results of these checks in the database,
  2. notification layer - will compare, at regular intervals, the results of the website checks from the multiple regions stored in the database and send out e-mail notifications if the target website is down (unreachable from all 5 regions),
  3. presentation layer - will make the results of the website checks available at a public-facing URL so that any user can view the details of the last run of checks in an internet browser.

Business layer

Advanced Website Monitoring - Business layer


Notes:

  • to ensure that our business workflow runs every 5 minutes, we will set up a so-called scheduled rule in Amazon EventBridge (it works similarly to a traditional cron job executed at specific intervals),
  • to orchestrate the execution of the Lambda functions in 5 different regions in parallel, we will use Amazon SNS (SNS will be triggered by Amazon EventBridge),
  • the AWS Lambda function running in each of the 5 AWS regions will be nothing more than a very simple Python script. It will fetch data from Amazon DynamoDB to find out which website should be checked and which string must be present on the target website to indicate that it is up, try to establish an HTTPS connection to the target website, check whether the website is running by verifying that the required string is present in the response, and finally update the Amazon DynamoDB table with the result of this verification,
  • to persist the results of the website checks carried out in the 5 regions, we will use an Amazon DynamoDB table with the global tables feature enabled (in other words, we will set up replicas of the table in the remaining 4 regions).
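For illustration, here is a minimal Python sketch of what the per-region check could look like. It is not the original script: the table name `website-monitoring`, the item keys (`pk`, `url`, `required_string`) and the result fields are hypothetical assumptions. The HTTPS check uses only the standard library's `urllib.request`, and `boto3` (bundled with the AWS Lambda Python runtime) is imported inside the handler so that the check logic can be tried out locally without AWS credentials.

```python
import os
import time
import urllib.request
from datetime import datetime, timezone

# Hypothetical table name and schema; the real solution may differ.
TABLE_NAME = os.environ.get("TABLE_NAME", "website-monitoring")


def check_website(url, required_string, timeout=10, opener=urllib.request.urlopen):
    """Fetch `url` and report whether `required_string` is present in the response body."""
    started = time.perf_counter()
    try:
        with opener(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
        is_up = required_string in body
        error = None if is_up else "required string not found"
    except Exception as exc:  # DNS failure, TLS error, timeout, HTTP 4xx/5xx, ...
        is_up = False
        error = str(exc) or type(exc).__name__
    return {
        "region": os.environ.get("AWS_REGION", "local"),  # AWS_REGION is set by Lambda
        "is_up": is_up,
        "response_time_ms": round((time.perf_counter() - started) * 1000),
        "error": error,
        "checked_at": datetime.now(timezone.utc).isoformat(),
    }


def lambda_handler(event, context):
    """Entry point invoked through the SNS topic in each of the 5 regions."""
    import boto3  # bundled with the AWS Lambda Python runtime

    region = os.environ["AWS_REGION"]
    # Use the global table replica that lives in this function's own region.
    table = boto3.resource("dynamodb", region_name=region).Table(TABLE_NAME)
    config = table.get_item(Key={"pk": "config"})["Item"]
    result = check_website(config["url"], config["required_string"])
    table.put_item(Item={"pk": f"result#{region}", **result})
    return result
```

Catching every exception is a deliberate choice in this sketch: from the monitoring point of view, a DNS error, a TLS failure and a timeout all mean the same thing, namely that this region could not reach the website.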

Notification layer

Advanced Website Monitoring - Notification layer


Notes:

  • the notification layer architecture also uses Amazon EventBridge, AWS Lambda, Amazon DynamoDB and Amazon SNS,
  • the difference is that Amazon SNS will be triggered only if the target website checks fail in all of the 5 AWS regions,
  • when Amazon SNS gets triggered (the AWS Lambda function publishes a message to the Amazon SNS topic), notifications are sent to the e-mail addresses of all SNS topic subscribers.
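The core rule of the notification layer (alert only when every region reports a failure) fits in a few lines. The sketch below is a hypothetical illustration: the region list, the shape of the result dictionaries and the policy that a missing region result never triggers an alert are assumptions, not part of the original solution. `publish(TopicArn=..., Subject=..., Message=...)` is the standard boto3 SNS client call.

```python
# Hypothetical set of 5 monitoring regions; substitute the regions you deploy to.
REGIONS = ["us-east-1", "eu-central-1", "ap-southeast-1", "ap-northeast-1", "sa-east-1"]


def should_alert(results, expected_regions=REGIONS):
    """Return True only if every expected region reported the website as down."""
    latest = {r["region"]: r for r in results}
    if set(expected_regions) - set(latest):
        # Assumed policy: incomplete data (a region did not report) never triggers an alert.
        return False
    return all(not latest[region]["is_up"] for region in expected_regions)


def publish_alert(topic_arn, url, results):
    """Publish one message to the SNS topic; SNS then e-mails every subscriber."""
    import boto3  # bundled with the AWS Lambda Python runtime

    lines = [f"{r['region']}: {r.get('error') or 'down'}" for r in results]
    boto3.client("sns").publish(
        TopicArn=topic_arn,
        Subject=f"Website down: {url}"[:99],  # SNS subjects must be shorter than 100 characters
        Message="The website failed the check in all regions:\n" + "\n".join(lines),
    )
```

Keeping the decision in a pure function like `should_alert` makes the "all regions must fail" rule easy to test without touching DynamoDB or SNS.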

Presentation layer

Advanced Website Monitoring - Presentation layer


Notes:

  • Amazon API Gateway is used to create a public-facing REST endpoint,
  • Amazon CloudFront handles the HTTP/HTTPS requests (it bridges our website monitoring domain name with the public-facing REST endpoint mentioned above) and also provides web page caching,
  • when a user visits the website monitoring URL in an internet browser, an AWS Lambda function is triggered; upon execution, it retrieves the status of the last website check from the Amazon DynamoDB table and generates a static HTML page.
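A hypothetical sketch of the presentation-layer Lambda logic follows. It assumes the per-region results have already been read from DynamoDB into a list of dictionaries (the field names are illustrative) and that API Gateway uses a Lambda proxy integration, which expects `statusCode`, `headers` and `body` in the returned object. The `Cache-Control` value is an assumption, not a setting taken from the original solution.

```python
import html
from datetime import datetime, timezone


def render_status_page(url, results):
    """Render the latest per-region results as a small, self-contained HTML page."""
    # Same rule as the notification layer: DOWN only if every region failed.
    overall = "DOWN" if results and not any(r["is_up"] for r in results) else "UP"
    rows = "\n".join(
        f"<tr><td>{html.escape(r['region'])}</td>"
        f"<td>{'UP' if r['is_up'] else 'DOWN'}</td>"
        f"<td>{r['response_time_ms']} ms</td></tr>"
        for r in sorted(results, key=lambda r: r["region"])
    )
    generated = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
    return (
        "<!DOCTYPE html><html><head><meta charset='utf-8'>"
        f"<title>Website monitoring: {html.escape(url)}</title></head><body>"
        f"<h1>{html.escape(url)}</h1><p>Overall status: {overall}</p>"
        "<table><tr><th>Region</th><th>Status</th><th>Response time</th></tr>"
        f"{rows}</table><p>Generated at {generated}</p></body></html>"
    )


def build_response(url, results, cache_seconds=60):
    """Wrap the page in the response format of an API Gateway Lambda proxy integration."""
    return {
        "statusCode": 200,
        "headers": {
            "Content-Type": "text/html; charset=utf-8",
            # Lets CloudFront (and browsers) cache the page briefly; 60 s is an assumed value.
            "Cache-Control": f"max-age={cache_seconds}",
        },
        "body": render_status_page(url, results),
    }
```

Escaping the URL and region names with `html.escape` keeps the generated page well-formed even if the monitored URL contains characters such as `<` or `&`.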

Advanced (Multi Region) monitoring website

This is the static HTML page generated by the presentation layer.

AWS Multi Region Website Monitoring App - screenshot

Frequently Asked Questions (FAQ)

1) Why did you decide to use Amazon SNS instead of AWS Step Functions to orchestrate the AWS Lambda functions in your architecture?

Because AWS Step Functions, unfortunately, does not support cross-region Lambda orchestration, whereas Amazon SNS does.

2) Why is it necessary to set up DynamoDB table replication for all of the AWS regions that are used?

Because a standard Amazon DynamoDB table is a regional resource: it exists only in the AWS region in which it was created. Table replication (DynamoDB global tables) creates a replica in each of the other regions, so the table is available for both READ and WRITE operations locally in multiple AWS regions.

3) Why is the response time shown below each individual region so high?

This is because we are not doing a traditional network ping to measure the round-trip delay to the target node; instead, we connect to the target website via the HTTPS protocol and fetch its content. So, overall, the response time includes:

  • the time needed for DNS resolution of the target website's hostname,
  • the time for the TCP connection and the HTTPS (TLS) handshake,
  • the time to download the website's page.
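To see where the time goes, the phases listed above can be timed separately. The following standard-library sketch is only an illustration of the idea, not the code used by the monitoring Lambda: the host and path are placeholders, it sends a plain HTTP/1.1 request over TLS and does not follow redirects. Note that the download phase also includes the time the server needs to produce the page.

```python
import socket
import ssl
import time


def measure_https_phases(host, path="/", port=443, timeout=10):
    """Time DNS, TCP connect, TLS handshake and download separately (milliseconds)."""
    def ms(start, end):
        return round((end - start) * 1000, 1)

    t_start = time.perf_counter()
    # 1) DNS resolution of the hostname
    family, socktype, proto, _, address = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM)[0]
    t_dns = time.perf_counter()

    context = ssl.create_default_context()
    with socket.socket(family, socktype, proto) as raw:
        raw.settimeout(timeout)
        # 2) TCP three-way handshake
        raw.connect(address)
        t_tcp = time.perf_counter()
        # 3) TLS handshake (wrap_socket performs it immediately by default)
        with context.wrap_socket(raw, server_hostname=host) as tls:
            t_tls = time.perf_counter()
            # 4) Send the request and read the response until the server closes
            request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
            tls.sendall(request.encode("ascii"))
            received = 0
            while chunk := tls.recv(65536):
                received += len(chunk)
    t_done = time.perf_counter()

    return {
        "dns_ms": ms(t_start, t_dns),
        "tcp_connect_ms": ms(t_dns, t_tcp),
        "tls_handshake_ms": ms(t_tcp, t_tls),
        "download_ms": ms(t_tls, t_done),
        "total_ms": ms(t_start, t_done),
        "bytes_received": received,
    }
```

For example, `print(measure_https_phases("example.com"))` prints the per-phase timings; the numbers vary with the region the code runs in, which is exactly why each region shows a different response time.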

4) How long did it take you to design and implement this 3 layered architecture?

It took me 1 weekend to design and implement it, and another weekend to fine-tune it and write an article about it. This is the beauty of serverless setups and deployments: they save a massive amount of time.

5) If I wanted to make your solution even more advanced and helpful, which features would you suggest implementing next?

There are many ways to improve the current solution; these 3 are the ones I would recommend:

  • add support for checking multiple websites at once (not just one),
  • add an analytical charts view so that website downtime can be viewed over time, as it occurred (the prerequisite is implementing historical data tracking),
  • implement logic to handle short-term and long-term downtimes differently (e.g. if the website is down for more than 10 minutes, also send an SMS; if it is down for more than 30 minutes, call the user by phone, etc.).
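As an example of the third idea, the escalation rule could be a tiny pure function. This is a hypothetical sketch: the channel names are placeholders, and it approximates the downtime from the number of consecutive failed runs, assuming the 5-minute check interval used in this solution and the historical tracking mentioned in the previous point.

```python
from datetime import timedelta

CHECK_INTERVAL = timedelta(minutes=5)  # the business layer runs every 5 minutes


def downtime_from_failures(consecutive_failed_runs, interval=CHECK_INTERVAL):
    """Rough downtime estimate: consecutive all-region failures multiplied by the interval."""
    return consecutive_failed_runs * interval


def escalation_channels(downtime):
    """E-mail always; add SMS after 10 minutes and a phone call after 30 minutes."""
    channels = ["email"]
    if downtime > timedelta(minutes=10):
        channels.append("sms")
    if downtime > timedelta(minutes=30):
        channels.append("phone_call")
    return channels
```

For instance, 3 consecutive failed runs give an estimated 15 minutes of downtime, so `escalation_channels(downtime_from_failures(3))` returns `["email", "sms"]`.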

6) What monthly costs may this monitoring solution incur?

Even though the overall 3-layer architecture looks quite complex and might seem likely to generate substantial costs, the costs of this solution are, surprisingly, almost zero.

There are 3 reasons for this:

  1. Through its AWS Free Tier program, AWS provides all users with a certain volume of its services for free; for instance, 1 million AWS Lambda requests per month are not billed. There is no need to enroll in this program separately; it is provided automatically to anyone who has registered their own AWS account.
  2. Our entire monitoring solution is implemented using 100% serverless services, which means that we are not deploying any EC2 instances, which are usually associated with higher costs.
  3. We disabled CloudWatch Alarms for Amazon DynamoDB table activity.

Now, let's take a look at the potential costs in more detail.

Business layer:

  • 8928x Amazon EventBridge runs,
  • 8928x Amazon SNS runs,
  • 44640x AWS Lambda runs,

Total costs: 0 USD/month.
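The run counts above follow from simple arithmetic: a run every 5 minutes means 12 runs per hour, and a 31-day month (the worst case) has 31 × 24 × 12 = 8928 of them; with one Lambda execution per region, the 5 regions give 5 × 8928 = 44640 executions. The short Python check below reproduces these figures and compares them with the 1 million free Lambda requests per month mentioned above.

```python
def monthly_runs(days=31, interval_minutes=5, regions=5):
    """Scheduled runs per month and the resulting Lambda executions across all regions."""
    runs = days * 24 * (60 // interval_minutes)
    return runs, runs * regions


scheduled, lambda_executions = monthly_runs()
free_tier_requests = 1_000_000  # AWS Free Tier: 1 million Lambda requests per month
print(scheduled, lambda_executions)                     # 8928 44640
print(f"{lambda_executions / free_tier_requests:.1%}")  # 4.5%
```

In other words, the business layer uses well under 5% of the free Lambda requests each month.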

Notification layer:

  • 8928x Amazon EventBridge runs,
  • 8928x AWS Lambda runs,
  • 0-8928 Amazon SNS runs (depending on the actual downtime duration).

Total costs: between 0 and 0.16 USD/month (depending on the actual downtime duration)
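The 0.16 USD upper bound can be reproduced with a back-of-the-envelope calculation, assuming a single e-mail subscriber (one e-mail per notification run) and Amazon SNS e-mail pricing of 1,000 free notifications per month plus 2.00 USD per 100,000 notifications beyond that. These prices are assumptions in this sketch; check the current SNS pricing page before relying on them.

```python
def sns_email_cost_usd(notification_runs, subscribers=1,
                       free_emails=1_000, usd_per_100k_emails=2.00):
    """Monthly SNS e-mail cost: every run e-mails every subscriber; the first 1,000 are free."""
    emails = notification_runs * subscribers
    billable = max(0, emails - free_emails)
    return billable * usd_per_100k_emails / 100_000


print(round(sns_email_cost_usd(0), 2))     # 0.0  (website never down)
print(round(sns_email_cost_usd(8928), 2))  # 0.16 (website down for the whole month)
```

With more subscribers, the number of e-mails (and therefore the upper bound) grows proportionally.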

Presentation layer:

  • the costs depend on the traffic to the Website Monitoring web page (how often you or other users access the presentation layer via their browsers),
  • you would need thousands of users visiting it daily to incur any costs at all, and even those would be ridiculously small,
  • generally, the only costs are those you pay for Amazon Route 53 domain services: $0.50 per month for each hosted zone (domain name), plus the yearly domain name registration fee if you register the domain via Route 53, which depends on the chosen domain name extension.

7) Can you provide full instructions, including the source code, on how to set up and configure this entire Multi Region website monitoring solution on AWS?

Yes, feel free to download the instructions for setting up all 3 layers with all of the steps described.

8) Can I contact you in case I have questions or get stuck with the solution implementation?

Yes, don't hesitate to contact me. You can reach me via my LinkedIn profile; I accept all connection requests and will be glad to discuss this or other AWS architectures with you.

If you find this article insightful, please share it. Thank you!


More articles by Rastislav Skultety, MBA
