Building resilient Server-less systems using AWS
Vinayak Raghuvamshi
Driving Engineering Excellence | Cybersecurity, Distributed Systems | AWS | Azure | AI | Author | Mentor | Spirituality Coach
This article is inspired by and based upon a talk given by David Yanacek, Sr. Principal Engineer at AWS.
One of the CS courses at Carnegie Mellon University defines resiliency as:
A resilient system protects its critical capabilities (and associated assets) from harm by using protective resilience techniques to passively resist adverse events and conditions or actively detect these adversities, respond to them, and recover from the harm they cause.
When building services and systems on a serverless architecture, we get a good deal of resiliency out of the box. Even so, it is helpful to understand the various aspects of resiliency, the challenges involved, and the recommendations for addressing them.
Overload:
A system is said to be under overload when it is handling so many transactions that it processes all of them ineffectively and slows everybody down.
It is almost impossible to build a linearly scalable system in which throughput keeps rising along with load. As per Gunther's Universal Scalability Law, you can parallelize a system only up to the point where contention and coherency costs become the bottleneck. Beyond this point, you start seeing diminishing returns. Here you can watch a great session on Applying The Universal Scalability Law to Distributed Systems by Dr. Neil J. Gunther himself.
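For a concrete feel of what the USL predicts, here is a minimal Python sketch of the throughput curve. The coefficient values (lam, sigma, kappa) are illustrative assumptions, not measurements from any real system:

```python
# Universal Scalability Law: X(N) = lam * N / (1 + sigma*(N-1) + kappa*N*(N-1))
# lam   - throughput of a single worker with no contention
# sigma - contention (serialization) penalty
# kappa - coherency (crosstalk) penalty

def usl_throughput(n, lam=1000.0, sigma=0.05, kappa=0.001):
    """Expected throughput at concurrency n under the USL model."""
    return (lam * n) / (1 + sigma * (n - 1) + kappa * n * (n - 1))

if __name__ == "__main__":
    # Throughput climbs, flattens, and eventually declines as the
    # coherency term dominates - the diminishing returns described above.
    for n in (1, 8, 32, 64, 128, 256):
        print(f"concurrency={n:4d}  throughput={usl_throughput(n):8.1f} tps")
```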
Here is a sample graph of expected throughput vs. latency:
When even our fastest response time exceeds the client timeout, we say the system has browned out. In this example, any processing whose latency exceeds the client timeout is pretty much wasted: the system is doing a lot of work and utilization may be at 100%, yet it is not getting much done per unit of time, and clients time out before it can respond.
How to prevent this?
First we need an idea of how much load we can handle optimally. Load tests play a crucial role in measuring this; we essentially need to find the tipping point. Needless to say, such load tests should be performed in a UAT / load-test environment and not in production.
Once we know what the optimum load is, we can do a few things to avoid overload.
Load shedding.
This just means we reject extra work once we reach the optimum load, or tipping point. Load shedding helps ensure that we don't end up impacting everybody because of overload, at the cost of rejecting the transactions of a few.
We should design our systems to not waste work. Handling client timeouts is one key aspect to consider.
This can also have a cascading, cumulative effect when our server has other dependencies.
One of the ways to mitigate this issue is by setting server-side timeouts on the Lambda function. AWS Lambda lets you configure a function timeout of up to 900 seconds (15 minutes); the default is 3 seconds. The caveat is that we may time out on legitimate but expensive requests, and also in scenarios where our dependencies have latency spikes, penalizing the client for a server-side issue. So when setting these timeouts, it is good practice to keep the value close to the client timeout. If different types of clients have vastly different timeouts, we may want to consider providing them with different API endpoints.
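As a minimal sketch (the function name and timeout value below are assumptions), adjusting a function's timeout with boto3 looks like this:

```python
import boto3

lambda_client = boto3.client("lambda")

# Keep the server-side timeout close to the client timeout so the function
# does not keep working on requests the caller has already given up on.
lambda_client.update_function_configuration(
    FunctionName="order-processing",   # hypothetical function name
    Timeout=10,                        # seconds; cannot exceed 900
)
```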
The other alternative is to do 'bounded work': input size validation, pagination, and checkpointing. Bounded work basically means not taking on more work than we can efficiently handle within the established SLAs.
Checkpointing is the ability to do work incrementally and save state incrementally. A good example is a DynamoDB scan: when we scan a large table, we do not get the entire contents back in one go; instead we get a chunk at a time. If there are intermittent failures, we can pick up from where we last left off instead of starting all over again.
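A minimal boto3 sketch of this pattern, assuming a hypothetical "orders" table and a hypothetical process() helper for the per-item work:

```python
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("orders")  # hypothetical table name

def scan_with_checkpoint(start_key=None):
    """Scan one page at a time, returning the key to resume from.

    Persist the returned LastEvaluatedKey somewhere durable so a retry
    can pick up where the previous attempt left off instead of rescanning.
    """
    kwargs = {"Limit": 100}
    if start_key:
        kwargs["ExclusiveStartKey"] = start_key

    response = table.scan(**kwargs)
    for item in response.get("Items", []):
        process(item)  # hypothetical per-item work

    return response.get("LastEvaluatedKey")  # None when the scan is complete
```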
In Lambda execution environments (containers, micro-VMs, etc.) we have fixed resources per unit of work, which means every request gets the same amount of resources and each execution environment works on only one request at a time. This is also called workload isolation, and it gives predictable performance: because there is no contention for resources between requests, latency remains consistent across requests. This is one good way to enforce the tenet "do not take on too much work", so bounded work is also available pretty much out of the box with AWS serverless architecture.
To summarize, avoiding overload involves:
- Rejecting excess work, or load
- Reducing wasted work (server timeouts)
- Doing bounded work (input size validation, pagination, checkpointing)
- Not taking on too much work, by reserving the same amount of resources for each request (see the concurrency sketch below)
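One practical way to cap how much work a Lambda-backed service takes on is reserved concurrency. Here is a minimal boto3 sketch; the function name and limit are illustrative assumptions:

```python
import boto3

lambda_client = boto3.client("lambda")

# Cap concurrent executions so the service never takes on more work than it
# was sized for; invocations beyond this limit are throttled.
lambda_client.put_function_concurrency(
    FunctionName="order-processing",     # hypothetical function name
    ReservedConcurrentExecutions=50,
)
```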
The other method for building resilient systems is queueing.
When there is a surge in traffic, the queue gets bigger. The problem is that when the traffic spike goes away, we are still left with backlogged items in the queue. The bigger the queue, the farther we get from real-time processing of jobs, which can be bad for time-sensitive applications. An example is a calendar invite for an important, urgent event happening in 15 minutes: because the queue had a large backlog, the invite did not get sent to the invitees until after the event was supposed to start.
One way of mitigating this issue is by using priority queues. Of course, this works only when we have a good distribution of priorities across jobs; if all jobs have the same priority, this behaves like a normal queue.
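As a minimal sketch of the idea (the queue URLs are hypothetical), a producer can route urgent jobs to a separate high-priority SQS queue that workers drain first:

```python
import json
import boto3

sqs = boto3.client("sqs")

# Hypothetical queue URLs; workers poll the high-priority queue before the
# low-priority one, so urgent jobs do not sit behind a large backlog.
HIGH_PRIORITY_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/jobs-high"
LOW_PRIORITY_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/jobs-low"

def enqueue(job, urgent=False):
    sqs.send_message(
        QueueUrl=HIGH_PRIORITY_URL if urgent else LOW_PRIORITY_URL,
        MessageBody=json.dumps(job),
    )
```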
The other mechanism used to prevent the queue from growing beyond acceptable limits is backpressure, or throttling. You can use API Gateway to configure throttling limits, and you can combine priority queues with throttling to get the best of both worlds.
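A minimal boto3 sketch of attaching throttling limits through an API Gateway usage plan; the API id, stage, and rate/burst values are assumptions:

```python
import boto3

apigw = boto3.client("apigateway")

# Requests beyond these limits are rejected with 429, applying backpressure
# before the backlog grows out of control.
apigw.create_usage_plan(
    name="standard-clients",
    throttle={
        "rateLimit": 100.0,   # steady-state requests per second
        "burstLimit": 200,    # maximum burst size
    },
    apiStages=[{"apiId": "a1b2c3d4e5", "stage": "prod"}],  # hypothetical API
)
```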
AWS Lambda allows for asynchronous invocation and uses an internal queue to buffer the jobs.
For asynchronous invocation, Lambda places the event in a queue and returns a success response without additional information. A separate process reads events from the queue and sends them to your function. You can check out the asynchronous invocation configuration API for more details.
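A minimal boto3 sketch of an asynchronous invocation; the function name and payload are hypothetical:

```python
import json
import boto3

lambda_client = boto3.client("lambda")

# With InvocationType="Event", Lambda queues the event and returns immediately;
# a separate process delivers it to the function later.
response = lambda_client.invoke(
    FunctionName="send-calendar-invite",           # hypothetical function name
    InvocationType="Event",                        # asynchronous invocation
    Payload=json.dumps({"invite_id": "abc-123"}),  # hypothetical payload
)
print(response["StatusCode"])  # 202 when the event was accepted
```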
As an example of queued execution, it is common to trigger a Lambda function when an object is uploaded to an S3 bucket.
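A minimal sketch of what such an S3-triggered handler might look like; the per-object processing step is a placeholder:

```python
def handler(event, context):
    # S3 "object created" notifications arrive as a list of records.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # Placeholder: do the bounded, per-object work here.
        print(f"Processing s3://{bucket}/{key}")
```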
The other mechanism used to build resiliency into services is shuffle sharding. It is a big topic in itself. AWS Sr. Principal Engineer Colm MacCarthaigh has published a nice article explaining shuffle sharding, and I would highly recommend reading it here.
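To give a rough flavor of the idea (this is a simplified sketch, not AWS's implementation): each customer is deterministically mapped to a small subset of workers, so two customers rarely share their entire set and a single noisy customer cannot exhaust the whole fleet. The worker count and shard size below are illustrative assumptions.

```python
import hashlib
import random

WORKERS = [f"worker-{i}" for i in range(8)]  # hypothetical fleet
SHARD_SIZE = 2                               # workers assigned per customer

def shuffle_shard(customer_id):
    """Pick a stable, pseudo-random subset of workers for this customer."""
    seed = int(hashlib.sha256(customer_id.encode()).hexdigest(), 16)
    return random.Random(seed).sample(WORKERS, SHARD_SIZE)

# With 8 workers taken 2 at a time there are 28 distinct shards, so two
# customers may overlap on one worker but rarely on their full set.
print(shuffle_shard("customer-a"))
print(shuffle_shard("customer-b"))
```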
Finally, we need the right tools for operating and monitoring our services to ensure better resiliency. Here are a few key tools that we should be familiar with.
AWS CloudWatch Contributor Insights.
Hope this article helped provide an overview into how resilient systems are built using AWS technologies. I have tried to keep it high level. If you have any specific queries feel free to DM me.
Cheers!