AWS - optimizing Lambda usage through DynamoDB with CloudWatch Rules
Ivan Vokhmin
Lead Engineer Frontend @ moebel.de Einrichten & Wohnen GmbH | AWS, Team Leadership, Software Architecture, AI
At moebel.de we use AWS Lambda for many projects. From event processing to page rendering, Lambda absorbs a lot of shifting load. However, there are cases where triggering Lambda blindly with high concurrency causes congestion issues and high costs. Below are two such cases, and the solution I was able to architect to mitigate these issues.
Cases
We encountered two different cases where too many concurrent invocations caused issues.
Case 1: CDN purging - connecting to rate-limited APIs
Our site heavily depends on CloudFlare to cache many pre-generated pages and resources. Cache times can be long for optimization purposes, so we need to actively tell CloudFlare to refresh the cache for specific URLs. For this, we use the CloudFlare API.
Our usage of the CloudFlare purge API means that we send many purge requests for different URLs from many event producers (through an SQS queue or direct Lambda invocation). When too many purge events were generated, we could hit the requests-per-minute limit of the CloudFlare API. Because of this, some purge requests failed and congestion built up (hundreds or thousands of requests that eventually failed due to processing timeouts). Worst of all, many of those requests were redundant (the same URL purged multiple times).
Case 2: Data transformation - excessive (avoidable) invocation costs
Updating the data model of our portal requires constant synchronization between a slow backend database and a fast cache layer via a data transformer service (a Lambda). The backend team that operates the database can make hundreds or thousands of changes, and every change creates an SQS queue event. Even with some event bundling, we still get a lot of redundant events (e.g. dozens of properties of the same category changed => the transformer must update the cache layer multiple times, in parallel). Every time an SQS event arrives, the Lambda must read the remote database, do a lot of computational work on the object ID from the event, and store the processed data in the fast cache layer. This led to a significant cost increase as changes became more and more frequent. Another factor is that the computational load and memory requirements of the transforming Lambda are quite large (so every invocation is expensive).
What do these cases have in common?
In both of these situations there are some common traits that warrant a template solution: events arrive in bursts, many of them are redundant (the same URL or object ID repeated), the actual processing is expensive or rate-limited, and the results do not need to be produced in real time.
Template solution
The single Lambda's logic was split into two Lambdas. One of them (a very small one, the consumer Lambda) accepts all possible events and invocations and writes them to DynamoDB in a de-duplicated way. The other one (the worker Lambda), heavier on logic, memory and CPU, is executed periodically on a schedule by a CloudWatch Rule (e.g. every 5 or 30 minutes).
To ensure de-duplication, a unique property should become the HASH key (partition key), so no duplicate events can be inserted for later processing. Here is a CloudFormation example that we use to store unique URLs for later CloudFlare API purging.
DynamoDBForUrls:
  Type: AWS::DynamoDB::Table
  Properties:
    AttributeDefinitions:
      - AttributeName: url
        AttributeType: S
    # PAY_PER_REQUEST is recommended for fluctuating workloads
    BillingMode: PAY_PER_REQUEST
    KeySchema:
      # Forces unique urls
      - AttributeName: url
        KeyType: HASH
    TableName: url-storage-table-name
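For illustration, here is a minimal sketch of what the consumer Lambda could look like in TypeScript (AWS SDK v3). It assumes the table above, an SQS payload that contains a url field, and a modifiedAt attribute used later by the worker; the payload shape and attribute name are assumptions for the sketch, not our exact production code.

// Consumer Lambda (sketch): accepts SQS events and writes each URL into DynamoDB.
// Because "url" is the partition key, repeated writes for the same URL overwrite
// the same item, so duplicates collapse into a single pending entry.
import { SQSEvent } from "aws-lambda";
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, PutCommand } from "@aws-sdk/lib-dynamodb";

const docClient = DynamoDBDocumentClient.from(new DynamoDBClient({}));
const TABLE_NAME = "url-storage-table-name";

export const handler = async (event: SQSEvent): Promise<void> => {
  for (const record of event.Records) {
    // Assumed payload shape: { "url": "https://..." }
    const { url } = JSON.parse(record.body) as { url: string };

    await docClient.send(
      new PutCommand({
        TableName: TABLE_NAME,
        Item: {
          url,
          modifiedAt: Date.now(), // used by the worker for the conditional delete
        },
      })
    );
  }
};

The consumer does nothing else: it writes and exits, which is why it can run with high concurrency at negligible cost.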
The worker Lambda's workflow starts by reading many entries at once from DynamoDB, doing the work on them (e.g. contacting the related database for exact category data and transforming it, or connecting to the CloudFlare API to purge a batch of URLs), and then deleting the processed entries from DynamoDB. To prevent a possible "update while processing", some critical events carry a modification date in addition to the partition key (DynamoDB is schemaless, so extra fields can be added at will). Deletion only happens if the modification date matches. If the worker Lambda encounters an empty table on its run, it exits immediately to keep costs low.
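Below is a minimal sketch of such a worker Lambda in TypeScript for the CDN-purge case, under the same assumptions as the consumer sketch above. The purgeUrls function is only a placeholder for the real CloudFlare API call, and the batch size is an example value (check CloudFlare's documentation for the actual per-request URL limit).

// Worker Lambda (sketch): runs on a CloudWatch Rule schedule (e.g. every 5 minutes),
// reads pending URLs from DynamoDB, purges them in batches, and deletes only the
// entries that were not re-written while we were processing them.
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import {
  DynamoDBDocumentClient,
  ScanCommand,
  DeleteCommand,
} from "@aws-sdk/lib-dynamodb";

const docClient = DynamoDBDocumentClient.from(new DynamoDBClient({}));
const TABLE_NAME = "url-storage-table-name";
const BATCH_SIZE = 30; // example value; keep below CloudFlare's per-request URL limit

// Placeholder for the real CloudFlare purge call
// (POST /zones/{zone_id}/purge_cache with the collected URLs).
async function purgeUrls(urls: string[]): Promise<void> {
  console.log(`purging ${urls.length} URLs`);
}

export const handler = async (): Promise<void> => {
  // Pagination via LastEvaluatedKey is omitted for brevity.
  const result = await docClient.send(new ScanCommand({ TableName: TABLE_NAME }));
  const items = (result.Items ?? []) as { url: string; modifiedAt: number }[];

  // Empty table => nothing to do, exit immediately to keep costs low.
  if (items.length === 0) return;

  for (let i = 0; i < items.length; i += BATCH_SIZE) {
    const batch = items.slice(i, i + BATCH_SIZE);
    await purgeUrls(batch.map((item) => item.url));

    // Delete each processed entry only if it was not updated in the meantime.
    await Promise.all(
      batch.map((item) =>
        docClient
          .send(
            new DeleteCommand({
              TableName: TABLE_NAME,
              Key: { url: item.url },
              ConditionExpression: "modifiedAt = :seen",
              ExpressionAttributeValues: { ":seen": item.modifiedAt },
            })
          )
          .catch((err: Error) => {
            // Condition failed => the URL was re-queued while we were processing;
            // leave it for the next scheduled run.
            if (err.name !== "ConditionalCheckFailedException") throw err;
          })
      )
    );
  }
};

The conditional delete implements the "deletion only happens if the modification date matches" rule: if the consumer re-wrote a URL while we were purging, the delete fails and the entry is simply picked up on the next scheduled run.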
Important note: while the consumer Lambda can scale and run concurrently, this is fine because it only writes data to DynamoDB and exits almost instantly. The heavier worker Lambda is executed without parallelization (the longer the delay between invocations, the lower the costs thanks to de-duplication and fewer dry runs, but also the longer the overall event processing time).
DynamoDB perfectly matches our use cases: the partition key enforces uniqueness, giving us de-duplication on write for free; the table is schemaless, so extra fields such as a modification date can be added at will; and PAY_PER_REQUEST billing fits our fluctuating workloads.
Note: you may consider reserving DynamoDB capacity by using the PROVISIONED billing mode for constant workloads.
Outcomes
After the solution was implemented, we spent some time analyzing the outcomes.
Rate-limited API case
Our event processing time increased (sometimes up to an hour in very high-demand cases). However, we never ran into the CloudFlare API rate limit or lost an event (a URL to purge) again.
Cost decrease for transformer case
After the solution was rolled out (red arrow on the next two graphs), we could process significantly more SQS events at a greatly reduced price, because all events were properly bundled together (and we saved a lot of shared execution time). Most events require a lot of common information from the backend database, and reading it took a significant share of the Lambda execution time. Now, with one run for all events, we spared a lot of avoidable Lambda costs.
Conclusion
While Lambda is very good at handling spontaneous loads, in some cases it is beneficial to put a "middleman" like DynamoDB between SQS event consumption and the real processing logic. It can reduce costs or help adhere to external rate limits, at the cost of increased event processing time.