AWS Lambda & RDS in VPC: The Best Practice

Huaifeng Qin

Staff System Engineer

Published: 24 March 2023

Introduction

Serverless is an excellent concept in cloud computing. It simplifies cloud application development by letting developers focus on delivering business logic without worrying about capacity planning and ongoing maintenance.

AWS Lambda is one of the essential services when building a serverless application on AWS. While most of the serverless application tutorials use managed NoSQL database services such as DynamoDB, it is still a common scenario for Lambda to connect to an RDS (Relational Database Service) database instance.

To start the discussion, I will use Lambda functions to develop serverless APIs that read a CSV file from an S3 bucket and write the data to a MySQL database table.
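The core of such a function can be sketched in a few lines. This is a minimal illustration, not the author's actual code: the table name `orders`, the column names, and the sample data are hypothetical, and the S3 download and database connection are left out so the sketch runs anywhere.

```python
import csv
import io

# Hypothetical table and column names, used only for illustration.
TABLE = "orders"
COLUMNS = ("order_id", "customer", "amount")


def parse_csv(text):
    """Turn CSV text (with a header row) into a list of row tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [tuple(row[col] for col in COLUMNS) for row in reader]


def build_insert(table, columns):
    """Build a parameterised INSERT; values are bound by the driver, not inlined."""
    placeholders = ", ".join(["%s"] * len(columns))
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"


# Stand-in for the body of the S3 object (assumed sample data).
sample = "order_id,customer,amount\n1,alice,9.50\n2,bob,12.00\n"
rows = parse_csv(sample)
sql = build_insert(TABLE, COLUMNS)
print(sql)
print(rows)
```

In a real Lambda function, the CSV text would typically come from the S3 `get_object` call in boto3, and the rows would be written with a MySQL driver's `cursor.executemany(sql, rows)`. The `%s` placeholder style assumed here is the one used by common MySQL drivers such as PyMySQL.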

The most straightforward way for a Lambda function to access an RDS instance would be to make the instance publicly accessible. When the RDS instance is publicly accessible, Lambda can connect to it directly. By default, AWS deploys Lambda into an AWS-managed VPC, which has internet access.

I start by designing the architecture as below.

For testing or non-production applications, this model is okay. But there are security risks in allowing RDS to be publicly accessible. Imagine the RDS credentials are accidentally hardcoded in the code and published to a public git repository. Anyone with the RDS credentials can then connect to the RDS instance over the internet.

Therefore, placing the RDS instance in public subnets and enabling public access is generally not considered a best practice.

Best Practice of Deploying RDS

Data is undoubtedly one of the most critical assets of modern business, and security is among the top priorities when considering enterprise solutions. Many whitepapers and knowledge base articles describe best practices for RDS deployments, such as Trend's AWS RDS Best Practices, Security best practices for Amazon RDS and Security in Amazon RDS.

One of the standard best practice suggestions is to deploy an RDS instance into isolated subnets within a VPC.

Other best practices to consider include the following:

Use IAM (Identity and Access Management) to authorise RDS access,
Whitelist IP addresses,
Use Secrets Manager to store the credentials and rotate them.

Following the RDS deployment best practice, I revised the architecture drawing. In this revision, I deployed the RDS instance in the isolated subnets within a VPC and used Secrets Manager to store the database credentials.

If I deploy this revision and try to access the API endpoint, I should encounter a Lambda function timeout error.

The root cause of the timeout error is that the Lambda function failed to connect to the RDS instance in the VPC. The RDS instance is now placed inside an isolated subnet, and a Lambda function outside the VPC cannot connect to it directly. You can compare this to accessing an on-premises server in an enterprise network (LAN) that sits behind a firewall. Without proper routing and forwarding configuration, traffic from the internet will not be able to reach the server.

Placing the RDS instance inside the isolated subnet is a good security measure but breaks the connection between the RDS instance and the other software components. Under such circumstances, what would be the best practice for the Lambda function to access the RDS instance?

Lambda within a VPC

Will placing the Lambda in the VPC allow it to connect to the RDS? The quick answer is yes.

AWS allows assigning a VPC (tenancy VPC) to the Lambda function. However, while it is good practice to put the RDS instance in an isolated subnet in a VPC, putting Lambda functions in the VPC is somewhat controversial. Some knowledge base articles suggest that it's best practice NOT to put the Lambda function in a VPC unless the function must access other resources in the VPC. A Lambda function is a short-lived computing instance, and it is most powerful when used together with other AWS services. Confining the Lambda function within a network boundary like a traditional virtual machine brings little benefit. We will discuss this later.

To allow the Lambda function to connect to the RDS instance in the VPC, I modified the architecture drawing to move the Lambda function inside the same VPC where the RDS instance is deployed. However, redeploying this revision will not fix the API access error.

By default, Lambda functions are deployed with 'no-VPC' attached. AWS manages the 'no-VPC' Lambda function's network, which has internet access. However, Lambda functions placed in the tenancy VPC only receive private IP addresses, even when they are attached to public subnets. Therefore, Lambda in a VPC can't use an internet gateway to access the internet.

As a result, while assigning the same VPC as the RDS instance to Lambda allows the Lambda functions to connect to the RDS instance, it also isolates the Lambda functions from the internet. By default, AWS-managed services are reached through their public endpoints, that is, via the internet. In my solution, S3 and Secrets Manager become unreachable from the Lambda function after the VPC is assigned to it.
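For reference, assigning a VPC to a function amounts to giving it a set of subnets and security groups. The sketch below assumes boto3's Lambda client and its `update_function_configuration` call; the helper names and the function name `rds-writer` are hypothetical, and the client is passed in so the helpers can be exercised without AWS access.

```python
def vpc_config(subnet_ids, security_group_ids):
    """Build the VpcConfig structure that attaches a function to a VPC."""
    # This sketch insists on both lists being non-empty.
    if not subnet_ids or not security_group_ids:
        raise ValueError("at least one subnet and one security group are required")
    return {
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
    }


def attach_to_vpc(lambda_client, function_name, subnet_ids, security_group_ids):
    """Assign VPC subnets and security groups to an existing function."""
    return lambda_client.update_function_configuration(
        FunctionName=function_name,
        VpcConfig=vpc_config(subnet_ids, security_group_ids),
    )
```

With real credentials this would be called as `attach_to_vpc(boto3.client("lambda"), "rds-writer", subnet_ids, security_group_ids)`, using the isolated subnets of the RDS VPC and a security group that the RDS security group allows on the MySQL port.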

Although the internet connection from Lambda in the tenancy VPC is not set up automatically, two approaches can help bridge the Lambda function with other AWS services or even give it internet access.

We can deploy a NAT (Network Address Translation) gateway if full internet access is required, but if the Lambda functions only need to access specific AWS-managed services, we can use VPC Endpoints.

Connect Lambda to the Internet via NAT

Using NAT devices to route internet traffic is common in enterprise network topology. NAT is a ‘classic’ traffic-forwarding technique. It requires a public IP address to be attached to the NAT gateway. The private IP addresses of the local subnets are mapped to the public IP address so that computers in the local subnets can access the internet.

I revised the architecture by using NAT to connect the Lambda function to the internet.

This deployment should fix the S3 and Secrets Manager access issues. In my application, the Lambda function only needs to access the S3 and Secrets Manager services. In this scenario, using NAT is not the most cost-effective solution.

AWS NAT Gateway is priced based on the gateway usage hours plus the data the gateway processes. Let’s take the Sydney region as an example with 1 NAT gateway deployed and 1TB of data transferred per month:

730 hours in a month x 0.059 USD = 43.07 USD (Gateway usage hourly cost)

1,000 GB per month x 0.059 USD = 59.00 USD (NAT Gateway data processing cost)

43.07 USD + 59.00 USD = 102.07 USD (NAT Gateway processing and month hours)

1 NAT Gateways x 102.07 USD = 102.07 USD (Total NAT Gateway usage and data processing cost)

Total NAT Gateway usage and data processing cost (monthly): 102.07 USD

Note that NAT Gateway charges for all the data it processes. Suppose my APIs become an essential component of a busy ETL pipeline that reads terabytes of data from the S3 bucket every month. That would incur significant data processing charges!
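To see how the charge scales with data volume, the arithmetic above can be wrapped in a small function. This is a sketch that uses only the Sydney prices quoted in this article; the function name is hypothetical, and current prices should be checked against the AWS pricing page.

```python
# Prices quoted in this article for the Sydney region (may be out of date).
NAT_HOURLY_USD = 0.059   # per NAT gateway hour
NAT_PER_GB_USD = 0.059   # per GB processed by the gateway
HOURS_PER_MONTH = 730


def nat_monthly_cost(gateways, gb_processed):
    """Monthly NAT cost: gateway hours plus data processing."""
    hourly = gateways * HOURS_PER_MONTH * NAT_HOURLY_USD
    data = gb_processed * NAT_PER_GB_USD
    return round(hourly + data, 2)


for gb in (1000, 5000):
    print(gb, "GB:", nat_monthly_cost(1, gb), "USD")
```

With one gateway, 1,000 GB reproduces the 102.07 USD total above, and 5,000 GB comes to 338.07 USD. The fixed hourly part stays at 43.07 USD, so the growth is entirely data processing.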

In my application, I only use specific AWS-managed services. To avoid the S3 data travelling via the NAT Gateway, I should use VPC Endpoints to access AWS-managed services.

Connect Lambda to the AWS Services via VPC Endpoints

A VPC endpoint enables a private connection between AWS services and the VPC. It allows instances within a VPC to connect to AWS services without traversing the internet.

Accessing AWS-managed services via VPC Endpoints is an approach highly recommended by many technical blogs. Of course, if the Lambda function must access the internet, it will still require a NAT Gateway.

There are two types of VPC Endpoints: gateway endpoints and interface endpoints. Gateway endpoints are only used for S3 and DynamoDB services. There is no additional charge for using the gateway endpoint.

For all the other available AWS-managed services, interface endpoints are used to link instances and functions inside the VPC to them. An interface endpoint is billed based on usage hours and the amount of data the endpoint processes. But compared to a NAT gateway, deploying one VPC endpoint is significantly cheaper.

Using the Sydney region again as an example: for 1 VPC interface endpoint deployed with 1TB of data transferred per month, the cost breakdown is as follows:

1 VPC endpoints x 1 ENIs per VPC endpoint x 730 hours in a month x 0.013 USD = 9.49 USD (Monthly cost for endpoint ENI)

Monthly cost for Interface endpoints: 9.49 USD

Tiered price for: 1000 GB

1000 GB x 0.0100000000 USD = 10.00 USD

Total tier cost = 10.0000 USD (PrivateLink data processing cost)

Total data processing cost: 10 USD

9.49 USD + 10 USD = 19.49 USD (Total PrivateLink Cost)

Total PrivateLink endpoints and data processing cost (monthly): 19.49 USD

I revised the architecture drawing to switch from NAT to VPC endpoints. The new deployment creates one gateway endpoint for S3 and one interface endpoint for Secrets Manager. After redeploying the serverless application, the API should work properly.

To use a specific AWS-managed service within the VPC, one interface endpoint per service per VPC needs to be created. The interface endpoints must be kept alive for the application's lifetime; otherwise, the application will encounter unexpected errors.
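The two endpoints can be sketched as request parameters for boto3's EC2 `create_vpc_endpoint` call. This is an illustration under assumptions rather than the author's deployment code: the helper names are hypothetical, the region defaults to Sydney to match the cost examples, and the EC2 client is passed in so the sketch can be run without AWS access.

```python
REGION = "ap-southeast-2"  # Sydney, as in the cost examples (assumption)


def s3_gateway_endpoint(vpc_id, route_table_ids, region=REGION):
    """Gateway endpoints are attached to route tables."""
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        "RouteTableIds": list(route_table_ids),
    }


def secrets_manager_interface_endpoint(vpc_id, subnet_ids, security_group_ids, region=REGION):
    """Interface endpoints place network interfaces in the given subnets."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.secretsmanager",
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
        # Lets the default service hostname resolve to the endpoint inside the VPC.
        "PrivateDnsEnabled": True,
    }


def create_endpoints(ec2_client, vpc_id, route_table_ids, subnet_ids, security_group_ids):
    """Create both endpoints and return the raw API responses."""
    requests = [
        s3_gateway_endpoint(vpc_id, route_table_ids),
        secrets_manager_interface_endpoint(vpc_id, subnet_ids, security_group_ids),
    ]
    return [ec2_client.create_vpc_endpoint(**params) for params in requests]
```

The difference in parameters mirrors the difference in the two endpoint types: the gateway endpoint only adds routes, while the interface endpoint creates network interfaces in the subnets, which is what the hourly charge pays for.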

When I review my solution cost, a charge of roughly ten dollars a month to reach a single secret stored in Secrets Manager doesn’t sound like a good justification for using the interface endpoint. If each serverless application creates its own VPC and interface endpoints, much of that spend is wasted. To use interface endpoints more efficiently, it is good practice to group applications into one VPC so that they can share the endpoints.

It gets more annoying if the Lambda functions need to access several AWS-managed services, because I have to create a separate interface endpoint for each service my application relies on. When the application requires ten or more interface endpoints, the cost becomes comparable to the NAT deployment.

10 VPC endpoints x 1 ENIs per VPC endpoint x 730 hours in a month x 0.013 USD = 94.90 USD (Monthly cost for endpoint ENI)

Monthly cost for Interface endpoints: 94.90 USD

Tiered price for: 1000 GB

1000 GB x 0.0100000000 USD = 10.00 USD

Total tier cost = 10.0000 USD (PrivateLink data processing cost)

Total data processing cost: 10 USD

94.90 USD + 10 USD = 104.90 USD (Total PrivateLink Cost)

Total PrivateLink endpoints and data processing cost (monthly): 104.90 USD
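The point at which interface endpoints stop being cheaper than a NAT gateway can be estimated with a short calculation. This sketch assumes the Sydney prices quoted in this article, one ENI per endpoint, a single NAT gateway, and the same data volume flowing through either path; the function names are hypothetical.

```python
HOURS_PER_MONTH = 730
NAT_HOURLY_USD = 0.059        # per NAT gateway hour (article figure)
NAT_PER_GB_USD = 0.059        # per GB processed by NAT (article figure)
ENDPOINT_HOURLY_USD = 0.013   # per interface endpoint ENI hour (article figure)
ENDPOINT_PER_GB_USD = 0.01    # per GB processed by PrivateLink (article figure)


def nat_cost(gb):
    """Monthly cost of one NAT gateway processing gb of data."""
    return HOURS_PER_MONTH * NAT_HOURLY_USD + gb * NAT_PER_GB_USD


def endpoints_cost(count, gb):
    """Monthly cost of count interface endpoints processing gb in total."""
    return count * HOURS_PER_MONTH * ENDPOINT_HOURLY_USD + gb * ENDPOINT_PER_GB_USD


def break_even_endpoints(gb):
    """Smallest number of interface endpoints that costs more than one NAT gateway."""
    count = 1
    while endpoints_cost(count, gb) <= nat_cost(gb):
        count += 1
    return count


print(break_even_endpoints(1000))
```

For 1,000 GB per month the result is 10, matching the comparison above: nine endpoints cost 95.41 USD against 102.07 USD for NAT, while ten cost 104.90 USD. Because NAT data processing is priced higher per GB, the break-even count rises as the data volume grows.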

Therefore, it might be worth exploring options that minimise the use of AWS-managed services. In my application, I avoided using Secrets Manager by enabling RDS IAM authentication.

But hang on. Although it is sensible to be aware of the solution cost, I’m now pushing the solution away from AWS-managed services instead of taking advantage of them. It doesn’t smell good!
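With IAM authentication, the function requests a short-lived token and uses it in place of a stored password. The sketch below assumes boto3's RDS client and its `generate_db_auth_token` method; the helper name, host name, and database user are hypothetical, and the client is passed in so the sketch can be run without AWS access.

```python
def iam_connection_settings(rds_client, host, port, user, region):
    """Return connection settings that use an IAM auth token as the password."""
    token = rds_client.generate_db_auth_token(
        DBHostname=host,
        Port=port,
        DBUsername=user,
        Region=region,
    )
    # The token replaces the password; the database user must be set up
    # for IAM authentication and the connection must use SSL/TLS.
    return {"host": host, "port": port, "user": user, "password": token}
```

In a real function this would be called with `boto3.client("rds")`, and the returned settings passed to the MySQL driver together with its SSL options. boto3 signs the token locally from the function's credentials rather than calling a service endpoint, which is why this approach needs no extra VPC endpoint. Tokens are valid for 15 minutes, so a new one should be generated for each new connection.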

Rethinking

Looking back on the journey, I started by looking for a solution to allow the Lambda functions to connect to the RDS instance in the VPC. I placed the Lambda function in the same VPC as the RDS instance to enable connectivity between them. The solution then gradually evolved to make the remaining parts of the application work properly in the new deployment environment.

A Lambda function is a short-lived virtual computer instance, and unlike a classic cloud virtual computer, it doesn’t have full access to the underlying computing resources. Therefore, to maximise the power of the Lambda function, it should be used with other AWS services.

AWS offers more than 200 services. Normally, a cloud user, including a Lambda function, accesses AWS-managed services via the internet. A Lambda function moved into the tenancy VPC is isolated from the rest of the AWS services. To build the connection from a Lambda function to an AWS-managed service, I must ‘plug in’ a private link for each required service.

Now, let me recap the note from the AWS knowledge base article:

"It's a best practice to not put your Lambda function in an Amazon VPC unless the function must access other resources in the VPC."

This best practice rule makes much more sense now!

As for my situation, it is a valid case: my Lambda function must access the RDS instance in the VPC. Removing the VPC assignment from the Lambda deployment would bring me back to the starting point, where the database connection is broken.

Let's look at the problem from a different angle. Instead of isolating Lambda from most AWS services, why don't I separate the Lambda functions that need to be in a VPC from the rest of the functions?

Mixed Lambda Deployment

I split the Lambda function into two Lambda functions, with the new Lambda function containing the code to access RDS in the VPC. The idea is to differentiate the Lambda functions that need to access RDS from other functions and only place those functions (that need to access RDS) in the VPC. For illustration purposes, I keep the code to read CSV from S3 in the original Lambda function.

The solution was updated as in the diagram. I added an SNS (Simple Notification Service) resource to allow the CSV reading function to invoke the RDS writing function. The CSV reading function is deployed with no tenancy VPC attachment. The RDS writing function is placed inside the same VPC as the RDS instance. RDS IAM authentication replaces username and password access, so there is no need to call Secrets Manager.

The revised solution aims to place only a small part of the Lambda functions in the VPC while the rest are deployed with no tenancy VPC attachment. It is a balanced solution that achieves what I set out to do.
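The hand-off between the two functions can be sketched as follows. This is a minimal illustration, assuming boto3's SNS `publish` call on the sending side and the standard SNS event shape that Lambda receives on the other; the function names and the JSON payload layout (`{"rows": [...]}`) are hypothetical choices for this example.

```python
import json


def publish_rows(sns_client, topic_arn, rows):
    """CSV-reading function (no VPC): send parsed rows to the SNS topic."""
    message = json.dumps({"rows": rows})
    return sns_client.publish(TopicArn=topic_arn, Message=message)


def rows_from_sns_event(event):
    """RDS-writing function (in the VPC): extract rows from the SNS event."""
    rows = []
    for record in event["Records"]:
        payload = json.loads(record["Sns"]["Message"])
        # JSON turns tuples into lists; convert back for the database driver.
        rows.extend(tuple(row) for row in payload["rows"])
    return rows
```

The RDS-writing function never calls SNS itself, because the Lambda service delivers the event to it, so it needs no route to the internet or to any AWS endpoint. SNS messages have a size limit, so a large CSV file would need to be split into batches or passed by reference, for example as an S3 key handled by a function outside the VPC.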

To summarise my approach, I want to extend the quote above:

It's a best practice to not put your Lambda function in an Amazon VPC unless the function must access other resources in the VPC. If you have to do so, try to minimise the scope of the Lambda function that has to be put in the VPC.

Conclusion

AWS Lambda is a versatile tool for building low-cost, powerful serverless applications. When dealing with the interoperability issue between Lambda and RDS, following the best practice rules is recommended if you can.

As a quick checklist, the best practice rules that can be applied to RDS and Lambda when dealing with VPC include:

RDS instance should be deployed in the isolated subnets within a VPC,
Use RDS IAM authentication if you can,
Don’t put your Lambda function in an Amazon VPC unless the function must access other resources in the VPC,
Separate the Lambda functions with RDS access needs from the functions without RDS access needs,
Only assign a VPC to the Lambda functions with RDS access needs,
Avoid using NAT unless you have to,
Deploy VPC endpoints for Lambda functions (inside a VPC) that need to access other AWS services,
Monitor the number of VPC interface endpoints deployed and the data transferred, noting inflection points where NAT may be more beneficial.

