Debug ECS Fargate Memory Leak

Recently, I created a new service using ECS Fargate. Ever since we started dialing up the traffic, we saw signs of a memory leak: memory utilisation kept increasing continuously until the service crashed. (We had step scaling enabled, which replaced the ECS tasks once utilisation crossed a threshold, preventing an outright crash.)

In this article, I will take you through the steps required to analyse a potential memory leak in your system. At a high level, we will cover the following, in order:

  • Log in to the ECS container
  • Take a heap dump
  • Analyse the heap dump

1) Log in to ECS Fargate

To debug the memory issue, you first have to log in to the Fargate container. The simplest way to get a shell in an ECS container is Amazon ECS Exec (https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-exec.html).

Prerequisites for using ECS Exec

  • Install and configure the AWS CLI
  • Install Session Manager plugin for the AWS CLI

Enabling and using ECS Exec

a) Set up the IAM permissions required for ECS Exec: create the following policy for the heap-dump workflow.

{
   "Version": "2012-10-17",
   "Statement": [
       {
       "Effect": "Allow",
       "Action": [
            "ssmmessages:CreateControlChannel",
            "ssmmessages:CreateDataChannel",
            "ssmmessages:OpenControlChannel",
            "ssmmessages:OpenDataChannel"
       ],
      "Resource": "*"
      }
   ]
}        
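The policy above can also be created from the CLI. This is a minimal sketch, assuming the JSON was saved as ecs-exec-policy.json; the policy name is illustrative, and the script only prints the command unless you set DRY_RUN=0:

```shell
#!/bin/sh
# Create the ECS Exec policy from the JSON document above.
# 'ecs-exec-heap-dump' is an illustrative name; use your own.
POLICY_NAME="${POLICY_NAME:-ecs-exec-heap-dump}"

# Prints commands by default; set DRY_RUN=0 to execute them for real.
run() { if [ "${DRY_RUN:-1}" = "1" ]; then echo "$@"; else "$@"; fi; }

# Assumes the policy JSON above was saved to ecs-exec-policy.json.
run aws iam create-policy \
    --policy-name "$POLICY_NAME" \
    --policy-document file://ecs-exec-policy.json
```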

b) Create an admin role and attach the above policy to it. Also, attach the ECSTaskInstanceRole to the role.

c) Use the amazon-ecs-exec-checker script (https://github.com/aws-containers/amazon-ecs-exec-checker) to troubleshoot any issues with the permission setup.

d) Enable execute command on your ECS Service

  • aws ecs update-service --cluster <YOUR CLUSTER> --enable-execute-command --service <YOUR SERVICE> --region <YOUR REGION e.g. us-east-1>

e) ECS Exec cannot be enabled on tasks that are already running, so you will not be able to run commands on the existing tasks. You will need to force a new deployment.

  • aws ecs update-service --force-new-deployment --cluster <YOUR CLUSTER> --service <YOUR SERVICE> --region <YOUR REGION e.g. us-east-1>

f) Finally, open an interactive shell in the container on one of the new tasks

  • aws ecs execute-command --cluster <YOUR CLUSTER> --task <TASK ID> --container web --interactive --command "/bin/sh" --region <YOUR REGION e.g. us-east-1>
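Steps d) to f) can be chained into one helper. This is a sketch, not the exact commands I ran: the cluster, service, and region values are placeholders, `aws ecs wait services-stable` blocks until the forced deployment settles, and the script only prints the commands unless DRY_RUN=0:

```shell
#!/bin/sh
# exec_into_task.sh -- enable ECS Exec, roll the service, then open a shell.
# CLUSTER, SERVICE, and REGION are placeholders for your own values.
CLUSTER="${CLUSTER:-my-cluster}"
SERVICE="${SERVICE:-my-service}"
REGION="${REGION:-us-east-1}"

# Prints commands by default; set DRY_RUN=0 to execute them for real.
run() { if [ "${DRY_RUN:-1}" = "1" ]; then echo "$@"; else "$@"; fi; }

# d) turn on execute-command for the service
run aws ecs update-service --cluster "$CLUSTER" --service "$SERVICE" \
    --enable-execute-command --region "$REGION"

# e) the flag only applies to tasks started afterwards, so force a redeploy
run aws ecs update-service --cluster "$CLUSTER" --service "$SERVICE" \
    --force-new-deployment --region "$REGION"

# block until the new tasks are up and the service is stable
run aws ecs wait services-stable --cluster "$CLUSTER" --services "$SERVICE" \
    --region "$REGION"

# f) pick the first task of the service and open an interactive shell
TASK=$(run aws ecs list-tasks --cluster "$CLUSTER" --service-name "$SERVICE" \
    --region "$REGION" --query 'taskArns[0]' --output text)
run aws ecs execute-command --cluster "$CLUSTER" --task "$TASK" \
    --container web --interactive --command "/bin/sh" --region "$REGION"
```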

2) Take a Heap Dump

Now that we are logged in to the Fargate container, the next step is to take the heap dump.

a) Run the ps command to get the Java process ID; in a container it is usually 1.

  • ps -ef | grep java

b) Install a JDK, which provides jmap (pick the package matching your application's Java version; the example below installs Java 7 on Amazon Linux)

  • sudo yum install java-1.7.0-openjdk-devel

c) Take the heap dump using the jmap command

  • jmap -dump:live,format=b,file=Task1.hprof 1
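A note on the flags, plus `jcmd` as a fallback (some JDK builds refuse jmap attachments). The PID and output path mirror the step above, and the sketch only prints the commands unless DRY_RUN=0 is set inside the container:

```shell
#!/bin/sh
# Assumes the Java PID is 1, as found via `ps -ef | grep java` above.
PID="${PID:-1}"
OUT="${OUT:-Task1.hprof}"

# Prints commands by default; set DRY_RUN=0 inside the container to execute.
run() { if [ "${DRY_RUN:-1}" = "1" ]; then echo "$@"; else "$@"; fi; }

# 'live' forces a full GC first, so the dump contains only reachable objects;
# drop it if you also want objects that are about to be collected.
run jmap -dump:live,format=b,file="$OUT" "$PID"

# jcmd, shipped with the same JDK, can produce an equivalent dump.
run jcmd "$PID" GC.heap_dump "$OUT"
```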

Now we have the heap dump available on the ECS task. To analyse it, we need to download it to a local machine: we can either use scp, or copy the dump to S3 and download it from there. To copy the heap dump to S3, attach an S3 access policy (such as AmazonS3FullAccess) to the ECS task role so that the container has write access to S3.
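The S3 route looks roughly like this; the bucket name is a placeholder, and the commands are printed rather than executed unless DRY_RUN=0:

```shell
#!/bin/sh
# Move the heap dump through S3; the bucket name is hypothetical.
BUCKET="${BUCKET:-s3://my-heap-dumps}"
DUMP="${DUMP:-Task1.hprof}"

# Prints commands by default; set DRY_RUN=0 to execute them for real.
run() { if [ "${DRY_RUN:-1}" = "1" ]; then echo "$@"; else "$@"; fi; }

# inside the container: push the dump to S3
run aws s3 cp "$DUMP" "$BUCKET/$DUMP"

# on your local machine: pull it back down for analysis
run aws s3 cp "$BUCKET/$DUMP" "./$DUMP"
```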

3) Analyse the Heap Dump

Once we have the heap dump available locally, the last step is to analyse it with one of the memory-analysis tools available.

We can use the Eclipse MAT tool (https://www.eclipse.org/mat/) to load the dump, inspect the objects retaining the most memory, and trace the references back to the root cause.

Other alternatives are VisualVM (https://visualvm.github.io/) and JProfiler (https://www.ej-technologies.com/products/jprofiler/overview.html).

Lastly, this can also be automated: a small endpoint or scheduled job that takes a heap dump and copies it to S3 on request lets developers debug memory issues from incremental snapshots of the memory over time.
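As a sketch of that idea (the bucket, interval, and snapshot count are all assumptions), a loop inside the container could capture and upload dumps periodically; by default it only prints what it would do:

```shell
#!/bin/sh
# snapshot_heap.sh -- sketch of periodic heap snapshots uploaded to S3.
# Bucket, interval, snapshot count, and PID are illustrative assumptions.
BUCKET="${BUCKET:-s3://my-heap-dumps}"   # hypothetical bucket
INTERVAL="${INTERVAL:-900}"              # seconds between snapshots
COUNT="${COUNT:-4}"                      # number of incremental snapshots
PID="${PID:-1}"

i=0
while [ "$i" -lt "$COUNT" ]; do
    FILE="/tmp/heap-$(date +%Y%m%dT%H%M%S).hprof"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # dry-run (default): just print what would happen
        echo "jmap -dump:live,format=b,file=$FILE $PID"
        echo "aws s3 cp $FILE $BUCKET/"
    else
        jmap -dump:live,format=b,file="$FILE" "$PID"
        aws s3 cp "$FILE" "$BUCKET/" && rm -f "$FILE"
        sleep "$INTERVAL"
    fi
    i=$((i + 1))
done
```

Comparing successive snapshots in MAT makes it much easier to see which object population is growing between them.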

Hope this helps!

Thanks,

Rohit
