Saving EC2 logs to S3 buckets via lifecycle hooks

An EC2 instance may get terminated for a variety of reasons. Since it usually runs something crucial to the application, such as the frontend, backend, or database, it's important to be able to figure out what actually went wrong.

Therefore, as a DevOps/SRE/SysAdmin, it's a good idea to have logs and other important files backed up somewhere safe before the VM is terminated completely.

One way to approach this is to use Auto Scaling lifecycle hooks to trigger the transfer of log files from the terminating VM to an S3 bucket.

In this article, I will walk through the configurations and steps needed to make this idea work.

After spending some time on this problem and trying out different approaches, I am confident that it makes a good project for anyone looking to learn about lifecycle hooks in AWS.

Just so we are on the same page,

SSM = Systems Manager.

ASG = Auto Scaling Group.

Steps for setup

The architecture we will be creating ties together an ASG termination lifecycle hook, an EventBridge rule, a Lambda function, an SSM command document, and an S3 bucket.

The steps, in brief, would be as follows:

  1. Create an instance role that gives SSM access to the EC2 instances.
  2. Create a launch configuration for launching the EC2s.
  3. Create an ASG to spin up EC2s.
  4. Create a ‘termination’ lifecycle hook for the ASG.
  5. Create a Lambda function.
  6. Create an EventBridge rule to trigger the Lambda function.
  7. Create an S3 bucket where the logs will be stored.
  8. Create an SSM document that runs a shell script inside the EC2 to perform the S3 operations.
  9. Update the EC2 instance IAM role with S3 and ASG permissions.
  10. Configure the Lambda function to listen for EventBridge events & run the SSM document.

Step 1

In order for SSM to be able to run commands on an EC2 instance, the instance needs a role with some specific permissions. It can have other permissions as well, but the managed policy “AmazonSSMManagedInstanceCore” is mandatory.

So, let’s create one.

Since this role will be attached to an EC2 instance, it must have “EC2” as its trusted entity.

Let’s name the role “ssm-ec2”.

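For those who prefer the CLI, a rough equivalent of the console steps above could look like this (the trust policy file name is my own placeholder):

# Trust policy that lets EC2 assume the role
cat > ec2-trust.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

# Create the role and attach the mandatory SSM managed policy
aws iam create-role --role-name ssm-ec2 --assume-role-policy-document file://ec2-trust.json
aws iam attach-role-policy --role-name ssm-ec2 \
  --policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore

# Wrap the role in an instance profile so it can be attached to EC2 instances
aws iam create-instance-profile --instance-profile-name ssm-ec2
aws iam add-role-to-instance-profile --instance-profile-name ssm-ec2 --role-name ssm-ec2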

Step 2

Since our goal is to deal with the lifecycle hooks and S3 operations, we can set up a bare-minimum launch configuration. I used the following configuration; you may change it according to your use case.

AMI: ami-04505e74c0741db8d (It’s ubuntu 20 in us-east-1 region)
Instance Type: t2.micro
IAM Instance Profile: “ssm-ec2” (It’s the IAM role we created in Step 1)
No extra storage volumes.
Security group with port 22 open to the world.
(I usually create a key pair as well, in case I need to SSH in and debug something)

If you are a beginner trying to build a project like this, I suggest you follow along with the same configuration as mine.

The plan is to perform the S3 operations via the AWS CLI. Let's add the AWS CLI installation steps to the launch configuration so that it's already installed when a new EC2 instance is launched.

#!/bin/bash
cd /home/ubuntu/
sudo apt update -y
sudo apt install -y unzip
# Download and install AWS CLI v2
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install

Add this script under “Advanced details” -> “User data”.

Let’s name the launch configuration “webapp-lc” & create it.
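For reference, roughly the same launch configuration can be created from the CLI; the security group ID and key pair name below are placeholders you would substitute with your own:

# Assumes the user data script above has been saved locally as userdata.sh
aws autoscaling create-launch-configuration \
  --launch-configuration-name webapp-lc \
  --image-id ami-04505e74c0741db8d \
  --instance-type t2.micro \
  --iam-instance-profile ssm-ec2 \
  --security-groups <security-group-id> \
  --key-name <key-pair-name> \
  --user-data file://userdata.sh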

Step 3

Let’s create an ASG with the newly created launch configuration. No load balancer is required, and the instance count can be kept at a minimum of 1.

Name the ASG “webapp-asg” & create it.
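If you'd rather script it, a minimal CLI sketch of the same ASG looks like this (the subnet ID is a placeholder for a subnet in your VPC):

aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name webapp-asg \
  --launch-configuration-name webapp-lc \
  --min-size 1 --max-size 1 --desired-capacity 1 \
  --vpc-zone-identifier <subnet-id>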

Step 4

Before creating a lifecycle hook, it’s important to understand what they are & how they work.

What are lifecycle hooks?

Lifecycle hooks let you pause an instance so you can perform custom actions whenever the Auto Scaling group launches or terminates it.

How does it work?

To understand how they work, let's first look at how instances in an ASG behave without a lifecycle hook: on launch, an instance moves from the Pending state straight into InService, and on scale-in it moves from Terminating straight to Terminated.

Lifecycle hooks intercept these transitions. The instance is put into a wait state (Pending:Wait on launch, Terminating:Wait on termination) while your custom actions run. Once the actions have run, the hook expects a command to be invoked that tells the instance to move on to the next state.

Command

The command looks like this:

aws autoscaling complete-lifecycle-action --lifecycle-action-result CONTINUE --instance-id i-1a2b3c4d --lifecycle-hook-name my-launch-hook --auto-scaling-group-name my-asg        

Heartbeat timeout

The amount of time, in seconds, for the instances to remain in a wait state.

If this command is never run, the instance stays in the wait state for this many seconds and then proceeds according to the hook's default result.

Creation

We know pretty much everything about lifecycle hooks. Let’s create one now by going to ASG -> Instance Management.

Since we want to take custom action (of copying log files to the S3 bucket) at termination, we will create a “termination” lifecycle hook.

Let’s name it “e” (to signify “ending”).

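The same hook can be created from the CLI; the 300-second heartbeat timeout here is just an assumed value, adjust it to however long your copy step needs:

aws autoscaling put-lifecycle-hook \
  --lifecycle-hook-name e \
  --auto-scaling-group-name webapp-asg \
  --lifecycle-transition autoscaling:EC2_INSTANCE_TERMINATING \
  --heartbeat-timeout 300 \
  --default-result CONTINUE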

Step 5

Let’s create a Lambda function and name it “ec2-lifecycle”. A Python runtime with otherwise default settings is enough for our purpose.

In the upcoming steps, I will add some more permissions to the IAM role of this Lambda function.


Step 6

What is EventBridge?

EventBridge is a serverless event bus that makes it easier to build event-driven applications using events generated by other AWS services.

What’s an EventBridge rule?

An EventBridge rule watches for certain events and then routes them to AWS targets that you choose. You can create a rule that performs an AWS action automatically when another AWS action happens.

Creation

Let’s create a rule named “eventbridge-ec2” that watches for EC2 terminate lifecycle events emitted by Auto Scaling and sends them to the Lambda function created in Step 5.

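As a sketch, the rule and its target can also be set up from the CLI. The event pattern below matches Auto Scaling's instance-terminate lifecycle events; you can additionally filter on your ASG's ARN under "resources" so the rule doesn't fire for every group. The Lambda and rule ARNs are placeholders:

# Rule matching instance-terminate lifecycle events from Auto Scaling
aws events put-rule \
  --name eventbridge-ec2 \
  --event-pattern '{"source": ["aws.autoscaling"], "detail-type": ["EC2 Instance-terminate Lifecycle Action"]}'

# Send matching events to the Lambda function created in Step 5
aws events put-targets \
  --rule eventbridge-ec2 \
  --targets "Id"="1","Arn"="<lambda-function-arn>"

# Allow EventBridge to invoke the function
aws lambda add-permission \
  --function-name ec2-lifecycle \
  --statement-id eventbridge-invoke \
  --action lambda:InvokeFunction \
  --principal events.amazonaws.com \
  --source-arn <eventbridge-rule-arn>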

Step 7

Create an S3 bucket where the logs would be stored when instances terminate.

I named my bucket “shishir-personal-04-b”.
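Creating it is a one-liner from the CLI (bucket names are globally unique, so substitute your own):

aws s3 mb s3://shishir-personal-04-b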

Step 8

In this step, we are going to create an SSM document to run a shell script on the EC2. The goal is to copy the log files to an S3 bucket.

Let’s start creating the document by going to AWS Systems Manager -> Documents -> Create document, and choosing the “Command” type.

Parameters

Let’s define some parameters, since we are going to trigger this SSM document from the Lambda function. These parameters will be used to execute the lifecycle hook's complete-lifecycle-action command discussed in Step 4.

  • hookname: Name of the lifecycle hook. “e” in our case.
  • asgname: Name of the ASG. “webapp-asg” in our case.
  • instanceid: ID of the instance on which the command needs to be executed. This is present in the EventBridge event and is passed to the SSM document by the Lambda function.

Commands

The find command can be used to filter out the log files that need to be copied to the S3 bucket created in Step 7.

for i in `find /var/log -maxdepth 1 -type f -name '*.log'`; do echo $i; /usr/local/bin/aws s3 cp $i s3://shishir-personal-04-b/; done        

The ASG’s complete-lifecycle-action command uses the defined parameters:

aws autoscaling complete-lifecycle-action --lifecycle-action-result CONTINUE --instance-id {{ instanceid }} --lifecycle-hook-name {{ hookname }} --auto-scaling-group-name {{ asgname }}        

Document

The document would look something like this — let’s save it as “ssm-poc”.

{
  "schemaVersion": "2.2",
  "description": "Command Document Example JSON Template",
  "parameters": {
    "hookname": {
      "type": "String",
      "description": "hook_name",
      "default": "hook_name"
    },
    "asgname": {
      "type": "String",
      "description": "asg_name",
      "default": "asg_name"
    },
    "instanceid": {
      "type": "String",
      "description": "instance id",
      "default": "none"
    }
  },
  "mainSteps": [
    {
      "action": "aws:runShellScript",
      "name": "example",
      "inputs": {
        "runCommand": [
          "#!/bin/bash",
          "for i in `find /var/log -maxdepth 1 -type f -name '*.log'`; do echo $i; /usr/local/bin/aws s3 cp $i s3://shishir-personal-04-b/; done",
          "aws autoscaling complete-lifecycle-action --lifecycle-action-result CONTINUE --instance-id {{ instanceid }} --lifecycle-hook-name {{ hookname }} --auto-scaling-group-name {{ asgname }}"
        ]
      }
    }
  ]
}        
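If you save the JSON above locally (for example as ssm-poc.json), the document can also be registered from the CLI:

aws ssm create-document \
  --name ssm-poc \
  --document-type Command \
  --document-format JSON \
  --content file://ssm-poc.json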

Step 9

Since the SSM document's commands run on the EC2 instance and involve S3 and ASG actions, it's important to add permissions for these operations to the EC2 instance IAM role we created in Step 1.

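As a rough sketch, an inline policy like the one below, scoped to the bucket and the complete-lifecycle-action call, covers what the script needs; attaching the broader AWS managed policies for S3 and Auto Scaling would work too. The policy and file names are my own placeholders:

cat > ec2-extra-permissions.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject"],
      "Resource": "arn:aws:s3:::shishir-personal-04-b/*"
    },
    {
      "Effect": "Allow",
      "Action": ["autoscaling:CompleteLifecycleAction"],
      "Resource": "*"
    }
  ]
}
EOF

aws iam put-role-policy \
  --role-name ssm-ec2 \
  --policy-name ec2-logs-to-s3 \
  --policy-document file://ec2-extra-permissions.json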

Step 10

Up to this point, our Lambda function has a trigger configured but no code. We need to write code that extracts data such as the instance ID from the event and runs the SSM document that we created in Step 8.

I will be using Python along with the boto3 library to invoke the SSM document from the Lambda function. The code below extracts the data, invokes the command using the ssm_client object, and prints the response from SSM.

import json
import time

import boto3

ssm_client = boto3.client('ssm')

def lambda_handler(event, context):
    # The instance ID comes from the EventBridge lifecycle event
    ec2_instance = event['detail']['EC2InstanceId']
    document_name = 'ssm-poc'
    document_version = '1'

    # Run the SSM document on the terminating instance, passing the
    # lifecycle hook name, ASG name, and instance ID as parameters
    response = ssm_client.send_command(
        InstanceIds=[ec2_instance],
        DocumentName=document_name,
        DocumentVersion=document_version,
        TimeoutSeconds=300,
        Parameters={
            'hookname': ['e'],
            'asgname': ['webapp-asg'],
            'instanceid': [ec2_instance]
        }
    )

    command_id = response['Command']['CommandId']
    # Give the command a moment to register before querying its status
    time.sleep(5)

    output = ssm_client.get_command_invocation(
        CommandId=command_id,
        InstanceId=ec2_instance
    )
    print(output)

    return {
        'statusCode': 200,
        'body': json.dumps(output)
    }

But this isn't enough. Currently, the Lambda function has only its basic execution permissions; we need to grant it more.

To do so, open the function's “Configuration” tab and follow the link to its IAM role. We need to attach the “AmazonSSMFullAccess” policy to this role so that the function can trigger the SSM document.
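The same attachment can be done from the CLI (the execution role name is auto-generated when you create the function, so substitute yours):

aws iam attach-role-policy \
  --role-name <lambda-execution-role-name> \
  --policy-arn arn:aws:iam::aws:policy/AmazonSSMFullAccess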

That’s it. The setup is complete.

Steps for testing

  • Set the desired instance count in the ASG to 0 to trigger a termination (a CLI equivalent is sketched after this list). This should create an EventBridge event, which is sent to the Lambda function, where the SSM document is invoked.
  • To check the state of the SSM document invocation, go to “Systems Manager” -> “Run Command” -> “Command History”.
  • In case it shows a failure, check the command output for a hint about what went wrong. If you get stuck, leave a comment on this post.
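Here is a rough CLI equivalent of the scale-down and the status check:

# Scale the ASG down to zero to trigger the termination lifecycle hook
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name webapp-asg \
  --min-size 0 --desired-capacity 0

# Inspect the Run Command invocation and its output
aws ssm list-command-invocations --details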

Verification

To verify, check out the contents of the S3 bucket.
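For example, from the CLI:

aws s3 ls s3://shishir-personal-04-b/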

Final Words

If you found this post helpful & knowledgeable, be sure to follow & leave a good review. It encourages me to keep writing and helps other people find it :)

I share tips, experiences & articles on my Medium as well. You’ll love it if you are into Cloud, DevOps, Kubernetes, Integrations, etc. Follow me on Medium - https://shishirkh.medium.com/

Nidal Shater

DevOps Engineer at Moove | AWS Expert | Terraform | CICD | System Design Innovator | Automating manual processes

2y

Amazing article, thanks, it was really helpful. Just one note: when defining the EventBridge event pattern we should specify the detail-type and the ARN of the Auto Scaling group, e.g. { "source": ["aws.autoscaling"], "detail-type": ["EC2 Instance-terminate Lifecycle Action"], "resources": ["AutoScalingGroupARN"] }, so it won't be running on every action for every Auto Scaling group.

Noah Jacob Suresh

YOU LIFT ME BY YOUR GRACE

3y

Hey Shishir Khandelwal, I just finished the exercise and it was so well documented. I used Amazon Linux 2 instead of Ubuntu, as Ubuntu had some issues registering with SSM.

Sarvajit Sankar

DevOps @ Atlys | ex-SRE @ CRED

3y

Hey Shishir Khandelwal, it could be that I am lacking some knowledge here, but why not use the CloudWatch agent and ship all the application logs to CloudWatch itself? Or rather, any log shipping tool to ship them to a centralized location, since that is always a best practice! Is it not?

Kumar E

ETL Consultant - DW & BI | Informatica Powercenter | Mulesoft l AWS Solution Architect Associate Certified

3y

Very well documented, Shishir Khandelwal. Thanks for the approach.
