Saving EC2 logs to S3 buckets via lifecycle hooks

An EC2 instance may get terminated for a variety of reasons. Since it usually runs something crucial to the application, such as the frontend, backend, or database, it's important to be able to figure out what actually went wrong.

Therefore, as a DevOps/SRE/SysAdmin, it's a good idea to have logs and other important files backed up somewhere safe before the VM is terminated completely.

One way to approach this is to use Auto Scaling lifecycle hooks to trigger the transfer of log files from the terminating VM to an S3 bucket.

In this article, I will walk through the configurations and steps needed to make this idea work.

After spending some time on this problem and trying out different approaches, I am confident that it makes a good project for anyone looking to learn about lifecycle hooks in AWS.

Just so we are on the same page,

SSM = Systems Manager.

ASG = Auto Scaling Group.

Steps for setup

The architecture we will be creating ties together an ASG termination lifecycle hook, an EventBridge rule, a Lambda function, an SSM command document, and an S3 bucket.

The steps, in brief, would be as follows:

  1. Create an instance role that gives SSM access to the EC2 instances.
  2. Create a launch configuration for launching the EC2s.
  3. Create an ASG to spin up EC2s.
  4. Create a ‘termination’ lifecycle hook for the ASG.
  5. Create a Lambda function.
  6. Create an EventBridge rule to trigger the Lambda function.
  7. Create an S3 bucket where the logs will be stored.
  8. Create an SSM document that runs a shell script inside the EC2 to perform the S3 operations.
  9. Update the EC2 instance IAM role with S3 and ASG permissions.
  10. Configure the Lambda function to listen for EventBridge events & run the SSM document.

Step 1

In order for SSM to be able to run commands on an EC2 instance, the instance needs a role with some specific permissions. It can have other permissions as well, but the managed policy “AmazonSSMManagedInstanceCore” is mandatory.

So, let’s create one.

Since this role will be attached to an EC2 instance, it must have “EC2” as its trusted entity.

Let’s name the role “ssm-ec2”.

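For those who prefer the CLI, a rough equivalent of the console steps above could look like this (the trust policy file name is my own placeholder):

# Trust policy that lets EC2 assume the role
cat > ec2-trust.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

# Create the role and attach the mandatory SSM managed policy
aws iam create-role --role-name ssm-ec2 --assume-role-policy-document file://ec2-trust.json
aws iam attach-role-policy --role-name ssm-ec2 \
  --policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore

# Wrap the role in an instance profile so it can be attached to EC2 instances
aws iam create-instance-profile --instance-profile-name ssm-ec2
aws iam add-role-to-instance-profile --instance-profile-name ssm-ec2 --role-name ssm-ec2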

Step 2

Since our goal is to deal with the lifecycle hooks and S3 operations, we can set up a bare-minimum launch configuration. I used the following configuration; you may change it according to your use case.

AMI: ami-04505e74c0741db8d (It’s ubuntu 20 in us-east-1 region)
Instance Type: t2.micro
IAM Instance Profile: “ssm-ec2” (It’s the IAM role we created in Step 1)
No extra storage volumes.
Security group with port 22 open to the world.
(I usually create a key pair as well, in case I need to SSH in and debug something)

If you are a beginner trying to build a project like this, I suggest you follow along with the same configuration as mine.

The plan is to perform the S3 operations via the AWS CLI. Let's add the AWS CLI installation steps to the launch configuration so that it's already installed when a new EC2 instance is launched.

#!/bin/bash
cd /home/ubuntu/
sudo apt update -y
sudo apt install -y unzip
# Download and install AWS CLI v2
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install

Add this script under “Advanced details” -> “User data”.

Let’s name the launch configuration “webapp-lc” & create it.
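For reference, roughly the same launch configuration can be created from the CLI; the security group ID and key pair name below are placeholders you would substitute with your own:

# Assumes the user data script above has been saved locally as userdata.sh
aws autoscaling create-launch-configuration \
  --launch-configuration-name webapp-lc \
  --image-id ami-04505e74c0741db8d \
  --instance-type t2.micro \
  --iam-instance-profile ssm-ec2 \
  --security-groups <security-group-id> \
  --key-name <key-pair-name> \
  --user-data file://userdata.sh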

Step 3

Let’s create an ASG with the newly created launch configuration. No load balancer is required, and the instance count can be kept at a minimum of 1.

Name the ASG “webapp-asg” & create it.
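If you'd rather script it, a minimal CLI sketch of the same ASG looks like this (the subnet ID is a placeholder for a subnet in your VPC):

aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name webapp-asg \
  --launch-configuration-name webapp-lc \
  --min-size 1 --max-size 1 --desired-capacity 1 \
  --vpc-zone-identifier <subnet-id>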

Step 4

Before creating a lifecycle hook, it’s important to understand what they are & how they work.

What are lifecycle hooks?

Lifecycle hooks let you pause an instance so you can perform custom actions whenever the Auto Scaling group launches or terminates it.

How does it work?

To understand how they work, let's first look at how instances in an ASG behave without a lifecycle hook: on launch, an instance moves from the Pending state straight into InService, and on scale-in it moves from Terminating straight to Terminated.

Lifecycle hooks intercept these transitions. The instance is put into a wait state (Pending:Wait on launch, Terminating:Wait on termination) while your custom actions run. Once the actions have run, the hook expects a command to be invoked that tells the instance to move on to the next state.

Command

The command looks like this:

aws autoscaling complete-lifecycle-action --lifecycle-action-result CONTINUE --instance-id i-1a2b3c4d --lifecycle-hook-name my-launch-hook --auto-scaling-group-name my-asg        

Heartbeat timeout

The amount of time, in seconds, for the instances to remain in a wait state.

If this command is never run, the instance stays in the wait state for this many seconds and then proceeds according to the hook's default result.

Creation

We know pretty much everything about lifecycle hooks. Let’s create one now by going to ASG -> Instance Management.

Since we want to take custom action (of copying log files to the S3 bucket) at termination, we will create a “termination” lifecycle hook.

Let’s name it “e” (to signify “ending”).

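The same hook can be created from the CLI; the 300-second heartbeat timeout here is just an assumed value, adjust it to however long your copy step needs:

aws autoscaling put-lifecycle-hook \
  --lifecycle-hook-name e \
  --auto-scaling-group-name webapp-asg \
  --lifecycle-transition autoscaling:EC2_INSTANCE_TERMINATING \
  --heartbeat-timeout 300 \
  --default-result CONTINUE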

Step 5

Let’s create a Lambda function and name it “ec2-lifecycle”. A Python runtime with otherwise default settings is enough for our purpose.

In the upcoming steps, I will add some more permissions to the IAM role of this Lambda function.


Step 6

What is EventBridge?

EventBridge is a serverless event bus that makes it easier to build event-driven applications using events generated by other AWS services.

What’s an EventBridge rule?

An EventBridge rule watches for certain events and then routes them to AWS targets that you choose. You can create a rule that performs an AWS action automatically when another AWS action happens.

Creation

Let’s create a rule named “eventbridge-ec2” that watches for EC2 terminate lifecycle events emitted by Auto Scaling and sends them to the Lambda function created in Step 5.

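As a sketch, the rule and its target can also be set up from the CLI. The event pattern below matches Auto Scaling's instance-terminate lifecycle events; you can additionally filter on your ASG's ARN under "resources" so the rule doesn't fire for every group. The Lambda and rule ARNs are placeholders:

# Rule matching instance-terminate lifecycle events from Auto Scaling
aws events put-rule \
  --name eventbridge-ec2 \
  --event-pattern '{"source": ["aws.autoscaling"], "detail-type": ["EC2 Instance-terminate Lifecycle Action"]}'

# Send matching events to the Lambda function created in Step 5
aws events put-targets \
  --rule eventbridge-ec2 \
  --targets "Id"="1","Arn"="<lambda-function-arn>"

# Allow EventBridge to invoke the function
aws lambda add-permission \
  --function-name ec2-lifecycle \
  --statement-id eventbridge-invoke \
  --action lambda:InvokeFunction \
  --principal events.amazonaws.com \
  --source-arn <eventbridge-rule-arn>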

Step 7

Create an S3 bucket where the logs would be stored when instances terminate.

I named my bucket “shishir-personal-04-b”.
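Creating it is a one-liner from the CLI (bucket names are globally unique, so substitute your own):

aws s3 mb s3://shishir-personal-04-b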

Step 8

In this step, we are going to create an SSM document to run a shell script on the EC2. The goal is to copy the log files to an S3 bucket.

Let’s start creating the document by going to AWS Systems Manager -> Documents -> Create document, and choosing the “Command” type.

Parameters

Let’s define some parameters, since we are going to trigger this SSM document from the Lambda function. These parameters will be used to execute the lifecycle hook's complete-lifecycle-action command discussed in Step 4.

  • hookname: Name of the lifecycle hook. “e” in our case.
  • asgname: Name of the ASG. “webapp-asg” in our case.
  • instanceid: ID of the instance on which the command needs to be executed. This is present in the EventBridge event and is passed to the SSM document by the Lambda function.

Commands

The find command can be used to filter out the log files that need to be copied to the S3 bucket created in Step 7.

for i in `find /var/log -maxdepth 1 -type f -name '*.log'`; do echo $i; /usr/local/bin/aws s3 cp $i s3://shishir-personal-04-b/; done        

The ASG’s complete-lifecycle-action command uses the defined parameters:

aws autoscaling complete-lifecycle-action --lifecycle-action-result CONTINUE --instance-id {{ instanceid }} --lifecycle-hook-name {{ hookname }} --auto-scaling-group-name {{ asgname }}        

Document

The document would look something like this — let’s save it as “ssm-poc”.

{
  "schemaVersion": "2.2",
  "description": "Command Document Example JSON Template",
  "parameters": {
    "hookname": {
      "type": "String",
      "description": "hook_name",
      "default": "hook_name"
    },
    "asgname": {
      "type": "String",
      "description": "asg_name",
      "default": "asg_name"
    },
    "instanceid": {
      "type": "String",
      "description": "instance id",
      "default": "none"
    }
  },
  "mainSteps": [
    {
      "action": "aws:runShellScript",
      "name": "example",
      "inputs": {
        "runCommand": [
          "#!/bin/bash",
          "for i in `find /var/log -maxdepth 1 -type f -name '*.log'`; do echo $i; /usr/local/bin/aws s3 cp $i s3://shishir-personal-04-b/; done",
          "aws autoscaling complete-lifecycle-action --lifecycle-action-result CONTINUE --instance-id {{ instanceid }} --lifecycle-hook-name {{ hookname }} --auto-scaling-group-name {{ asgname }}"
        ]
      }
    }
  ]
}        
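If you save the JSON above locally (for example as ssm-poc.json), the document can also be registered from the CLI:

aws ssm create-document \
  --name ssm-poc \
  --document-type Command \
  --document-format JSON \
  --content file://ssm-poc.json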

Step 9

Since the SSM document's commands run on the EC2 instance and involve S3 and ASG actions, it's important to add permissions for these operations to the EC2 instance IAM role we created in Step 1.

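As a rough sketch, an inline policy like the one below, scoped to the bucket and the complete-lifecycle-action call, covers what the script needs; attaching the broader AWS managed policies for S3 and Auto Scaling would work too. The policy and file names are my own placeholders:

cat > ec2-extra-permissions.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject"],
      "Resource": "arn:aws:s3:::shishir-personal-04-b/*"
    },
    {
      "Effect": "Allow",
      "Action": ["autoscaling:CompleteLifecycleAction"],
      "Resource": "*"
    }
  ]
}
EOF

aws iam put-role-policy \
  --role-name ssm-ec2 \
  --policy-name ec2-logs-to-s3 \
  --policy-document file://ec2-extra-permissions.json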

Step 10

Up to this point, our Lambda function has a trigger configured but no code. We need to write code that extracts data such as the instance ID from the event and runs the SSM document that we created in Step 8.

I will be using Python along with the boto3 library to invoke the SSM document from the Lambda function. The code below extracts the data, invokes the command using the ssm_client object, and prints the response from SSM.

import json
import time

import boto3

ssm_client = boto3.client('ssm')

def lambda_handler(event, context):
    # The instance ID comes from the EventBridge lifecycle event
    ec2_instance = event['detail']['EC2InstanceId']
    document_name = 'ssm-poc'
    document_version = '1'

    # Run the SSM document on the terminating instance, passing the
    # lifecycle hook name, ASG name, and instance ID as parameters
    response = ssm_client.send_command(
        InstanceIds=[ec2_instance],
        DocumentName=document_name,
        DocumentVersion=document_version,
        TimeoutSeconds=300,
        Parameters={
            'hookname': ['e'],
            'asgname': ['webapp-asg'],
            'instanceid': [ec2_instance]
        }
    )

    command_id = response['Command']['CommandId']
    # Give the command a moment to register before querying its status
    time.sleep(5)

    output = ssm_client.get_command_invocation(
        CommandId=command_id,
        InstanceId=ec2_instance
    )
    print(output)

    return {
        'statusCode': 200,
        'body': json.dumps(output)
    }

But this isn't enough. Currently, the Lambda function has only its basic execution permissions; we need to grant it more.

To do so, open the function's “Configuration” tab and follow the link to its IAM role. We need to attach the “AmazonSSMFullAccess” policy to this role so that the function can trigger the SSM document.
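The same attachment can be done from the CLI (the execution role name is auto-generated when you create the function, so substitute yours):

aws iam attach-role-policy \
  --role-name <lambda-execution-role-name> \
  --policy-arn arn:aws:iam::aws:policy/AmazonSSMFullAccess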

That’s it. The setup is complete.

Steps for testing

  • Set the desired instance count in the ASG to 0 to trigger a termination (a CLI equivalent is sketched after this list). This should create an EventBridge event, which is sent to the Lambda function, where the SSM document is invoked.
  • To check the state of the SSM document invocation, go to “Systems Manager” -> “Run Command” -> “Command History”.
  • In case it shows a failure, check the command output for a hint about what went wrong. If you get stuck, leave a comment on this post.
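Here is a rough CLI equivalent of the scale-down and the status check:

# Scale the ASG down to zero to trigger the termination lifecycle hook
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name webapp-asg \
  --min-size 0 --desired-capacity 0

# Inspect the Run Command invocation and its output
aws ssm list-command-invocations --details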

Verification

To verify, check out the contents of the S3 bucket.
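For example, from the CLI:

aws s3 ls s3://shishir-personal-04-b/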

Final Words

If you found this post helpful & knowledgeable, be sure to follow & leave a good review. It encourages me to keep writing and helps other people find it :)

I share tips, experiences & articles on my Medium as well. You’ll love it if you are into Cloud, DevOps, Kubernetes, Integrations, etc. Follow me on Medium - https://shishirkh.medium.com/

Nidal Shater

DevOps Engineer at Moove | AWS Expert | Terraform | CICD | System Design Innovator | Automating manual processes

2y

Amazing article, thanks, it was really helpful. Just one note: when defining the EventBridge event pattern we should specify the detail-type and the ARN of the Auto Scaling group, e.g. { "source": ["aws.autoscaling"], "detail-type": ["EC2 Instance-terminate Lifecycle Action"], "resources": ["AutoScalingGroupARN"] }, so it won't be running on every action for every Auto Scaling group.

Noah Jacob Suresh

YOU LIFT ME BY YOUR GRACE

3y

Hey Shishir Khandelwal, I just finished the exercise and it was so well documented. I used Amazon Linux 2 instead of Ubuntu, as Ubuntu had some issues registering with SSM.

Sarvajit Sankar

DevOps @ Atlys | ex-SRE @ CRED

3y

Hey Shishir Khandelwal, it could be that I am lacking some knowledge here, but why not use the CloudWatch agent and ship all the application logs to CloudWatch itself? Or rather, any log shipping tool to ship them to a centralized location, since that is always a best practice! Is it not?

Kumar E

ETL Consultant - DW & BI | Informatica Powercenter | Mulesoft l AWS Solution Architect Associate Certified

3y

Very well documented, Shishir Khandelwal. Thanks for the approach.
