Modular Deployments in AWS Cross-Account CI/CD Pipelines
When deploying code changes to production, how do you avoid "re-building" the entire solution and instead only build, test, and deploy the specific component(s) that changed?
This is important because we do not want to re-build, re-test, and re-deploy the entire infrastructure every time a commit triggers our CI/CD pipeline, especially when deploying changes frequently and in small batches.
Suppose you built a serverless ML training pipeline using Glue jobs, Lambda functions, and Step Functions to orchestrate their sequential execution. Each of these components encapsulates a distinct, independently deployable piece of the solution.
Let's walk through the solution architecture diagram starting at the top left.
During a sprint, you complete a task that involves updating the Training Lambda function's handler code (i.e. code that submits a SageMaker Hyperparameter Tuning job).
Within the SageMaker Studio UI, you commit your code changes to a short-lived CodeCommit feature branch, open a pull request to the dev branch, await code review, and merge into dev.
This merge-into-dev event triggers a custom EventBridge rule with a Lambda function as its invocation target. The Lambda function uses boto3 to pull the latest commit's "diffs" (the files that have changed in the dev branch relative to the test branch).
These diffs will determine which specific component(s) will be built, tested, and deployed - in this case the Training Lambda function.
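For reference, here is a minimal sketch of such an EventBridge rule (the rule name is hypothetical; REPO_NAME and the YOUR_* values follow the placeholder convention used throughout this article). It matches branch-update events on dev emitted by CodeCommit:

aws events put-rule --name merge-into-dev-rule --event-pattern '{
  "source": ["aws.codecommit"],
  "detail-type": ["CodeCommit Repository State Change"],
  "resources": ["arn:aws:codecommit:YOUR_REGION:YOUR_AWS_ACCOUNT:REPO_NAME"],
  "detail": {
    "event": ["referenceUpdated"],
    "referenceType": ["branch"],
    "referenceName": ["dev"]
  }
}'

The Lambda function is then registered as the rule's target with aws events put-targets, and EventBridge is granted invoke permission via aws lambda add-permission.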
Since we are going to need this information downstream in our CI/CD pipeline, specifically during CodeBuild, we store the diffs in a "Deployment-Diffs" DynamoDB table. This table requires a composite primary key, where the partition key is the CommitId from CodeCommit and the sort key is the Component name (e.g. Training-Lambda-Function).
This composite primary key guarantees that each insert into the diffs table is unique while maintaining a diff history per component. Without a sort key, each write for a given commit would overwrite the previous one (DynamoDB does not allow duplicate primary keys), so we would lose either the diff history or the ability to tell which individual component(s) changed. We choose on-demand capacity mode for both reads and writes to remain fully serverless.
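As a sketch, the table can be created with the AWS CLI as follows (we store the component name in a Component attribute, matching the queries later in the buildspec; PAY_PER_REQUEST is DynamoDB's on-demand capacity mode):

aws dynamodb create-table \
  --table-name Deployment-Diffs \
  --attribute-definitions AttributeName=CommitId,AttributeType=S AttributeName=Component,AttributeType=S \
  --key-schema AttributeName=CommitId,KeyType=HASH AttributeName=Component,KeyType=RANGE \
  --billing-mode PAY_PER_REQUEST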
This write to DynamoDB takes place within the Lambda function invoked by EventBridge, in response to the merge event from the feature branch into the dev branch. Based on the diffs, we use boto3 to start a CodePipeline execution.
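A minimal sketch of that Lambda handler (the pipeline name and the path-to-component mapping are assumptions; error handling and pagination are omitted):

import boto3

codecommit = boto3.client("codecommit")
dynamodb = boto3.client("dynamodb")
codepipeline = boto3.client("codepipeline")

def lambda_handler(event, context):
    # Latest commit on dev (the merge that triggered the EventBridge rule)
    commit_id = codecommit.get_branch(
        repositoryName="REPO_NAME", branchName="dev"
    )["branch"]["commitId"]

    # Files that changed in dev relative to test
    diffs = codecommit.get_differences(
        repositoryName="REPO_NAME",
        beforeCommitSpecifier="test",
        afterCommitSpecifier="dev",
    )["differences"]

    for diff in diffs:
        # Deleted files have no afterBlob; fall back to the beforeBlob path
        path = (diff.get("afterBlob") or diff["beforeBlob"])["path"]
        # Map the changed file's path to a component name, e.g. its top-level folder
        component = path.split("/")[0]
        # One item per (CommitId, Component) pair
        dynamodb.put_item(
            TableName="Deployment-Diffs",
            Item={"CommitId": {"S": commit_id}, "Component": {"S": component}},
        )

    # Kick off the CI/CD pipeline only if something changed
    if diffs:
        codepipeline.start_pipeline_execution(name="YOUR_PIPELINE_NAME")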
CodePipeline is composed of 4 stages:
The Source stage receives a copy of the CodeCommit repository's test branch. The branch's entire source code is passed as the input artifact to the rest of CodePipeline.
The Build & Test stage launches a CodeBuild Amazon Linux container, which executes code read from the copy of the repo's test branch. We specify the full path to the buildspec.yml file as part of the CodeBuild project's configuration.
This buildspec.yml file uses the AWS CLI to perform a sequence of commands. Make sure to enable the "privileged" flag in the CodeBuild configuration to allow building Docker images. We only show the Training Lambda function for simplicity, but our buildspec.yml file includes all training pipeline components.
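Both settings live in the CodeBuild project configuration. As a sketch with the AWS CLI (the project name and container image are assumptions; FULL_REPO_PATH is the same placeholder used in the buildspec below):

aws codebuild update-project --name YOUR_BUILD_PROJECT \
  --source '{"type": "CODEPIPELINE", "buildspec": "FULL_REPO_PATH/test-buildspec.yml"}' \
  --environment '{"type": "LINUX_CONTAINER", "image": "aws/codebuild/amazonlinux2-x86_64-standard:5.0", "computeType": "BUILD_GENERAL1_SMALL", "privilegedMode": true}'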
First, we read the diffs from DynamoDB and conditionally flip boolean flags from 0 to 1 based on the specific component(s) that changed:
pre_build:
  commands:
    - aws ecr get-login-password --region $YOUR_REGION | docker login --username AWS --password-stdin $YOUR_AWS_ACCOUNT.dkr.ecr.$YOUR_REGION.amazonaws.com
    - CF_TEMPLATE_DIFF=0
    - TEST_BUILDSPEC_DIFF=0
    - PROD_BUILDSPEC_DIFF=0
    - TRAINING_LAMBDA_DIFF=0
    # Add any additional flags, one for each distinct solution component
    - COMMIT_ID=$(aws codecommit get-branch --repository-name REPO_NAME --branch-name dev --query 'branch.commitId')
    # Strip the surrounding double quotes returned by the CLI
    - COMMIT_ID="${COMMIT_ID%\"}"
    - COMMIT_ID="${COMMIT_ID#\"}"
    # Fetch the component names recorded for this commit in the diffs table
    - DIFFS=$(aws dynamodb query --table-name Deployment-Diffs --projection-expression "Component" --key-condition-expression "CommitId = :value" --expression-attribute-values '{":value":{"S":'\"$COMMIT_ID\"'}}' --query 'Items[*].Component.S')
    - echo $DIFFS
    - |
      for COMPONENT in $DIFFS; do
        # Remove comma suffix
        COMPONENT="${COMPONENT%,}"
        # Remove quote suffix
        COMPONENT="${COMPONENT%\"}"
        # Remove quote prefix
        COMPONENT="${COMPONENT#\"}"

        if [ "$COMPONENT" = "Training-Lambda-Function" ]; then
          echo "Found a diff for Training-Lambda-Function"
          TRAINING_LAMBDA_DIFF=1
        elif [ "$COMPONENT" = "template.yml" ]; then
          echo "Found a diff for template.yml"
          CF_TEMPLATE_DIFF=1
        elif [ "$COMPONENT" = "test-buildspec.yml" ]; then
          echo "Found a diff for test-buildspec.yml"
          TEST_BUILDSPEC_DIFF=1
        elif [ "$COMPONENT" = "prod-buildspec.yml" ]; then
          echo "Found a diff for prod-buildspec.yml"
          PROD_BUILDSPEC_DIFF=1
        fi
      done
Next, read the Training Lambda function's current production version from a separate DynamoDB deployments metadata table. This table is also configured with a composite primary key, where the partition key is the Component name and the sort key is the BuildNumber (note that the Component name is the partition key in this table, while in the deployment diffs table it is the sort key; each access pattern is different).
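A sketch of this metadata table's definition (same on-demand capacity mode as the diffs table, with the key roles swapped to fit the per-component version history):

aws dynamodb create-table \
  --table-name YOUR_DYNAMODB_TABLE \
  --attribute-definitions AttributeName=Component,AttributeType=S AttributeName=BuildNumber,AttributeType=N \
  --key-schema AttributeName=Component,KeyType=HASH AttributeName=BuildNumber,KeyType=RANGE \
  --billing-mode PAY_PER_REQUEST

Because BuildNumber is a numeric sort key, querying with --no-scan-index-forward and --max-items 1 (as in the build phase below) returns the latest build for a given component.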
If the component changed, we auto-increment the Lambda function version by 1, build the code (from the pipeline's input artifact) into a new Docker image, push it to ECR, and store the new ECR image URI in a local variable (we will cover individual component testing in a separate article):
build:
  commands:
    - TIMESTAMP=$(date)
    - AUTO_INCREMENT=1
    - ECR_REPO=YOUR_ECR_REPO
    - AWS_ACCOUNT=YOUR_AWS_ACCOUNT
    - DEPLOYMENTS_BUCKET=YOUR_S3_BUCKET
    - DEPLOYMENTS_TABLE=YOUR_DYNAMODB_TABLE
    - STACK_NAME=YOUR_CF_STACK

    - echo $TRAINING_LAMBDA_DIFF
    # Latest build number for this component: sort descending, keep one item
    - TRAINING_LAMBDA_VERSION=$(aws dynamodb query --table-name $DEPLOYMENTS_TABLE --no-scan-index-forward --projection-expression "BuildNumber" --key-condition-expression "Component = :name" --expression-attribute-values '{":name":{"S":"Training-Lambda-Function"}}' --max-items 1 --query 'Items[*].BuildNumber.N')
    - TRAINING_LAMBDA_VERSION=$(echo $TRAINING_LAMBDA_VERSION | cut -d'"' -f 2)
    - echo $TRAINING_LAMBDA_VERSION
    - |
      if [ "$TRAINING_LAMBDA_DIFF" = "1" ]; then
        echo "Deploying new version of Training-Lambda-Function..."
        TRAINING_LAMBDA_NEW_VERSION=$(($TRAINING_LAMBDA_VERSION + $AUTO_INCREMENT))
        TRAINING_LAMBDA_ECR_NAME=training-lambda-function
        docker build -t $TRAINING_LAMBDA_ECR_NAME FULL_REPO_PATH/Training-Lambda-Function/.
        docker tag $TRAINING_LAMBDA_ECR_NAME $AWS_ACCOUNT.dkr.ecr.$YOUR_REGION.amazonaws.com/$ECR_REPO:$TRAINING_LAMBDA_ECR_NAME-$TRAINING_LAMBDA_NEW_VERSION
        docker push $AWS_ACCOUNT.dkr.ecr.$YOUR_REGION.amazonaws.com/$ECR_REPO:$TRAINING_LAMBDA_ECR_NAME-$TRAINING_LAMBDA_NEW_VERSION
        TRAINING_IMAGE_URI=$AWS_ACCOUNT.dkr.ecr.$YOUR_REGION.amazonaws.com/$ECR_REPO:$TRAINING_LAMBDA_ECR_NAME-$TRAINING_LAMBDA_NEW_VERSION
      else
        echo "No new version of Training-Lambda-Function..."
        # Reuse the current production image URI stored in the deployments metadata table
        TRAINING_IMAGE_URI=$(aws dynamodb query --table-name $DEPLOYMENTS_TABLE --select ALL_ATTRIBUTES --key-condition-expression "Component = :name AND BuildNumber = :build" --expression-attribute-values "{\":name\":{\"S\":\"Training-Lambda-Function\"},\":build\":{\"N\":\"$TRAINING_LAMBDA_VERSION\"}}" --max-items 1 --query 'Items[*].Location.S')
        TRAINING_IMAGE_URI=$(echo $TRAINING_IMAGE_URI | cut -d'"' -f 2)
        echo $TRAINING_IMAGE_URI
      fi
Suppose this Lambda function had not changed. Why do we still read the current production version number (outside the conditional statement), and why do we have an else branch that pulls the current production Lambda's ECR image URI?
Because in the post_build phase, the CloudFormation deployment requires the code location of every component in --parameter-overrides, whether or not a given component changed (see the code below). We therefore need the ability to pull each component's current version and code location, while still allowing new, modular deployments of individual components.
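For illustration, here is a minimal template.yml fragment consistent with that parameter, assuming the Training Lambda is deployed as a container image (the resource and role names are hypothetical):

Parameters:
  TrainingLambdaImageUri:
    Type: String

Resources:
  TrainingLambdaFunction:
    Type: AWS::Lambda::Function
    Properties:
      FunctionName: Training-Lambda-Function
      PackageType: Image
      Code:
        ImageUri: !Ref TrainingLambdaImageUri
      Role: !GetAtt TrainingLambdaRole.Arn # execution role defined elsewhere in the template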
Finally, if CloudFormation deployment succeeds, we write the deployment metadata to DynamoDB:
post_build:
  commands:
    # Upload local artifacts referenced in template.yml to S3 and rewrite their paths
    - aws cloudformation package --template-file FULL_REPO_PATH/template.yml --output-template-file template-package.yml --s3-bucket $DEPLOYMENTS_BUCKET

    # Every component's code location is passed, changed or not
    - aws cloudformation deploy --template-file template-package.yml --stack-name $STACK_NAME --parameter-overrides TrainingLambdaImageUri=$TRAINING_IMAGE_URI --capabilities CAPABILITY_NAMED_IAM --no-fail-on-empty-changeset

    - |
      if [ "$TRAINING_LAMBDA_DIFF" = "1" ]; then
        echo "Writing Training-Lambda-Function metadata to DynamoDB..."
        aws dynamodb put-item --table-name $DEPLOYMENTS_TABLE --item "{\"Component\":{\"S\":\"Training-Lambda-Function\"},\"BuildNumber\":{\"N\":\"$TRAINING_LAMBDA_NEW_VERSION\"},\"Solution\":{\"S\":\"$STACK_NAME\"},\"Location\":{\"S\":\"$TRAINING_IMAGE_URI\"},\"Timestamp\":{\"S\":\"$TIMESTAMP\"}}"
      fi

    - echo $CF_TEMPLATE_DIFF
    - |
      if [ "$CF_TEMPLATE_DIFF" = "1" ]; then
        CF_TEMPLATE_VERSION=$(aws dynamodb query --table-name $DEPLOYMENTS_TABLE --no-scan-index-forward --projection-expression "BuildNumber" --key-condition-expression "Component = :name" --expression-attribute-values '{":name":{"S":"template.yml"}}' --max-items 1 --query 'Items[*].BuildNumber.N')
        CF_TEMPLATE_VERSION=$(echo $CF_TEMPLATE_VERSION | cut -d'"' -f 2)
        echo $CF_TEMPLATE_VERSION
        CF_TEMPLATE_NEW_VERSION=$(($CF_TEMPLATE_VERSION + $AUTO_INCREMENT))
        echo "Writing template.yml metadata to DynamoDB..."
        aws dynamodb put-item --table-name $DEPLOYMENTS_TABLE --item "{\"Component\":{\"S\":\"template.yml\"},\"BuildNumber\":{\"N\":\"$CF_TEMPLATE_NEW_VERSION\"},\"Solution\":{\"S\":\"$STACK_NAME\"},\"Location\":{\"S\":\"FULL_REPO_PATH/template.yml\"},\"Timestamp\":{\"S\":\"$TIMESTAMP\"}}"
      fi

    - echo $PROD_BUILDSPEC_DIFF
    - |
      if [ "$PROD_BUILDSPEC_DIFF" = "1" ]; then
        PROD_BUILDSPEC_VERSION=$(aws dynamodb query --table-name $DEPLOYMENTS_TABLE --no-scan-index-forward --projection-expression "BuildNumber" --key-condition-expression "Component = :name" --expression-attribute-values '{":name":{"S":"prod-buildspec.yml"}}' --max-items 1 --query 'Items[*].BuildNumber.N')
        PROD_BUILDSPEC_VERSION=$(echo $PROD_BUILDSPEC_VERSION | cut -d'"' -f 2)
        echo $PROD_BUILDSPEC_VERSION
        PROD_BUILDSPEC_NEW_VERSION=$(($PROD_BUILDSPEC_VERSION + $AUTO_INCREMENT))
        echo "Writing prod-buildspec.yml metadata to DynamoDB..."
        aws dynamodb put-item --table-name $DEPLOYMENTS_TABLE --item "{\"Component\":{\"S\":\"prod-buildspec.yml\"},\"BuildNumber\":{\"N\":\"$PROD_BUILDSPEC_NEW_VERSION\"},\"Solution\":{\"S\":\"$STACK_NAME\"},\"Location\":{\"S\":\"FULL_REPO_PATH/prod-buildspec.yml\"},\"Timestamp\":{\"S\":\"$TIMESTAMP\"}}"
      fi

    - echo $TEST_BUILDSPEC_DIFF
    - |
      if [ "$TEST_BUILDSPEC_DIFF" = "1" ]; then
        TEST_BUILDSPEC_VERSION=$(aws dynamodb query --table-name $DEPLOYMENTS_TABLE --no-scan-index-forward --projection-expression "BuildNumber" --key-condition-expression "Component = :name" --expression-attribute-values '{":name":{"S":"test-buildspec.yml"}}' --max-items 1 --query 'Items[*].BuildNumber.N')
        TEST_BUILDSPEC_VERSION=$(echo $TEST_BUILDSPEC_VERSION | cut -d'"' -f 2)
        echo $TEST_BUILDSPEC_VERSION
        TEST_BUILDSPEC_NEW_VERSION=$(($TEST_BUILDSPEC_VERSION + $AUTO_INCREMENT))
        echo "Writing test-buildspec.yml metadata to DynamoDB..."
        aws dynamodb put-item --table-name $DEPLOYMENTS_TABLE --item "{\"Component\":{\"S\":\"test-buildspec.yml\"},\"BuildNumber\":{\"N\":\"$TEST_BUILDSPEC_NEW_VERSION\"},\"Solution\":{\"S\":\"$STACK_NAME\"},\"Location\":{\"S\":\"FULL_REPO_PATH/test-buildspec.yml\"},\"Timestamp\":{\"S\":\"$TIMESTAMP\"}}"
      fi
If everything succeeds, we proceed to cross-account deployment into our production AWS account. See "AWS Cross-Account Deployments" for details. We will expand on this topic in future articles.
How do you deploy changes from test to prod within AWS, modularized such that only the components that changed are updated? Let us know in the comments.
Subscribe to our weekly LinkedIn newsletter: Machine Learning In Production
Reach out if you need help.
Would you like me to speak at your event? Email me at [email protected]