Modular Deployments in AWS Cross-Account CI/CD Pipelines
When deploying code changes to production, how do you avoid "re-building" the entire solution and instead only build, test, and deploy the specific component(s) that changed?
This is important because we do not want to re-build, re-test, and re-deploy the entire infrastructure every time a commit triggers our CI/CD pipeline, especially when deploying changes frequently and in small batches.
Suppose you built a serverless ML training pipeline using Glue jobs, Lambda functions, and Step Functions to orchestrate their sequential execution. Each of these components encapsulates a distinct, independently deployable piece of the solution.
Let's walk through the solution architecture diagram starting at the top left.
During a sprint, you complete a task that involves updating the Training Lambda function's handler code (i.e. code that submits a SageMaker Hyperparameter Tuning job).
Within the SageMaker Studio UI, you commit your code changes to a short-lived CodeCommit feature branch, open a pull request to the dev branch, await code review, and merge into dev.
This merge-into-dev event triggers a custom EventBridge rule with a Lambda function as its invocation target. The Lambda function uses boto3 to pull the latest commit's "diffs" (the files that have changed in the dev branch relative to the test branch).
These diffs will determine which specific component(s) will be built, tested, and deployed - in this case the Training Lambda function.
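For reference, here is a minimal sketch of such an EventBridge rule (the rule name is hypothetical; REPO_NAME and the YOUR_* values follow the placeholder convention used throughout this article). It matches branch-update events on dev emitted by CodeCommit:

aws events put-rule --name merge-into-dev-rule --event-pattern '{
  "source": ["aws.codecommit"],
  "detail-type": ["CodeCommit Repository State Change"],
  "resources": ["arn:aws:codecommit:YOUR_REGION:YOUR_AWS_ACCOUNT:REPO_NAME"],
  "detail": {
    "event": ["referenceUpdated"],
    "referenceType": ["branch"],
    "referenceName": ["dev"]
  }
}'

The Lambda function is then registered as the rule's target with aws events put-targets, and EventBridge is granted invoke permission via aws lambda add-permission.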
Since we are going to need this information downstream in our CI/CD pipeline, specifically during CodeBuild, we store the diffs in a "Deployment-Diffs" DynamoDB table. This table requires a composite primary key, where the partition key is the CommitId from CodeCommit and the sort key is the Component name (e.g. Training-Lambda-Function).
This composite primary key guarantees that each insert into the diffs table is unique while maintaining a diff history per component. Without a sort key, each write for a given commit would overwrite the previous one (DynamoDB does not allow duplicate primary keys), so we would lose either the diff history or the ability to tell which individual component(s) changed. We choose on-demand capacity mode for both reads and writes to remain fully serverless.
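As a sketch, the table can be created with the AWS CLI as follows (we store the component name in a Component attribute, matching the queries later in the buildspec; PAY_PER_REQUEST is DynamoDB's on-demand capacity mode):

aws dynamodb create-table \
  --table-name Deployment-Diffs \
  --attribute-definitions AttributeName=CommitId,AttributeType=S AttributeName=Component,AttributeType=S \
  --key-schema AttributeName=CommitId,KeyType=HASH AttributeName=Component,KeyType=RANGE \
  --billing-mode PAY_PER_REQUEST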
This write to DynamoDB takes place within the Lambda function invoked by EventBridge, in response to the merge event from the feature branch into the dev branch. Based on the diffs, we use boto3 to start a CodePipeline execution.
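A minimal sketch of that Lambda handler (the pipeline name and the path-to-component mapping are assumptions; error handling and pagination are omitted):

import boto3

codecommit = boto3.client("codecommit")
dynamodb = boto3.client("dynamodb")
codepipeline = boto3.client("codepipeline")

def lambda_handler(event, context):
    # Latest commit on dev (the merge that triggered the EventBridge rule)
    commit_id = codecommit.get_branch(
        repositoryName="REPO_NAME", branchName="dev"
    )["branch"]["commitId"]

    # Files that changed in dev relative to test
    diffs = codecommit.get_differences(
        repositoryName="REPO_NAME",
        beforeCommitSpecifier="test",
        afterCommitSpecifier="dev",
    )["differences"]

    for diff in diffs:
        # Deleted files have no afterBlob; fall back to the beforeBlob path
        path = (diff.get("afterBlob") or diff["beforeBlob"])["path"]
        # Map the changed file's path to a component name, e.g. its top-level folder
        component = path.split("/")[0]
        # One item per (CommitId, Component) pair
        dynamodb.put_item(
            TableName="Deployment-Diffs",
            Item={"CommitId": {"S": commit_id}, "Component": {"S": component}},
        )

    # Kick off the CI/CD pipeline only if something changed
    if diffs:
        codepipeline.start_pipeline_execution(name="YOUR_PIPELINE_NAME")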
CodePipeline is composed of 4 stages:
The Source stage receives a copy of the CodeCommit repository's test branch. The branch's entire source code is passed as the input artifact to the rest of CodePipeline.
The Build & Test stage launches a CodeBuild Amazon Linux container, which executes code read from the copy of the repo's test branch. We specify the full path to the buildspec.yml file as part of the CodeBuild project's configuration.
This buildspec.yml file uses the AWS CLI to perform a sequence of commands. Make sure to enable the "privileged" flag in the CodeBuild configuration to allow building Docker images. We only show the Training Lambda function for simplicity, but our buildspec.yml file includes all training pipeline components.
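Both settings live in the CodeBuild project configuration. As a sketch with the AWS CLI (the project name and container image are assumptions; FULL_REPO_PATH is the same placeholder used in the buildspec below):

aws codebuild update-project --name YOUR_BUILD_PROJECT \
  --source '{"type": "CODEPIPELINE", "buildspec": "FULL_REPO_PATH/test-buildspec.yml"}' \
  --environment '{"type": "LINUX_CONTAINER", "image": "aws/codebuild/amazonlinux2-x86_64-standard:5.0", "computeType": "BUILD_GENERAL1_SMALL", "privilegedMode": true}'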
First, we read the diffs from DynamoDB and conditionally flip boolean flags from 0 to 1 based on the specific component(s) that changed:
pre_build:
  commands:
    - aws ecr get-login-password --region $YOUR_REGION | docker login --username AWS --password-stdin $YOUR_AWS_ACCOUNT.dkr.ecr.$YOUR_REGION.amazonaws.com
    - CF_TEMPLATE_DIFF=0
    - TEST_BUILDSPEC_DIFF=0
    - PROD_BUILDSPEC_DIFF=0
    - TRAINING_LAMBDA_DIFF=0
    # Add any additional flags, one for each distinct solution component
    - COMMIT_ID=$(aws codecommit get-branch --repository-name REPO_NAME --branch-name dev --query 'branch.commitId')
    # Strip the surrounding double quotes returned by the CLI
    - COMMIT_ID="${COMMIT_ID%\"}"
    - COMMIT_ID="${COMMIT_ID#\"}"
    # Fetch the component names recorded for this commit in the diffs table
    - DIFFS=$(aws dynamodb query --table-name Deployment-Diffs --projection-expression "Component" --key-condition-expression "CommitId = :value" --expression-attribute-values '{":value":{"S":'\"$COMMIT_ID\"'}}' --query 'Items[*].Component.S')
    - echo $DIFFS
    - |
      for COMPONENT in $DIFFS; do
        # Remove comma suffix
        COMPONENT="${COMPONENT%,}"
        # Remove quote suffix
        COMPONENT="${COMPONENT%\"}"
        # Remove quote prefix
        COMPONENT="${COMPONENT#\"}"

        if [ "$COMPONENT" = "Training-Lambda-Function" ]; then
          echo "Found a diff for Training-Lambda-Function"
          TRAINING_LAMBDA_DIFF=1
        elif [ "$COMPONENT" = "template.yml" ]; then
          echo "Found a diff for template.yml"
          CF_TEMPLATE_DIFF=1
        elif [ "$COMPONENT" = "test-buildspec.yml" ]; then
          echo "Found a diff for test-buildspec.yml"
          TEST_BUILDSPEC_DIFF=1
        elif [ "$COMPONENT" = "prod-buildspec.yml" ]; then
          echo "Found a diff for prod-buildspec.yml"
          PROD_BUILDSPEC_DIFF=1
        fi
      done
Next, read the Training Lambda function's current production version from a separate DynamoDB deployments metadata table. This table is also configured with a composite primary key, where the partition key is the Component name and the sort key is the BuildNumber (note that the Component name is the partition key in this table, while in the deployment diffs table it is the sort key; each access pattern is different).
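A sketch of this metadata table's definition (same on-demand capacity mode as the diffs table, with the key roles swapped to fit the per-component version history):

aws dynamodb create-table \
  --table-name YOUR_DYNAMODB_TABLE \
  --attribute-definitions AttributeName=Component,AttributeType=S AttributeName=BuildNumber,AttributeType=N \
  --key-schema AttributeName=Component,KeyType=HASH AttributeName=BuildNumber,KeyType=RANGE \
  --billing-mode PAY_PER_REQUEST

Because BuildNumber is a numeric sort key, querying with --no-scan-index-forward and --max-items 1 (as in the build phase below) returns the latest build for a given component.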
If the component changed, we auto-increment the Lambda function version by 1, build the code (from the pipeline's input artifact) into a new Docker image, push it to ECR, and store the new ECR image URI in a local variable (we will cover individual component testing in a separate article):
build:
  commands:
    - TIMESTAMP=$(date)
    - AUTO_INCREMENT=1
    - ECR_REPO=YOUR_ECR_REPO
    - AWS_ACCOUNT=YOUR_AWS_ACCOUNT
    - DEPLOYMENTS_BUCKET=YOUR_S3_BUCKET
    - DEPLOYMENTS_TABLE=YOUR_DYNAMODB_TABLE
    - STACK_NAME=YOUR_CF_STACK

    - echo $TRAINING_LAMBDA_DIFF
    # Latest build number for this component: sort descending, keep one item
    - TRAINING_LAMBDA_VERSION=$(aws dynamodb query --table-name $DEPLOYMENTS_TABLE --no-scan-index-forward --projection-expression "BuildNumber" --key-condition-expression "Component = :name" --expression-attribute-values '{":name":{"S":"Training-Lambda-Function"}}' --max-items 1 --query 'Items[*].BuildNumber.N')
    - TRAINING_LAMBDA_VERSION=$(echo $TRAINING_LAMBDA_VERSION | cut -d'"' -f 2)
    - echo $TRAINING_LAMBDA_VERSION
    - |
      if [ "$TRAINING_LAMBDA_DIFF" = "1" ]; then
        echo "Deploying new version of Training-Lambda-Function..."
        TRAINING_LAMBDA_NEW_VERSION=$(($TRAINING_LAMBDA_VERSION + $AUTO_INCREMENT))
        TRAINING_LAMBDA_ECR_NAME=training-lambda-function
        docker build -t $TRAINING_LAMBDA_ECR_NAME FULL_REPO_PATH/Training-Lambda-Function/.
        docker tag $TRAINING_LAMBDA_ECR_NAME $AWS_ACCOUNT.dkr.ecr.$YOUR_REGION.amazonaws.com/$ECR_REPO:$TRAINING_LAMBDA_ECR_NAME-$TRAINING_LAMBDA_NEW_VERSION
        docker push $AWS_ACCOUNT.dkr.ecr.$YOUR_REGION.amazonaws.com/$ECR_REPO:$TRAINING_LAMBDA_ECR_NAME-$TRAINING_LAMBDA_NEW_VERSION
        TRAINING_IMAGE_URI=$AWS_ACCOUNT.dkr.ecr.$YOUR_REGION.amazonaws.com/$ECR_REPO:$TRAINING_LAMBDA_ECR_NAME-$TRAINING_LAMBDA_NEW_VERSION
      else
        echo "No new version of Training-Lambda-Function..."
        # Reuse the current production image URI stored in the deployments metadata table
        TRAINING_IMAGE_URI=$(aws dynamodb query --table-name $DEPLOYMENTS_TABLE --select ALL_ATTRIBUTES --key-condition-expression "Component = :name AND BuildNumber = :build" --expression-attribute-values "{\":name\":{\"S\":\"Training-Lambda-Function\"},\":build\":{\"N\":\"$TRAINING_LAMBDA_VERSION\"}}" --max-items 1 --query 'Items[*].Location.S')
        TRAINING_IMAGE_URI=$(echo $TRAINING_IMAGE_URI | cut -d'"' -f 2)
        echo $TRAINING_IMAGE_URI
      fi
Suppose this Lambda function had not changed. Why do we still read the current production version number (outside the conditional statement), and why do we have an else branch that pulls the current production Lambda's ECR image URI?
Because in the post_build phase, the CloudFormation deployment requires the code location of every component in --parameter-overrides, whether or not a given component changed (see the code below). We therefore need the ability to pull each component's current version and code location, while still allowing new, modular deployments of individual components.
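For illustration, here is a minimal template.yml fragment consistent with that parameter, assuming the Training Lambda is deployed as a container image (the resource and role names are hypothetical):

Parameters:
  TrainingLambdaImageUri:
    Type: String

Resources:
  TrainingLambdaFunction:
    Type: AWS::Lambda::Function
    Properties:
      FunctionName: Training-Lambda-Function
      PackageType: Image
      Code:
        ImageUri: !Ref TrainingLambdaImageUri
      Role: !GetAtt TrainingLambdaRole.Arn # execution role defined elsewhere in the template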
Finally, if CloudFormation deployment succeeds, we write the deployment metadata to DynamoDB:
post_build:
  commands:
    # Upload local artifacts referenced in template.yml to S3 and rewrite their paths
    - aws cloudformation package --template-file FULL_REPO_PATH/template.yml --output-template-file template-package.yml --s3-bucket $DEPLOYMENTS_BUCKET

    # Every component's code location is passed, changed or not
    - aws cloudformation deploy --template-file template-package.yml --stack-name $STACK_NAME --parameter-overrides TrainingLambdaImageUri=$TRAINING_IMAGE_URI --capabilities CAPABILITY_NAMED_IAM --no-fail-on-empty-changeset

    - |
      if [ "$TRAINING_LAMBDA_DIFF" = "1" ]; then
        echo "Writing Training-Lambda-Function metadata to DynamoDB..."
        aws dynamodb put-item --table-name $DEPLOYMENTS_TABLE --item "{\"Component\":{\"S\":\"Training-Lambda-Function\"},\"BuildNumber\":{\"N\":\"$TRAINING_LAMBDA_NEW_VERSION\"},\"Solution\":{\"S\":\"$STACK_NAME\"},\"Location\":{\"S\":\"$TRAINING_IMAGE_URI\"},\"Timestamp\":{\"S\":\"$TIMESTAMP\"}}"
      fi

    - echo $CF_TEMPLATE_DIFF
    - |
      if [ "$CF_TEMPLATE_DIFF" = "1" ]; then
        CF_TEMPLATE_VERSION=$(aws dynamodb query --table-name $DEPLOYMENTS_TABLE --no-scan-index-forward --projection-expression "BuildNumber" --key-condition-expression "Component = :name" --expression-attribute-values '{":name":{"S":"template.yml"}}' --max-items 1 --query 'Items[*].BuildNumber.N')
        CF_TEMPLATE_VERSION=$(echo $CF_TEMPLATE_VERSION | cut -d'"' -f 2)
        echo $CF_TEMPLATE_VERSION
        CF_TEMPLATE_NEW_VERSION=$(($CF_TEMPLATE_VERSION + $AUTO_INCREMENT))
        echo "Writing template.yml metadata to DynamoDB..."
        aws dynamodb put-item --table-name $DEPLOYMENTS_TABLE --item "{\"Component\":{\"S\":\"template.yml\"},\"BuildNumber\":{\"N\":\"$CF_TEMPLATE_NEW_VERSION\"},\"Solution\":{\"S\":\"$STACK_NAME\"},\"Location\":{\"S\":\"FULL_REPO_PATH/template.yml\"},\"Timestamp\":{\"S\":\"$TIMESTAMP\"}}"
      fi

    - echo $PROD_BUILDSPEC_DIFF
    - |
      if [ "$PROD_BUILDSPEC_DIFF" = "1" ]; then
        PROD_BUILDSPEC_VERSION=$(aws dynamodb query --table-name $DEPLOYMENTS_TABLE --no-scan-index-forward --projection-expression "BuildNumber" --key-condition-expression "Component = :name" --expression-attribute-values '{":name":{"S":"prod-buildspec.yml"}}' --max-items 1 --query 'Items[*].BuildNumber.N')
        PROD_BUILDSPEC_VERSION=$(echo $PROD_BUILDSPEC_VERSION | cut -d'"' -f 2)
        echo $PROD_BUILDSPEC_VERSION
        PROD_BUILDSPEC_NEW_VERSION=$(($PROD_BUILDSPEC_VERSION + $AUTO_INCREMENT))
        echo "Writing prod-buildspec.yml metadata to DynamoDB..."
        aws dynamodb put-item --table-name $DEPLOYMENTS_TABLE --item "{\"Component\":{\"S\":\"prod-buildspec.yml\"},\"BuildNumber\":{\"N\":\"$PROD_BUILDSPEC_NEW_VERSION\"},\"Solution\":{\"S\":\"$STACK_NAME\"},\"Location\":{\"S\":\"FULL_REPO_PATH/prod-buildspec.yml\"},\"Timestamp\":{\"S\":\"$TIMESTAMP\"}}"
      fi

    - echo $TEST_BUILDSPEC_DIFF
    - |
      if [ "$TEST_BUILDSPEC_DIFF" = "1" ]; then
        TEST_BUILDSPEC_VERSION=$(aws dynamodb query --table-name $DEPLOYMENTS_TABLE --no-scan-index-forward --projection-expression "BuildNumber" --key-condition-expression "Component = :name" --expression-attribute-values '{":name":{"S":"test-buildspec.yml"}}' --max-items 1 --query 'Items[*].BuildNumber.N')
        TEST_BUILDSPEC_VERSION=$(echo $TEST_BUILDSPEC_VERSION | cut -d'"' -f 2)
        echo $TEST_BUILDSPEC_VERSION
        TEST_BUILDSPEC_NEW_VERSION=$(($TEST_BUILDSPEC_VERSION + $AUTO_INCREMENT))
        echo "Writing test-buildspec.yml metadata to DynamoDB..."
        aws dynamodb put-item --table-name $DEPLOYMENTS_TABLE --item "{\"Component\":{\"S\":\"test-buildspec.yml\"},\"BuildNumber\":{\"N\":\"$TEST_BUILDSPEC_NEW_VERSION\"},\"Solution\":{\"S\":\"$STACK_NAME\"},\"Location\":{\"S\":\"FULL_REPO_PATH/test-buildspec.yml\"},\"Timestamp\":{\"S\":\"$TIMESTAMP\"}}"
      fi
If everything succeeds, we proceed to cross-account deployment into our production AWS account. See "AWS Cross-Account Deployments" for details. We will expand on this topic in future articles.
How do you deploy changes from test to prod within AWS, modularized such that only the components that changed are updated? Let us know in the comments.
Subscribe to our weekly LinkedIn newsletter: Machine Learning In Production
Reach out if you need help.
Would you like me to speak at your event? Email me at [email protected]