How to handle a platform-breaking change (GitLab CI_JOB_JWT)
In the whirlwind world of platform engineering, where every day is a battle against time and complexity, the arrival of breaking changes adds an extra layer of chaos to an already bustling schedule.
GitLab warned us about this breaking change a while back, but we were preoccupied with more captivating and innovative projects. One of these was building a platform command centre that lets end users self-serve organisational resources (GCP, ENV0, Vault, GitLab, and Okta) with secure defaults.
Streamlining Cloud Deployments with HashiCorp Vault and GitLab CI/CD
To eliminate manual secret management when deploying infrastructure and applications to the cloud, we rely on HashiCorp Vault Enterprise as our secret management solution. Vault supplies both ephemeral credentials (through the GCP secrets engine) and static secrets (through the KV v2 engine) to our GitLab CI/CD pipelines, giving us seamless cloud deployments. We have used CI_JOB_JWT extensively from the outset, but with its removal approaching, GitLab's deprecation warning now demands our urgent attention.
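For readers new to this setup, here is a minimal sketch of what a job's script might do with the ID token: log in to Vault, then read one static KV v2 secret and one ephemeral GCP access token. The mount names (jwt, secret, gcp), the secret path app/config, the roleset deployer and the role name are hypothetical placeholders, not our actual configuration.

```bash
#!/usr/bin/env bash
# Minimal sketch, not our production script. Hypothetical names: JWT auth
# mount "jwt", KV v2 mount "secret", secret path "app/config", GCP secrets
# engine mount "gcp" with a roleset called "deployer".

# Exchange the GitLab ID token for a Vault token.
vault_login() {
  local role="$1"
  # Fail fast rather than sending an empty JWT to Vault.
  if [ -z "${VAULT_ADDR:-}" ] || [ -z "${VAULT_ID_TOKEN:-}" ]; then
    echo "VAULT_ADDR and VAULT_ID_TOKEN must both be set" >&2
    return 1
  fi
  VAULT_TOKEN=$(vault write -field=token auth/jwt/login \
    role="$role" jwt="$VAULT_ID_TOKEN") || return 1
  export VAULT_TOKEN
}

# Log in, then fetch one static and one ephemeral secret.
fetch_secrets() {
  vault_login "$1" || return 1
  # Static secret from the KV v2 engine.
  DB_PASSWORD=$(vault kv get -mount=secret -field=password app/config) || return 1
  # Short-lived OAuth2 access token from the GCP secrets engine
  # (assumes the roleset was created with secret_type=access_token).
  GOOGLE_OAUTH_ACCESS_TOKEN=$(vault read -field=token gcp/roleset/deployer/token) || return 1
  # The Terraform Google provider picks up GOOGLE_OAUTH_ACCESS_TOKEN.
  export DB_PASSWORD GOOGLE_OAUTH_ACCESS_TOKEN
}
```

Sourced from a job's script section, this would be called as fetch_secrets my-vault-role. The early check turns a missing id_tokens definition into a clear error instead of an opaque Vault login failure.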
Build confidence step by step
Our sole objective in handling this breaking change is to minimise the impact on developer experience. We don't want to alarm end users until we're confident in our approach.
After testing, we concluded that Method B would have less impact on our end users. We also decided to prioritise step b, giving end users more time to test with the new VAULT_ID_TOKEN.
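While end users test VAULT_ID_TOKEN, a shared script may run in pipelines that have already been migrated and in pipelines that have not. A hypothetical helper (pick_vault_jwt is an illustrative name, not something GitLab or Vault provides) could prefer the new token and fall back to the legacy one:

```bash
#!/usr/bin/env bash
# Hypothetical transition helper: prefer the ID token, fall back to the
# deprecated CI_JOB_JWT for pipelines not yet migrated, and fail loudly when
# neither is set. Delete the fallback once CI_JOB_JWT is removed.
pick_vault_jwt() {
  if [ -n "${VAULT_ID_TOKEN:-}" ]; then
    printf '%s\n' "$VAULT_ID_TOKEN"
  elif [ -n "${CI_JOB_JWT:-}" ]; then
    echo "WARNING: falling back to deprecated CI_JOB_JWT" >&2
    printf '%s\n' "$CI_JOB_JWT"
  else
    echo "ERROR: neither VAULT_ID_TOKEN nor CI_JOB_JWT is set" >&2
    return 1
  fi
}
```

A job would then run jwt=$(pick_vault_jwt) || exit 1 before its vault write ... jwt="$jwt" login. The warning on stderr makes unmigrated pipelines visible in job logs without breaking them.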
Tell them about the changes and the responsibilities
A clear rollout plan to eliminate any confusion
We divided the work into five distinct changes, only one of which (CHANGE_5) involves end users: about five lines of code in their custom CI/CD pipelines.
20240313 - Vault admins - COMMS - Send out the meeting invite.
20240314 - Vault admins - COMMS - Create a new Teams channel for collaboration.
20240319 - Vault admins - COMMS - Meeting to give an overview of the GitLab breaking change, outlining the actions required of Vault admins and end users, with example code and search queries.
20240321 - Vault admins - CHANGE_1 - Change the JWT auth method (vault_jwt_auth_backend) in all namespaces and validate the changes. This lets CI_JOB_JWT and VAULT_ID_TOKEN work at the same time.
20240322 - Vault admins - CHANGE_2, CHANGE_3, CHANGE_4 - Update the templates, existing backend roles and pre-defined pipelines.
20240322 - Department admins/Vault users - CHANGE_5 - Start testing the required pipeline changes and switch from CI_JOB_JWT to VAULT_ID_TOKEN.
Mid-April 2024 - Department admins/Vault users/Vault admins - Regroup and review the changes.
20240516 - GitLab - Farewell to CI_JOB_JWT??
20240516 - Chilled users - Broken pipelines??
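The search queries shared in the 19 March session are not reproduced here. As a stand-in, this minimal sketch lists every CI configuration line in a local checkout that still references CI_JOB_JWT. It assumes GNU grep, and find_legacy_jwt is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: print file:line for every YAML line that still uses the legacy
# token. The pattern CI_JOB_JWT also matches CI_JOB_JWT_V1 and CI_JOB_JWT_V2.
# grep exits 1 when nothing is found, which here means "nothing to migrate".
find_legacy_jwt() {
  local root="${1:-.}"
  # -r recurse, -n show line numbers, -I skip binary files.
  grep -rnI \
    --include='.gitlab-ci.yml' --include='*.yml' --include='*.yaml' \
    'CI_JOB_JWT' "$root"
}
```

Running find_legacy_jwt path/to/repo before and after CHANGE_4 and CHANGE_5 gives a quick count of the remaining work per project.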
Here's the technical implementation
Fortunately, we had already centralised the Terraform templates for Vault resources across approximately 60 GitLab projects, which kept the roughly 200 required changes manageable.
CHANGE_1 - Remove bound_issuer from the "vault_jwt_auth_backend" Terraform resource.
CHANGE_2 - Add an iss entry to bound_claims in every "vault_jwt_auth_backend_role" Terraform resource, so that all newly triggered deployments include it.
CHANGE_3 - Add the iss bound claim to all existing backend roles using the Vault CLI.
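Why remove bound_issuer and use a bound claim instead? As we understand Vault's JWT auth method, bound_issuer on the auth backend is a single string, whereas a role's bound claim can list several values, and the token only needs to match one of them. As far as we can tell, the legacy CI_JOB_JWT carries the bare host (gitlab.com) as its issuer, while ID tokens carry the full URL (https://gitlab.com). This is why the script below sets iss to both. The rough bash model below is a hypothetical helper, claim_matches, not Vault code. It approximates that any-of check with bash pattern matching, which also honours ? and [...]; Vault's glob matching, to our knowledge, only treats * specially.

```bash
#!/usr/bin/env bash
# Rough model (NOT Vault's implementation) of a list-valued bound claim with
# bound_claims_type "glob": the claim passes if it matches ANY listed pattern.
claim_matches() {
  local value="$1" pattern
  shift
  for pattern in "$@"; do
    # Unquoted $pattern on the right of == is treated as a glob pattern.
    if [[ "$value" == $pattern ]]; then
      return 0
    fi
  done
  return 1
}
```

With both issuers listed, claim_matches "https://gitlab.com" "gitlab.com" "https://gitlab.com" succeeds, and so does the legacy gitlab.com. This is what lets both tokens work side by side until CI_JOB_JWT disappears.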
The example .gitlab-ci.yml below implements CHANGE_3. It creates a dynamic pipeline generator job, which produces backup, change and restore jobs for each Vault namespace that contains JWT roles.
```yaml
## Your GitLab pipeline image must have the Vault CLI and jq installed.
## I am using the existing JWT auth method to get my VAULT_TOKEN.
stages:
  - pre
  - apply

variables:
  VAULT_ADDR: "<YOUR_VAULT_ADDR>"

## get_vault_token logs in with VAULT_ID_TOKEN, so this pipeline must define it too.
default:
  id_tokens:
    VAULT_ID_TOKEN:
      aud: https://gitlab.com

.query_vaultnamespaces: &query_vaultnamespaces
  - | ## Construct VAULT_NAMESPACES - All available Vault namespaces
    echo root >> VAULT_NAMESPACES
    for item in $(vault namespace list -format=json | jq -r -c '.[]'); do
      namespace="${item%?}"  # drop the trailing "/" from each namespace name
      vault_roles=""
      vault_roles=$(vault list -namespace=$namespace -format=json auth/jwt/role | jq -r '.[]' || true)
      if [ -z "${vault_roles}" ]; then
        echo "No role available in $namespace namespace."
      else
        echo "Role(s) detected in $namespace namespace."
        echo $namespace >> VAULT_NAMESPACES
      fi
    done
    echo -e "\e[36mVault namespaces available \e[0m"
    cat VAULT_NAMESPACES

.get_vault_token: &get_vault_token
  - |
    echo "VAULT_ADDR=$VAULT_ADDR"
    if [ ! -z "$VAULT_TOKEN" ]
    then
      echo "VAULT_TOKEN is present."
      export VAULT_TOKEN=$VAULT_TOKEN
    else
      echo "VAULT_TOKEN not present. Using GitLab JWT."
      if [ "$VAULT_CONFIGS_ENV" = "prd" ]; then
        echo "ENVIRONMENT=$ENVIRONMENT"
        export VAULT_TOKEN=$(vault write -namespace=root -field=token auth/jwt_vault_id_token/login role=vault-orchestration-role jwt=$VAULT_ID_TOKEN)
      else
        export VAULT_TOKEN=$(vault write -namespace=root -field=token auth/jwt/login role=vault-orchestration-role jwt=$VAULT_ID_TOKEN)
      fi
    fi
    echo "Getting VAULT_TOKEN"
    [ -z "$VAULT_TOKEN" ] && echo "Mandatory variable - VAULT_TOKEN is empty." && exit 1 || echo "Retrieved VAULT_TOKEN"

generator_vault_jwt_roles:
  stage: pre
  tags:
    - <YOUR_RUNNER>
  before_script:
    - *get_vault_token
  script:
    - *query_vaultnamespaces
    - timestamp=$(date +"%Y%m%d%H%M%S")
    - | ## Generate base template for downstream pipelines
      cat >> GENERATED_PIPELINE.yml << EOF
      image: <YOUR_RUNNER_IMAGE>
      stages:
        - pre
        - apply
        - restore
      variables: ## Overcome timeout issues
        VAULT_RATE_LIMIT: "1"
        VAULT_CLIENT_TIMEOUT: "120"
        VAULT_MAX_RETRIES: "25"
      default:
        id_tokens:
          VAULT_ID_TOKEN:
            aud: https://gitlab.com
      .get_vault_token: &get_vault_token
        - |
          echo "VAULT_ADDR=\$VAULT_ADDR"
          if [ ! -z "\$VAULT_TOKEN" ]
          then
            echo "VAULT_TOKEN is present."
            export VAULT_TOKEN=\$VAULT_TOKEN
          else
            echo "VAULT_TOKEN not present. Using GitLab JWT."
            if [ "\$VAULT_CONFIGS_ENV" = "prd" ]; then
              echo "ENVIRONMENT=\$ENVIRONMENT"
              export VAULT_TOKEN=\$(vault write -namespace=root -field=token auth/jwt_vault_id_token/login role=vault-orchestration-role jwt=\$VAULT_ID_TOKEN)
            else
              export VAULT_TOKEN=\$(vault write -namespace=root -field=token auth/jwt/login role=vault-orchestration-role jwt=\$VAULT_ID_TOKEN)
            fi
          fi
          echo "Getting VAULT_TOKEN"
          [ -z "\$VAULT_TOKEN" ] && echo "Mandatory variable - VAULT_TOKEN is empty." && exit 1 || echo "Retrieved VAULT_TOKEN"
      .backup_roles_namespace: &backup_roles_namespace
        - | ## Back up all roles of the jwt auth method
          echo -e "\e[34mStarting the backup process \e[0m"
          echo \$timestamp
          echo -e "\e[36mExecuting for \$namespace namespace \e[0m"
          ## List jwt auth method roles within a namespace
          vault_roles=\$(vault list -namespace=\$namespace -format=json auth/jwt/role | jq -r '.[]' || true)
          cat >> backup-\$namespace-\$timestamp.json << EOF
          [
          EOF
          ## Query bound_claims for each role and append it to the backup
          for role in \$vault_roles; do
            echo \$role
            existing_bound_claims=\$(vault read -namespace=\$namespace -format=json auth/jwt/role/\$role | jq -r '.data.bound_claims')
            cat >> backup-\$namespace-\$timestamp.json << EOF
          { "role" : "\$role",
            "bound_claims" : \$existing_bound_claims },
          EOF
          done
          ## Strip the trailing comma after the last object
          sed -i '\$ s/.\$//' backup-\$namespace-\$timestamp.json
          cat >> backup-\$namespace-\$timestamp.json << EOF
          ]
          EOF
          roles=\$(jq length backup-\$namespace-\$timestamp.json)
          echo -e "\e[33mTotal of \$roles roles backed up for \$namespace namespace\e[0m"
          cat backup-\$namespace-\$timestamp.json | jq '.'
      .restore_roles_namespace: &restore_roles_namespace
        - | ## Restore all roles of the jwt auth method from the backup
          for obj in \$(cat backup-\$namespace-\$timestamp.json | jq -c '.[]'); do
            role=""
            bound_claims=""
            role=\$(echo "\$obj" | jq -r '.role')
            bound_claims=\$(echo "\$obj" | jq -r '.bound_claims')
            vault write -namespace=\$namespace auth/jwt/role/\$role -<<EOF
          {
            "role_type": "jwt",
            "bound_claims_type": "glob",
            "bound_claims": \$bound_claims
          }
          EOF
          done
          ## List jwt auth method roles within a namespace
          vault_roles=\$(vault list -namespace=\$namespace -format=json auth/jwt/role | jq -r '.[]' || true)
          cat >> check.json << EOF
          [
          EOF
          ## Query bound_claims for each role to verify the result
          for role in \$vault_roles; do
            echo \$role
            existing_bound_claims=\$(vault read -namespace=\$namespace -format=json auth/jwt/role/\$role | jq -r '.data.bound_claims')
            cat >> check.json << EOF
          { "role" : "\$role",
            "bound_claims" : \$existing_bound_claims },
          EOF
          done
          sed -i '\$ s/.\$//' check.json
          cat >> check.json << EOF
          ]
          EOF
          cat check.json | jq '.'
      .change_roles_namespace: &change_roles_namespace
        - | ## Update the iss bound claim of every jwt auth method role
          ROLES=\$(cat backup-\$namespace-\$timestamp.json | jq -r '.[].role')
          for role in \$ROLES; do
            echo \$role
            existing_bound_claims=""
            updated_bound_claims=""
            existing_bound_claims=\$(vault read -namespace=\$namespace -format=json auth/jwt/role/\$role | jq -r '.data.bound_claims')
            updated_bound_claims=\$(echo "\$existing_bound_claims" | jq '.iss = ["gitlab.com","https://gitlab.com"]')
            vault write -namespace=\$namespace auth/jwt/role/\$role -<<EOF
          {
            "role_type": "jwt",
            "bound_claims_type": "glob",
            "bound_claims": \$updated_bound_claims
          }
          EOF
          done
          ## List jwt auth method roles within a namespace
          vault_roles=\$(vault list -namespace=\$namespace -format=json auth/jwt/role | jq -r '.[]' || true)
          cat >> check.json << EOF
          [
          EOF
          ## Query bound_claims for each role to verify the result
          for role in \$vault_roles; do
            echo \$role
            existing_bound_claims=\$(vault read -namespace=\$namespace -format=json auth/jwt/role/\$role | jq -r '.data.bound_claims')
            cat >> check.json << EOF
          { "role" : "\$role",
            "bound_claims" : \$existing_bound_claims },
          EOF
          done
          sed -i '\$ s/.\$//' check.json
          cat >> check.json << EOF
          ]
          EOF
          cat check.json | jq '.'
      EOF
    - | ## Create downstream pipelines
      while read p; do
        cat >> GENERATED_PIPELINE.yml << EOF
      $p-namespace-roles-backup:
        variables:
          namespace: $p
          timestamp: "$timestamp"
        stage: pre
        tags:
          - <YOUR_RUNNER>
        before_script:
          - *get_vault_token
        script:
          - echo -e "\e[33mBacking up roles in namespace \$namespace\e[0m"
          - *backup_roles_namespace
        when: manual
        artifacts:
          paths:
            - backup-*.json
      $p-namespace-roles-restore:
        needs:
          - $p-namespace-roles-backup
        variables:
          namespace: $p
          timestamp: "$timestamp"
        stage: restore
        tags:
          - <YOUR_RUNNER>
        before_script:
          - *get_vault_token
        script:
          - echo -e "\e[33mRestoring roles in namespace \$namespace\e[0m"
          - *restore_roles_namespace
        when: manual
        artifacts:
          paths:
            - check.json
      $p-namespace-roles-change:
        needs:
          - $p-namespace-roles-backup
        variables:
          namespace: $p
          timestamp: "$timestamp"
        stage: apply
        tags:
          - <YOUR_RUNNER>
        before_script:
          - *get_vault_token
        script:
          - echo -e "\e[33mApplying the iss change in namespace \$namespace\e[0m"
          - *change_roles_namespace
        when: manual
        artifacts:
          paths:
            - check.json
      EOF
      done < VAULT_NAMESPACES
  when: manual
  artifacts:
    expire_in: 1 day
    paths:
      - GENERATED_PIPELINE.yml

trigger_vault_jwt_roles:
  needs:
    - generator_vault_jwt_roles
  stage: apply
  trigger:
    include:
      - artifact: GENERATED_PIPELINE.yml
        job: generator_vault_jwt_roles
    strategy: depend
    forward:
      pipeline_variables: true
      yaml_variables: true
```
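One fragility is worth noting in the backup and check steps. They append a trailing comma after every object and then use sed to delete the last character of the file. Because root is always written to VAULT_NAMESPACES, a root namespace that ever had no JWT roles would lose its opening bracket instead, leaving invalid JSON for the later jq calls. A hypothetical alternative, json_array, joins objects with commas and handles the empty case explicitly:

```bash
#!/usr/bin/env bash
# Hypothetical helper: join pre-rendered JSON objects into a JSON array
# without a trailing-comma fix-up. With no arguments it prints [].
json_array() {
  local sep="" item
  printf '['
  for item in "$@"; do
    printf '%s%s' "$sep" "$item"
    sep=","
  done
  printf ']\n'
}
```

Assuming the job runs under bash, the backup loop could collect each rendered object into an array and finish with json_array "${objects[@]}" > backup-$namespace-$timestamp.json.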
CHANGE_4 - Update all pre-defined GitLab CI/CD pipelines (same code change as CHANGE_5).
CHANGE_5 - Update all custom GitLab CI/CD pipelines with the code change below.
```yaml
## 1 - Mandatory - Add VAULT_ID_TOKEN. Defining it under default applies it to every job in the pipeline.
default:
  id_tokens:
    VAULT_ID_TOKEN:
      aud: https://gitlab.com
```

```shell
## 2 - Only if you use the GCP secrets engine (Vault CLI login) - pass VAULT_ID_TOKEN instead of CI_JOB_JWT
vault write -field=token auth/jwt/login role=$VAULT_BACKEND_ROLE jwt=$VAULT_ID_TOKEN
```
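If a login still fails after switching, it can help to check which iss and aud the job's token actually carries. Below is a minimal sketch that decodes the JWT payload without verifying the signature. jwt_payload and jwt_claim are hypothetical helpers, and the sketch assumes GNU base64 and compact JSON in the payload. Print individual claims only, never the token itself.

```bash
#!/usr/bin/env bash
# Sketch: read a claim from a JWT WITHOUT verifying it (debugging only).

# Decode the payload, i.e. the second dot-separated, base64url-encoded segment.
jwt_payload() {
  local payload
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url drops '=' padding; restore it so base64 -d accepts the input.
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  printf '%s' "$payload" | base64 -d
}

# Naive extraction of a string claim such as "iss" or "aud"; use jq for
# array-valued or nested claims.
jwt_claim() {
  jwt_payload "$1" | sed -n "s/.*\"$2\":\"\([^\"]*\)\".*/\1/p"
}
```

Inside a job on GitLab.com with the id_tokens configuration above, jwt_claim "$VAULT_ID_TOKEN" iss should print https://gitlab.com. If a role's iss bound claim does not match that value, that mismatch is a likely cause of the login error.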
Bio
Henry Tze is the Principal Cloud Security Engineer at Virgin Media O2, specialising in building user-centric security infrastructure at scale for developers, engineers, and builders. His primary focus is enabling rapid value creation in AWS and GCP cloud environments.
His approach involves empowering users across all levels by furnishing them with pipeline templates, infrastructure blueprints, practical examples, secure operational practices, and user-friendly, low/no code self-service platforms that surpass conventional expectations.
Employing a philosophy of "everything as code," Henry believes in fostering unity and alignment among builders. He advocates for the formation of an internal community among users to tackle technical challenges collectively and foster collaboration.