How to handle a platform-breaking change (GitLab CI_JOB_JWT)

In the whirlwind world of platform engineering, where every day is a battle against time and complexity, the arrival of breaking changes adds an extra layer of chaos to an already bustling schedule.


GitLab warned us about a breaking change a while back, but we were preoccupied with more captivating and innovative projects. One of these was building a platform command centre that lets end users self-serve organisational resources (GCP, ENV0, Vault, GitLab, and Okta) with secure defaults.

Table of contents

  • Streamlining Cloud Deployments with HashiCorp Vault and GitLab CI/CD
  • Build confidence step by step
  • Tell them about the changes and the responsibilities
  • A clear rollout plan to eliminate any confusion
  • Here's the technical implementation
  • Bio


Streamlining Cloud Deployments with HashiCorp Vault and GitLab CI/CD

To eliminate manual secret management when deploying infrastructure and applications to the cloud, we rely on HashiCorp Vault Enterprise as our secret management solution. Vault provides both ephemeral credentials (through the GCP secret engine) and static secrets (via the KV v2 engine) to our GitLab CI/CD pipelines, which makes deployments to the cloud seamless. We have used CI_JOB_JWT extensively from the outset, and GitLab's deprecation warning for it now demands our urgent attention: CI_JOB_JWT is scheduled for removal in GitLab 17.0.

GitLab 17.0 is due for release on 16 May 2024, which leaves us just over three months to address this issue.

Build confidence step by step

Our sole objective with this breaking change is to achieve minimal impact on our developer experience. Let's avoid alarming end users until we're confident in our approach.

  1. Evidence-based workflow. Set up a page to document everything about this change to take our end user on a journey.
  2. Learn about the resources. Understand every component affected by this change, such as Vault (auth methods and backend roles) and GitLab (CI/CD pipelines).
  3. Discovery becomes effortless when everything is managed as code. Determine where changes are needed. Since our processes are code-driven, GitLab's search function at the group level narrowed roughly 9,000 GitLab projects down to around 600.
  4. GET HELP, you are not alone! Bring together your colleagues, the migration methods GitLab supplies, and your HashiCorp solutions engineer (thanks, Guy Barros).
  5. Find an option with the least effort required from the end users. In this scenario, opting for an in-place update using Method B is more practical than generating new resources with Method A. Method B requires some revised implementation steps due to our complexity and scale.
  6. Test, test and test. Develop both positive and negative scenarios to explore every possibility with evidence. Testing in lower environments proved to be incredibly beneficial.
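
The group-level search from step 3 can also be driven through GitLab's search API, which helps when the result list is too long to review by hand. This is a minimal sketch, assuming advanced search is available on the instance (group-level `blobs` search depends on it). `GITLAB_URL`, `GROUP_ID` and `GITLAB_TOKEN` are placeholder names for illustration, not part of our setup.

```shell
#!/bin/sh
# Build the group-level code search URL for a group ID and a search term.
# GITLAB_URL is an assumed variable and defaults to gitlab.com.
# The term is not URL-encoded, which is fine for plain names like CI_JOB_JWT.
gitlab_search_url() {
  group_id="$1"
  term="$2"
  printf '%s/api/v4/groups/%s/search?scope=blobs&search=%s\n' \
    "${GITLAB_URL:-https://gitlab.com}" "$group_id" "$term"
}

# Example call (needs an access token with API read access), left commented out:
# curl --silent --header "PRIVATE-TOKEN: $GITLAB_TOKEN" \
#   "$(gitlab_search_url "$GROUP_ID" CI_JOB_JWT)"
```

Each result should identify the project and file path of a match, which is enough to build a list of projects that still reference CI_JOB_JWT.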


After testing, we concluded that Method B has less impact on our end users. We'll also prioritise step b to give our end users more time to test with the new VAULT_ID_TOKEN.
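
The positive and negative scenarios from step 6 could be scripted roughly as follows. This is a sketch rather than the test suite we ran: the two helper names are made up for illustration, and each takes a label first so that the JWT itself never appears in the job log.

```shell
#!/bin/sh
# Positive scenario: the command is expected to succeed.
expect_success() {
  label="$1"; shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS: $label"
  else
    echo "FAIL: $label (expected success)"
    return 1
  fi
}

# Negative scenario: the command is expected to be rejected.
expect_failure() {
  label="$1"; shift
  if "$@" >/dev/null 2>&1; then
    echo "FAIL: $label (expected rejection)"
    return 1
  else
    echo "PASS: $label (rejected as expected)"
  fi
}

# Example usage in a test job (OTHER_ROLE is a hypothetical role that this
# project must not be able to use):
# expect_success "login with VAULT_ID_TOKEN" \
#   vault write -field=token auth/jwt/login role="$VAULT_BACKEND_ROLE" jwt="$VAULT_ID_TOKEN"
# expect_failure "login with another project's role" \
#   vault write -field=token auth/jwt/login role="$OTHER_ROLE" jwt="$VAULT_ID_TOKEN"
```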

Tell them about the changes and the responsibilities

CI_JOB_JWT is validated by both the auth method and the backend role
VAULT_ID_TOKEN is validated by the backend role
Clear responsibilities
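
To see what Vault actually validates, it helps to look at the claims inside each token. The sketch below decodes the payload of a JWT with standard shell tools; it does not verify the signature and is meant for inspection only. The function name is an illustration, and `base64 -d` is assumed to be available in the job image. On gitlab.com the two tokens carry different `iss` values (`gitlab.com` for CI_JOB_JWT and `https://gitlab.com` for VAULT_ID_TOKEN), which is why the backend roles later accept both.

```shell
#!/bin/sh
# Print the decoded payload (claims) of a JWT. No signature verification is
# performed here; this is for inspection only.
jwt_payload() {
  # The payload is the second dot-separated segment, base64url-encoded.
  payload=$(printf '%s' "$1" | cut -d '.' -f 2 | tr '_-' '/+')
  # base64url drops the '=' padding, so restore it before decoding.
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  printf '%s' "$payload" | base64 -d
}

# Example, inside a job that defines VAULT_ID_TOKEN (requires jq):
# jwt_payload "$VAULT_ID_TOKEN" | jq '{iss, aud, sub}'
```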


A clear rollout plan to eliminate any confusion

In the preceding section, we divided the work into five distinct changes, only one of which is relevant to end users (about five lines of code in their custom CI/CD pipelines).

20240313 - Vault admins - COMMS - Send out the meeting invite.

20240314 - Vault admins - COMMS - Create a new Teams channel for collaboration.

20240319 - Vault admins - COMMS - Meeting to give an overview of this GitLab breaking change, outlining the necessary actions for Vault admins and end users and providing example code and search queries.

20240321 - Vault admins - CHANGE_1 - Implement changes to the JWT authentication method (vault_jwt_auth_backend) across all namespaces and validate them. This allows CI_JOB_JWT and VAULT_ID_TOKEN to work at the same time.

20240322 - Vault admins - CHANGE_2, CHANGE_3, CHANGE_4 - Update templates, existing backend roles and pre-defined pipelines.

20240322 - Department admins/Vault users - CHANGE_5 - Start testing the required pipeline changes and switch from CI_JOB_JWT to VAULT_ID_TOKEN.

2024 Mid-April - Department admins/Vault users/Vault admins - Regroup and review changes.

20240516 - GitLab - Farewell to CI_JOB_JWT??

20240516 - Chilled users - Broken pipelines??


Here's the technical implementation

Fortunately, we've centralised the Terraform templates for Vault resources across approximately 60 GitLab projects, encompassing around 200 changes.

CHANGE_1 - Remove bound_issuer from the "vault_jwt_auth_backend" Terraform resource. CI_JOB_JWT and VAULT_ID_TOKEN carry different iss values ("gitlab.com" and "https://gitlab.com"), so the issuer check moves from the auth method to each backend role.

CHANGE_2 - Add iss bound_claims to all "vault_jwt_auth_backend_role" Terraform resources. This covers all newly triggered deployments.

CHANGE_3 - Add iss bound_claims to all existing backend roles using the Vault CLI.

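The example .gitlab-ci.yml below uses a generator job to build a dynamic child pipeline containing the following jobs for each Vault namespace that has JWT roles:

  • Backup (<namespace>-namespace-roles-backup)
  • Implementation (<namespace>-namespace-roles-change)
  • Rollback (<namespace>-namespace-roles-restore)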

## Your GitLab pipeline image must have Vault CLI and JQ installed.
## I am using existing JWT auth method to get my VAULT_TOKEN.
 
stages:
  - pre
  - apply

variables:
  VAULT_ADDR: "<YOUR_VAULT_ADDR>"

.query_vaultnamespaces: &query_vaultnamespaces
  - | ## Construct VAULT_NAMESPACES - All available Vault namespaces
    echo root >> VAULT_NAMESPACES
    for item in $(vault namespace list -format=json | jq -r -c '.[]'); do
        namespace="${item%?}"
        vault_roles=""
        vault_roles=$(vault list -namespace=$namespace -format=json auth/jwt/role | jq -r '.[]' || true)
        if [ -z "${vault_roles}" ]; then
            echo "No role available in $namespace namespace."
        else
            echo "Role(s) detected in $namespace namespace."
            echo $namespace >> VAULT_NAMESPACES
        fi
    done
    echo -e "\e[36mVault namespaces available \e[0m"
    cat VAULT_NAMESPACES

.get_vault_token: &get_vault_token
  - |
    echo "VAULT_ADDR=$VAULT_ADDR"
    if [ ! -z "$VAULT_TOKEN" ]
    then
      echo "VAULT_TOKEN is present."
      export VAULT_TOKEN=$VAULT_TOKEN
    else
      echo "VAULT_TOKEN not present. Using GitLab JWT."
      if [ "$VAULT_CONFIGS_ENV" == "prd" ]; then
        echo "ENVIRONMENT=$ENVIRONMENT"
        export VAULT_TOKEN=$(vault write -namespace=root -field=token auth/jwt_vault_id_token/login role=vault-orchestration-role jwt=$VAULT_ID_TOKEN)
      else 
        export VAULT_TOKEN=$(vault write -namespace=root -field=token auth/jwt/login role=vault-orchestration-role jwt=$VAULT_ID_TOKEN)
      fi
    fi
    echo "Getting VAULT_TOKEN"
    [ -z "$VAULT_TOKEN" ] && echo "Mandatory variable - VAULT_TOKEN is empty." && exit 1|| echo "Retrieved VAULT_TOKEN"
    
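generator_vault_jwt_roles:
  stage: pre
  tags: 
    - <YOUR_RUNNER>
  id_tokens: ## Needed so that VAULT_ID_TOKEN exists in this job (assumes the same audience as the downstream jobs)
    VAULT_ID_TOKEN:
      aud: https://gitlab.com
  before_script:
    - *get_vault_token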
  script:
    - *query_vaultnamespaces
    - timestamp=$(date +"%Y%m%d%H%M%S")
    - | ## Generate base template for downstream pipelines
      cat >> GENERATED_PIPELINE.yml << EOF
      image: <YOUR_RUNNER_IMAGE>

      stages:
        - pre
        - apply
        - restore

      variables: ##Overcome timeout issues
        VAULT_RATE_LIMIT: "1"
        VAULT_CLIENT_TIMEOUT: "120"
        VAULT_MAX_RETRIES: "25"

      default:
        id_tokens:
          VAULT_ID_TOKEN:
            aud: https://gitlab.com

      .get_vault_token: &get_vault_token
        - |
          echo "VAULT_ADDR=\$VAULT_ADDR"
          if [ ! -z "\$VAULT_TOKEN" ]
          then
            echo "VAULT_TOKEN is present."
            export VAULT_TOKEN=\$VAULT_TOKEN
          else
            echo "VAULT_TOKEN not present. Using GitLab JWT."
            if [ "\$VAULT_CONFIGS_ENV" == "prd" ]; then
              echo "ENVIRONMENT=\$ENVIRONMENT"
              export VAULT_TOKEN=\$(vault write -namespace=root -field=token auth/jwt_vault_id_token/login role=vault-orchestration-role jwt=\$VAULT_ID_TOKEN)
            else 
              export VAULT_TOKEN=\$(vault write -namespace=root -field=token auth/jwt/login role=vault-orchestration-role jwt=\$VAULT_ID_TOKEN)
            fi
          fi
          echo "Getting VAULT_TOKEN"
          [ -z "\$VAULT_TOKEN" ] && echo "Mandatory variable - VAULT_TOKEN is empty." && exit 1|| echo "Retrieved VAULT_TOKEN"

      .backup_roles_namespace: &backup_roles_namespace
        - | ## Backup all roles with jwt auth method
          echo -e "\e[34mStarting the backup process \e[0m"
          echo \$timestamp
          echo -e "\e[36mExecuting for \$namespace namespace \e[0m"
          ## List jwt auth method roles within a namespace
          vault_roles=\$(vault list -namespace=\$namespace -format=json auth/jwt/role | jq -r '.[]' || true)
          cat >> backup-\$namespace-\$timestamp.json << EOF
          [
          EOF
          ## Query bound_claims for each role and start backup
          for role in \$vault_roles; do
              echo \$role
              existing_bound_claims=\$(vault read -namespace=\$namespace -format=json auth/jwt/role/\$role | jq -r '.data.bound_claims')

          cat >> backup-\$namespace-\$timestamp.json << EOF
          { "role" : "\$role",
            "bound_claims" : \$existing_bound_claims },
          EOF
          done
          sed -i '\$ s/.\$//' backup-\$namespace-\$timestamp.json
          cat >> backup-\$namespace-\$timestamp.json << EOF
          ]
          EOF
          roles=\$(jq length backup-\$namespace-\$timestamp.json)
          echo -e "\e[33mTotal of \$roles roles to be backup for \$namespace namespace\e[0m"
          cat backup-\$namespace-\$timestamp.json | jq '.'

      .restore_roles_namespace: &restore_roles_namespace
        - | ## Restore all roles with jwt auth method
          for obj in \$(cat backup-\$namespace-\$timestamp.json | jq -c '.[]'); do
              role=""
              bound_claims=""
              role=\$(echo \$obj | jq -r '.role')
              bound_claims=\$(echo \$obj | jq -r '.bound_claims')

          vault write -namespace=\$namespace auth/jwt/role/\$role -<<EOF
          {
            "role_type": "jwt",
            "bound_claims_type": "glob",
            "bound_claims": \$bound_claims
          }
          EOF
          done
          ## List jwt auth method roles within a namespace
          vault_roles=\$(vault list -namespace=\$namespace -format=json auth/jwt/role | jq -r '.[]' || true)
          cat >> check.json << EOF
          [
          EOF
          ## Query bound_claims for each role to verify the restore
          for role in \$vault_roles; do
              echo \$role
              existing_bound_claims=\$(vault read -namespace=\$namespace -format=json auth/jwt/role/\$role | jq -r '.data.bound_claims')

          cat >> check.json << EOF
          { "role" : "\$role",
            "bound_claims" : \$existing_bound_claims },
          EOF
          done
          sed -i '\$ s/.\$//' check.json
          cat >> check.json << EOF
          ]
          EOF
          cat check.json | jq '.'

      .change_roles_namespace: &change_roles_namespace
        - | ## update all roles with jwt auth method
          ROLES=\$(cat backup-\$namespace-\$timestamp.json | jq -r '.[].role')
          for role in \$ROLES; do
              echo \$role
              existing_bound_claims=""
              updated_bound_claims=""
              existing_bound_claims=\$(vault read -namespace=\$namespace -format=json auth/jwt/role/\$role | jq -r '.data.bound_claims')
              updated_bound_claims=\$(echo "\$existing_bound_claims" | jq '.iss = ["gitlab.com","https://gitlab.com"]')
          vault write -namespace=\$namespace auth/jwt/role/\$role -<<EOF
          {
            "role_type": "jwt",
            "bound_claims_type": "glob",
            "bound_claims": \$updated_bound_claims
          }
          EOF
          done
          ## List jwt auth method roles within a namespace
          vault_roles=\$(vault list -namespace=\$namespace -format=json auth/jwt/role | jq -r '.[]' || true)
          cat >> check.json << EOF
          [
          EOF
          ## Query bound_claims for each role to verify the change
          for role in \$vault_roles; do
              echo \$role
              existing_bound_claims=\$(vault read -namespace=\$namespace -format=json auth/jwt/role/\$role | jq -r '.data.bound_claims')

          cat >> check.json << EOF
          { "role" : "\$role",
            "bound_claims" : \$existing_bound_claims },
          EOF
          done
          sed -i '\$ s/.\$//' check.json
          cat >> check.json << EOF
          ]
          EOF
          cat check.json | jq '.'
      EOF
    - | ## Create downstream pipelines
      while read p; do
      cat >> GENERATED_PIPELINE.yml << EOF
      $p-namespace-roles-backup:
        variables:
          namespace: $p
          timestamp: $timestamp
        stage: pre
        tags: 
          - <YOUR_RUNNER>
        before_script:
          - *get_vault_token
        script:
          - echo -e "\e[33mPerforming backup to current namespace is \$namespace\e[0m"
          - *backup_roles_namespace
        when: manual
        artifacts:
          paths:
            - backup-*.json

      $p-namespace-roles-restore:
        needs:
          - $p-namespace-roles-backup
        variables:
          namespace: $p
          timestamp: $timestamp
        stage: restore
        tags: 
          - <YOUR_RUNNER>
        before_script:
          - *get_vault_token
        script:
          - echo -e "\e[33mPerforming restore to current namespace is \$namespace\e[0m"
          - *restore_roles_namespace
        when: manual
        artifacts:
          paths:
            - check.json

      $p-namespace-roles-change:
        needs:
          - $p-namespace-roles-backup
        variables:
          namespace: $p
          timestamp: $timestamp
        stage: apply
        tags: 
          - <YOUR_RUNNER>
        before_script:
          - *get_vault_token
        script:
          - echo -e "\e[33mPerforming iss change to current namespace is \$namespace\e[0m"
          - *change_roles_namespace
        when: manual
        artifacts:
          paths:
            - check.json
      EOF
      done <VAULT_NAMESPACES
  when: manual
  artifacts:
    expire_in: 1 day
    paths:
      - GENERATED_PIPELINE.yml
  
trigger_vault_jwt_roles:
  needs:
    - generator_vault_jwt_roles
  stage: apply
  trigger:
    include:
      - artifact: GENERATED_PIPELINE.yml
        job: generator_vault_jwt_roles
    strategy: depend
    forward:
      pipeline_variables: true
      yaml_variables: true        


CHANGE_4 - Update all pre-defined GitLab CI/CD pipelines (same code change, shown below)

CHANGE_5 - Update all custom GitLab CI/CD pipelines (same code change, shown below)

## 1 - Mandatory - Add VAULT_ID_TOKEN. Using default is easier to apply to all jobs in the same pipeline
default:
  id_tokens:
    VAULT_ID_TOKEN:
      aud: https://gitlab.com
      
## 2 - Optional (only if you are using the GCP secret engine) - Update the Vault CLI call to use VAULT_ID_TOKEN instead of CI_JOB_JWT
vault write -field=token auth/jwt/login role="$VAULT_BACKEND_ROLE" jwt="$VAULT_ID_TOKEN"
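
While both tokens are still accepted (after CHANGE_1 and before GitLab removes CI_JOB_JWT), a shared pipeline template could prefer the new token and fall back to the old one with a warning. This is a sketch of that idea, not something taken from our templates; `select_jwt` is a hypothetical helper name.

```shell
#!/bin/sh
# Pick the JWT to present to Vault during the migration window.
# Prefers VAULT_ID_TOKEN; falls back to the deprecated CI_JOB_JWT with a
# warning on stderr; fails if neither is set.
select_jwt() {
  if [ -n "${VAULT_ID_TOKEN:-}" ]; then
    printf '%s' "$VAULT_ID_TOKEN"
  elif [ -n "${CI_JOB_JWT:-}" ]; then
    echo "WARNING: falling back to deprecated CI_JOB_JWT" >&2
    printf '%s' "$CI_JOB_JWT"
  else
    echo "ERROR: no JWT available; add id_tokens to the job" >&2
    return 1
  fi
}

# Example usage in a job script:
# jwt=$(select_jwt) || exit 1
# vault write -field=token auth/jwt/login role="$VAULT_BACKEND_ROLE" jwt="$jwt"
```

The warning makes it easy to spot pipelines that have not yet added id_tokens by searching job logs, and the fallback branch can be deleted once CI_JOB_JWT is gone.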





Bio

Henry Tze serves as the Principal Cloud Security Engineer at Virgin Media O2, specialising in crafting a user-centric security infrastructure on a large scale for developers, engineers, and builders. His primary focus lies in facilitating value creation at a rapid pace within AWS and GCP Cloud environments.

His approach involves empowering users across all levels by furnishing them with pipeline templates, infrastructure blueprints, practical examples, secure operational practices, and user-friendly, low/no code self-service platforms that surpass conventional expectations.

Employing a philosophy of "everything as code," Henry believes in fostering unity and alignment among builders. He advocates for the formation of an internal community among users to tackle technical challenges collectively and foster collaboration.



