Day 18: Using dbt with GitLab CI/CD Pipeline
In today’s article, we’ll integrate dbt with GitLab's CI/CD pipeline, a key step in automating dbt workflows so that every model change is tested before it reaches production. We’ll set up a directory structure for dbt and GitLab CI, define environment variables for the pipeline stages (test, build, deploy), and walk through an example CI/CD configuration file. Let's get started!
Directory Structure
A well-organized directory structure ensures that your dbt project and GitLab CI/CD pipelines work seamlessly together. Here’s an example structure that keeps everything tidy:
#bash
├── .gitlab-ci.yml      # CI/CD configuration file
├── dbt_project.yml     # dbt project configuration
├── profiles.yml        # dbt connection profiles (excluded from version control)
├── models              # Folder for dbt models
│   ├── staging         # Staging models
│   └── marts           # Business logic models
├── seeds               # Static data
├── snapshots           # For snapshots
├── tests               # For dbt tests
└── ci-scripts          # Custom scripts for CI/CD if needed
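To tie this layout to dbt, a dbt_project.yml along the following lines could point dbt at those folders. This is a minimal sketch rather than the actual file behind this article: the project name my_project and the materializations are assumptions, and the path keys use the names introduced in dbt 1.0 (model-paths, seed-paths, and so on).

```yaml
# dbt_project.yml (sketch; project name and materializations are assumptions)
name: my_project
version: "1.0.0"
config-version: 2

# Must match a top-level key in profiles.yml
profile: my_project

# Folders from the directory structure above
model-paths: ["models"]
seed-paths: ["seeds"]
snapshot-paths: ["snapshots"]
test-paths: ["tests"]

models:
  my_project:
    staging:
      +materialized: view    # Assumed: staging models as lightweight views
    marts:
      +materialized: table   # Assumed: marts persisted as tables
```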
Setting Up GitLab CI/CD Pipeline
1. Defining Environment Variables for Different Stages
Environment variables are critical for securely passing credentials, dbt profiles, and configurations into the CI/CD pipeline. GitLab CI allows defining variables at the project or group level, which are then available during pipeline execution.
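As a rough illustration, a variable defined under Settings > CI/CD > Variables is exposed to each job as an ordinary environment variable. The job below is a hypothetical sanity check, not part of the pipeline in this article; the job name check_env and the variable name DBT_PASSWORD are assumptions.

```yaml
# .gitlab-ci.yml fragment (sketch; DBT_PASSWORD is an assumed variable name)
check_env:
  image: alpine:latest
  script:
    # Exits non-zero, failing the job, if the variable is empty or unset
    - test -n "$DBT_PASSWORD"
```

A check like this fails fast when a credential is missing, instead of surfacing later as a database connection error.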
2. The .gitlab-ci.yml File
The .gitlab-ci.yml file orchestrates the CI/CD pipeline by defining stages (e.g., test, build, deploy) and the jobs within each stage.
Here’s a basic example for dbt:
#yaml file:
stages:
  - test
  - build
  - deploy

# Shared by all jobs: directory that contains profiles.yml
# (the project root in the directory structure above)
variables:
  DBT_PROFILES_DIR: $CI_PROJECT_DIR

# Test Stage: Run dbt tests on the dev environment
test:
  stage: test
  image: fivetran/dbt:latest
  script:
    - export DBT_ENV="dev"
    - dbt deps   # Install dependencies
    - dbt seed --profiles-dir $DBT_PROFILES_DIR --target $DBT_TARGET
    - dbt run --profiles-dir $DBT_PROFILES_DIR --target $DBT_TARGET
    - dbt test --profiles-dir $DBT_PROFILES_DIR --target $DBT_TARGET
  only:
    - merge_requests   # Run tests only on MRs
  variables:
    DBT_TARGET: "dev"   # Run against the dev environment

# Build Stage: Build models in the staging environment
build:
  stage: build
  image: fivetran/dbt:latest
  script:
    - export DBT_ENV="staging"
    - dbt deps   # Each job starts in a fresh container, so install dependencies again
    - dbt run --profiles-dir $DBT_PROFILES_DIR --target $DBT_TARGET
  only:
    - develop   # Run on the develop branch
  variables:
    DBT_TARGET: "staging"

# Deploy Stage: Deploy models in the production environment
deploy:
  stage: deploy
  image: fivetran/dbt:latest
  script:
    - export DBT_ENV="prod"
    - dbt deps   # Install dependencies
    - dbt run --profiles-dir $DBT_PROFILES_DIR --target $DBT_TARGET
  only:
    - main   # Run only on the main branch
  variables:
    DBT_TARGET: "prod"
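The three jobs above repeat the same image and flags. One possible way to reduce the repetition is a shared template built with GitLab's hidden jobs and the extends keyword. This is a sketch only: the template name .dbt_base is hypothetical, just the deploy job is shown, and DBT_PROFILES_DIR is assumed to be defined elsewhere in the file.

```yaml
# Hidden job (leading dot): never runs on its own, only supplies defaults
.dbt_base:
  image: fivetran/dbt:latest
  before_script:
    - dbt deps   # Install dbt packages before each job's script

# Inherits image and before_script from the template above
deploy:
  extends: .dbt_base
  stage: deploy
  script:
    - dbt run --profiles-dir $DBT_PROFILES_DIR --target $DBT_TARGET
  only:
    - main
  variables:
    DBT_TARGET: "prod"
```

With this shape, changing the image or the setup step happens in one place instead of three.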
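3. Explanation of the .gitlab-ci.yml Components
stages: Lists the pipeline stages (test, build, deploy) in the order they run.
image: The Docker image each job runs in; the example uses fivetran/dbt:latest.
script: The shell commands a job executes, here dbt deps, dbt seed, dbt run, and dbt test.
only: Restricts when a job runs: test on merge requests, build on the develop branch, deploy on main.
variables: Sets values such as DBT_TARGET and DBT_PROFILES_DIR that the dbt commands read.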
4. Setting Up Different Environments
In dbt, each environment is a target defined under outputs in the profiles.yml file. This file typically contains configurations for dev, staging, and production environments. Here’s an example:
#yaml:
my_project:
  outputs:
    dev:
      type: postgres
      host: dev-database-host
      port: 5432   # Default PostgreSQL port (assumed)
      user: db_user
      password: db_password
      dbname: my_database
      schema: dev_schema
    staging:
      type: postgres
      host: staging-database-host
      port: 5432
      user: db_user
      password: db_password
      dbname: my_database
      schema: staging_schema
    prod:
      type: postgres
      host: prod-database-host
      port: 5432
      user: db_user
      password: db_password
      dbname: my_database
      schema: prod_schema
  target: dev   # Default target
In GitLab CI, we control which target is used (e.g., dev, staging, prod) through the DBT_TARGET variable. Each job sets its own value and passes it to dbt with the --target flag, which is how the pipeline stages switch environments.
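To keep credentials out of the file, dbt's env_var function can read them from the environment at run time, which pairs with the GitLab CI/CD variables described earlier. The sketch below shows only a prod output; the variable names DBT_HOST, DBT_USER, and DBT_PASSWORD are assumptions, not names from this article.

```yaml
# profiles.yml fragment (sketch; environment variable names are assumptions)
my_project:
  target: prod   # Only output defined in this fragment
  outputs:
    prod:
      type: postgres
      host: "{{ env_var('DBT_HOST') }}"
      port: 5432
      user: "{{ env_var('DBT_USER') }}"
      password: "{{ env_var('DBT_PASSWORD') }}"
      dbname: my_database
      schema: prod_schema
```

A file written this way holds no secrets itself, so the sensitive values live only in GitLab's variable settings.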
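Example Workflow
1. A developer opens a merge request; the test job runs dbt seed, dbt run, and dbt test against the dev target.
2. When changes land on the develop branch, the build job runs the models against the staging target.
3. When changes reach main, the deploy job runs the models against the prod target.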
Best Practices
Conclusion
Setting up dbt with GitLab CI/CD enables automated, consistent, and reliable deployment of your data models. By structuring your directory properly, defining environment variables, and setting up stages in .gitlab-ci.yml, you can manage the lifecycle of dbt models effectively across dev, staging, and production environments.
With this pipeline in place, your team can confidently iterate on data models while maintaining high standards for testing and deployment.
#dbt #gitlab #cicd #DataEngineering