Setting Up dbt Core on GCP: A Step-by-Step Guide
Sunil Rastogi
Deploying dbt Core on Google Cloud Platform (GCP) allows you to centralize and scale your data transformation workflows without relying on local environments. This guide will walk you through the steps to set up dbt Core directly on GCP, ensuring a cloud-native deployment that integrates seamlessly with GCP services.
1. What is dbt Core?
dbt Core (Data Build Tool) is an open-source tool designed to help analytics engineers build, test, and deploy data transformations using modular SQL. By leveraging dbt Core, teams can apply software engineering principles such as version control, testing, and modularity to data workflows.
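In dbt, each transformation is a SQL SELECT statement saved as a model file that dbt compiles and materializes in the warehouse. A minimal sketch of what that looks like (the model name and source table are hypothetical):
# Create a simple staging model inside a dbt project
mkdir -p models/staging
cat > models/staging/stg_orders.sql <<'SQL'
-- Keep only the columns downstream models need
select
    order_id,
    customer_id,
    order_date,
    status
from `your-project-id.raw_data.orders`
SQL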
Key Features:
1. Modular SQL models that can reference each other, so dbt builds them in dependency order.
2. Built-in testing for validating data quality and transformation logic.
3. Auto-generated documentation and lineage for every model.
4. Plain-text projects that fit naturally into Git-based version control and CI/CD.
2. Benefits of dbt Core
Setting up dbt Core on GCP offers several advantages:
1. Centralized execution: transformations run in the cloud rather than on individual laptops.
2. Native BigQuery integration: dbt pushes all compute down to BigQuery, so there is no separate processing cluster to manage.
3. Scalability: Compute Engine and Cloud Run resources can grow with your workload.
4. Automation: Cloud Scheduler and Cloud Run turn dbt runs into scheduled, repeatable jobs.
5. No license cost: dbt Core is open source.
3. GCP Services Required to Set Up dbt Core
To deploy dbt Core on GCP, you will need the following services:
1. BigQuery: the data warehouse where dbt executes its transformations.
2. Compute Engine: a VM for development and ad hoc runs.
3. Cloud Storage: optional storage for project files, logs, and results.
4. Cloud Run and Cloud Scheduler: for containerized, scheduled production runs.
5. Container Registry: to store the dbt Docker image.
4. Steps to Set Up dbt Core on GCP
Step 1: Prepare GCP Environment
First, enable the APIs the deployment depends on:
1. BigQuery API
2. Compute Engine API
3. Cloud Storage API
Next, create a dedicated service account for dbt and grant it the following roles (a gcloud sketch follows this list):
1. BigQuery Data Editor: to execute queries and transformations.
2. Storage Object Viewer: to read project files from Cloud Storage.
3. Storage Object Creator: to write logs or results.
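A minimal sketch of these setup steps with gcloud, assuming the placeholder project your-project-id and a service account named dbt-runner. Note that in practice dbt also needs the BigQuery Job User role in order to submit query jobs, so it is granted here as well:
# Enable the required APIs
gcloud services enable bigquery.googleapis.com compute.googleapis.com storage.googleapis.com
# Create a service account for dbt (name is illustrative)
gcloud iam service-accounts create dbt-runner --display-name="dbt Core runner"
# Grant the roles listed above, plus BigQuery Job User for running queries
for role in roles/bigquery.dataEditor roles/bigquery.jobUser \
            roles/storage.objectViewer roles/storage.objectCreator; do
  gcloud projects add-iam-policy-binding your-project-id \
    --member="serviceAccount:dbt-runner@your-project-id.iam.gserviceaccount.com" \
    --role="$role"
done
# Download a key file for later use on the VM
gcloud iam service-accounts keys create service-account-key.json \
  --iam-account="dbt-runner@your-project-id.iam.gserviceaccount.com"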
Step 2: Deploy dbt Core Using Compute Engine
Create a Compute Engine VM with at least the following specs (a gcloud sketch for creating the instance follows this list):
1. Machine Type: e2-medium or higher (depending on workload).
2. OS: Ubuntu 20.04 LTS.
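A minimal sketch of the instance creation, assuming the placeholder zone us-central1-a and instance name dbt-vm:
# Create the VM described above and attach the dbt service account
gcloud compute instances create dbt-vm \
  --zone=us-central1-a \
  --machine-type=e2-medium \
  --image-family=ubuntu-2004-lts \
  --image-project=ubuntu-os-cloud \
  --service-account="dbt-runner@your-project-id.iam.gserviceaccount.com" \
  --scopes=cloud-platform
# SSH into the instance once it is up
gcloud compute ssh dbt-vm --zone=us-central1-a
Once connected, install dbt: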
# Update and install required packages
sudo apt update && sudo apt install -y python3-pip python3-venv git
# Create a virtual environment
python3 -m venv dbt-env
source dbt-env/bin/activate
# Install dbt with BigQuery adapter
pip install dbt-bigquery
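Confirm that the install picked up the BigQuery adapter:
# Should list dbt-core and the bigquery plugin
dbt --version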
From your local machine, copy the service account key to the VM, then point dbt at it:
scp path/to/service-account-key.json username@your-vm-ip:/path/to/destination/
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account-key.json
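The export only lasts for the current shell session; to persist it across SSH sessions, you can append it to your shell profile (same placeholder path as above):
# Persist the credential path for future sessions
echo 'export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account-key.json' >> ~/.bashrc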
Step 3: Configure dbt Project
Initialize a new dbt project and move into it:
dbt init my_dbt_project
cd my_dbt_project
Then define the BigQuery connection in ~/.dbt/profiles.yml (the profile name, bigquery_project here, must match the profile entry in dbt_project.yml):

bigquery_project:
  outputs:
    prod:
      type: bigquery
      method: service-account
      project: your-project-id
      dataset: your-dataset
      keyfile: /path/to/service-account-key.json
      threads: 4
  target: prod
Validate the configuration and connection:
dbt debug
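Once dbt debug passes, you can build and test the project against the prod target defined above:
# Build all models
dbt run --target prod
# Run schema and data tests
dbt test --target prod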
Step 4: Automate dbt Core with Cloud Scheduler and Cloud Run
To run dbt on a schedule without keeping a VM running, package the project as a container. Create a Dockerfile in the project root:

FROM python:3.9-slim
WORKDIR /app
# Install dbt and the BigQuery adapter
RUN pip install dbt-bigquery
# Copy dbt project files; include a profiles.yml in the project
# (dbt also checks the working directory) or set DBT_PROFILES_DIR
COPY . /app
CMD ["dbt", "run"]
Build and push the Docker image to Container Registry:
gcloud builds submit --tag gcr.io/your-project-id/dbt-core
Deploy the image to Cloud Run. Because this container executes dbt run once and exits rather than serving HTTP, a Cloud Run job is the right fit: a Cloud Run service would fail its startup probe waiting for a listener, and would also need to be exposed with --allow-unauthenticated or extra IAM setup to be invoked.
gcloud run jobs create dbt-core \
  --image gcr.io/your-project-id/dbt-core \
  --region us-central1
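Finally, Cloud Scheduler can execute the job on a cron schedule. A sketch, reusing the placeholder names from above; the scheduler's OAuth service account needs permission to invoke the job (the Cloud Run Invoker role):
# Run the job once manually to verify it works
gcloud run jobs execute dbt-core --region us-central1
# Schedule a daily run at 06:00 UTC
gcloud scheduler jobs create http dbt-daily-run \
  --schedule="0 6 * * *" \
  --uri="https://us-central1-run.googleapis.com/apis/run.googleapis.com/v1/namespaces/your-project-id/jobs/dbt-core:run" \
  --http-method=POST \
  --oauth-service-account-email="dbt-runner@your-project-id.iam.gserviceaccount.com" \
  --location=us-central1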
Conclusion
Setting up dbt Core on GCP allows you to leverage the power of cloud-native tools for data transformation. By integrating with BigQuery and automating workflows using Cloud Run and Cloud Scheduler, you can create a scalable and efficient data transformation pipeline.