Optimizing Data Pipelines: Staging Models Best Practices and Limiting View Runs
dbt (data build tool): Streamlining Data Pipelines


Are you grappling with questions surrounding staging models and materializing views in your dbt projects? In this guide, we delve into best practices and solutions to optimize your data pipelines for efficiency and cost-effectiveness.

Structuring Projects for Success

Central to a well-organized dbt project is a structured approach to modeling. Our recommended project structure encompasses three primary layers:

  1. Staging: Build initial modular blocks from source data.
  2. Intermediate: Layer logic for data preparation and transformation.
  3. Marts: Combine modular pieces into comprehensive organizational entities.
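A staging model in this structure does light cleanup only: renaming, casting, and basic reshaping of one source table, with no joins or aggregation. A minimal sketch, using hypothetical source and column names:

```sql
-- models/staging/stg_orders.sql (source and column names are illustrative)
-- One staging model per source table: rename, cast, nothing else.
with source as (

    select * from {{ source('jaffle_shop', 'orders') }}

),

renamed as (

    select
        id          as order_id,
        user_id     as customer_id,
        order_date::date as ordered_at,
        status      as order_status
    from source

)

select * from renamed
```

Keeping joins and business logic out of staging is what makes these models cheap enough to materialize as views.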

Adopting this structure, in particular a staging layer materialized predominantly as views, can yield significant benefits: less code duplication, and lower storage costs, since a view stores only the query text rather than the data itself.
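Layer-wide materializations are typically set once in dbt_project.yml rather than per model. A sketch, assuming a project named my_project and the common (but not mandatory) convention of ephemeral intermediate models:

```yaml
# dbt_project.yml (project name and intermediate/marts choices are illustrative)
models:
  my_project:
    staging:
      +materialized: view       # views store only the query, not the data
    intermediate:
      +materialized: ephemeral  # inlined as CTEs; nothing built in the warehouse
    marts:
      +materialized: table      # end-user entities are persisted for query speed
```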

Excluding Views in dbt Cloud Jobs

Have you wondered how to exclude views from your dbt Cloud job runs? Since a view holds no data, rebuilding it on every run often adds time without adding value. Here's a solution:

  • Exclude views by modifying your dbt Cloud job command to include: --exclude config.materialized:view.
  • Consider the implications carefully, especially if your views contain dynamic code or require testing with each run.
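Concretely, the exclusion goes in the job's Commands field in dbt Cloud (or on the command line):

```shell
# Build everything except models materialized as views
dbt build --exclude config.materialized:view
```

Note the caveat hinted at above: with dbt build, excluding a model also skips the tests attached to it, which is exactly the gap the custom selector in the next section closes.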

Custom Selectors for Job Commands

To fine-tune your job runs and exclude views while still running tests, follow these steps:

  1. Create a custom selector named skip_views_but_test_views.
  2. Configure the selector to skip materializing views while ensuring tests for views are still executed.
  3. Apply the selector to your dbt Cloud jobs with --selector skip_views_but_test_views.
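The steps above can be sketched as a YAML selector. This uses dbt's documented set-operator syntax (union/exclude); the description text is illustrative:

```yaml
# selectors.yml (in the project root)
selectors:
  - name: skip_views_but_test_views
    description: Build everything except view models, but still run all tests.
    definition:
      union:
        # everything, minus models materialized as views...
        - union:
            - method: fqn
              value: "*"
            - exclude:
                - method: config.materialized
                  value: view
        # ...plus every test, including tests defined on views
        - method: resource_type
          value: test
```

The job command then becomes `dbt build --selector skip_views_but_test_views`.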

Building Only Changed Views

For even greater optimization, set up a merge job triggered upon code merges into main:

  • Create a merge job in dbt Cloud, specifying commands to run only modified views.
  • Use the merge job to deploy changes from PRs to production promptly, optimizing platform spend and job durations.
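Such a merge job can combine the state:modified and config.materialized selection methods (a comma means intersection), assuming the job is configured to defer to, and compare state against, the previous production run:

```shell
# Merge job command: rebuild only view models whose code changed,
# comparing against the manifest of the last production run
dbt run --select state:modified,config.materialized:view
```

This way, views are refreshed exactly when their definitions change, while regular scheduled jobs skip them entirely.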

While setting up and testing these optimizations requires effort, the benefits — including significant reductions in job runtime — justify the investment.


#DataPipelines #dbt #DataOptimization #DataManagement #DataModeling #DataWarehouse #DataAnalytics
