Delta Live Tables: Declarative vs. Procedural Approaches in Databricks
Venkat Suryadevara
Engineering Leadership | Data Engineering, Data Governance, Data Modelling
When working with Delta Live Tables (DLT) in Databricks, you have the flexibility to define your data pipelines using either a declarative or procedural approach. Understanding the differences between these methods is crucial for optimizing your data engineering processes.
### 1. Declarative Approach with Delta Live Tables (DLT)
The declarative approach utilizes DLT’s SQL or Python DSL to specify what the output should look like, allowing the DLT engine to manage execution, optimization, and dependencies automatically.
Example: Declarative Approach (DLT SQL DSL)
```sql
CREATE LIVE TABLE transformed_data AS
SELECT
  id,
  name,
  UPPER(name) AS uppercase_name,
  date_of_birth
FROM LIVE.raw_data;
```
- This example defines a live table that reads from raw_data and adds an uppercase copy of the name field. The DLT framework handles scheduling and keeps the output state up to date.
Example: Declarative Approach (DLT Python DSL)
```python
import dlt
from pyspark.sql.functions import upper

@dlt.table
def transformed_data():
    df = dlt.read("raw_data")
    return df.select("id", "name", upper("name").alias("uppercase_name"), "date_of_birth")
```
- Here, the decorator @dlt.table signifies that this function defines a Delta Live Table. DLT automatically manages dependencies and updates output based on changes in raw_data.
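The declarative model also extends to data quality: DLT expectations can be attached directly to the table definition, and the engine enforces them at runtime. A minimal sketch building on the example above; the rule names and conditions are illustrative assumptions, not from the original example:
```python
import dlt
from pyspark.sql.functions import upper

@dlt.table
@dlt.expect_or_drop("valid_id", "id IS NOT NULL")            # drop rows that fail the rule
@dlt.expect("has_birth_date", "date_of_birth IS NOT NULL")   # record violations, keep the rows
def validated_data():
    df = dlt.read("raw_data")
    return df.select("id", "name", upper("name").alias("uppercase_name"), "date_of_birth")
```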
### When to Use Declarative:
- Simple transformations (e.g., filtering, renaming).
- Reduced boilerplate code with optimizations managed by Databricks.
- Real-time streaming scenarios where data outcomes are prioritized over processing steps (see the streaming sketch below).
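For the streaming scenario in the last bullet, the declarative definition barely changes: reading with dlt.read_stream instead of dlt.read turns the result into an incrementally updated streaming table. A minimal sketch, where the raw_events source and its columns are hypothetical names for illustration:
```python
import dlt

@dlt.table
def transformed_events():
    # dlt.read_stream processes only new rows as they arrive;
    # DLT manages checkpoints and incremental state automatically.
    return dlt.read_stream("raw_events").select("id", "event_type", "event_time")
```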
### 2. Procedural Approach with PySpark
The procedural approach involves using PySpark code to control step-by-step execution of transformations, which is ideal for complex ETL pipelines requiring customization.
Example: Procedural Approach in PySpark
```python
from pyspark.sql.functions import upper
# Read raw data from Delta table
raw_data = spark.read.format("delta").load("/path/to/raw_data")
# Apply transformations
transformed_data = raw_data.select(
"id", "name", upper("name").alias("uppercase_name"), "date_of_birth"
)
# Write transformed data to Delta Lake
transformed_data.write.format("delta").mode("overwrite").save("/path/to/transformed_data")
```
- In this example, you explicitly load, transform, and write the data while managing dependencies manually.
### When to Use Procedural:
- Fine-tuning transformations (e.g., partitioning, caching).
- Handling complex joins, aggregations, or UDFs that are challenging to express declaratively.
- Implementing custom error handling or manual job orchestration beyond DLT’s capabilities (a combined sketch follows).
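To make these bullets concrete, here is a minimal sketch combining explicit caching, an explicit partitioning scheme, and custom error handling around the write. The paths, the derived birth_year partition column, and the handling logic are illustrative assumptions; spark is the SparkSession Databricks provides in a notebook:
```python
from pyspark.sql.functions import upper, year, col

# Read raw data from a Delta table (illustrative path).
raw_data = spark.read.format("delta").load("/path/to/raw_data")

# Cache when the same DataFrame feeds several downstream actions.
raw_data.cache()

transformed_data = (
    raw_data
    .select("id", "name", upper("name").alias("uppercase_name"), "date_of_birth")
    .withColumn("birth_year", year(col("date_of_birth")))  # low-cardinality partition key
)

try:
    # Partition the output explicitly (a choice DLT would otherwise make for you).
    (transformed_data.write
        .format("delta")
        .mode("overwrite")
        .partitionBy("birth_year")
        .save("/path/to/transformed_data"))
except Exception as e:
    # Custom error handling: log and re-raise, or divert to a dead-letter path.
    print(f"Write failed: {e}")
    raise
```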
### Conclusion
Choosing between declarative DLT and procedural PySpark depends on your specific use case. If your pipeline involves straightforward transformations and you prefer automation, go for declarative DLT. For more complex workloads requiring extensive customization and control, stick with procedural PySpark.
Which approach do you prefer? Let’s discuss how to fine-tune your specific use cases!
#DeltaLiveTables #Databricks #DataEngineering #ETL #PySpark #DataPipelines #BigData
Data Engineer at Clear Channel International · 2 months ago:
Any considerations around cost, performance, or manpower?
Founder @ okube.ai | Fractional Data Platform Engineer | Open-source Developer | Databricks Partner · 3 months ago:
Great post! If you want to take DLT to the next level, check out Laktory (www.laktory.ai)! It's an open-source DataOps and dataframe-centric ETL framework. It combines the best of dbt, Databricks Asset Bundles, and Terraform, letting you drive Delta Live Tables with YAML configurations that define data assets and transformations. It also elevates DLT data quality expectations by incorporating aggregation support and automating the quarantine of invalid records for subsequent review. Laktory supports SQL and Spark DataFrame operations, and also serves as an infrastructure-as-code tool for managing Databricks resources. Watch this demo for more on using Laktory with DLT: https://youtu.be/cX3EPV_xWrM