Delta Lake is an open-source storage layer that brings ACID transactions to Apache Spark and big data workloads.

Jon Esmael

Director of Business Development @ Confiz | D365, Data & Analytics, Microsoft Data Services, Power BI, MSFT Fabric, Power Apps, F&O, CE, CRM, AI, ML, LLMs, Retail, CPG, FinServ, Grocery, ...

Published: April 29, 2019

Key Features

ACID Transactions:

Data lakes typically have multiple data pipelines reading and writing data concurrently. Without transactions, data engineers have to go through a tedious process to ensure data integrity. Delta Lake brings ACID transactions to your data lakes. It provides serializability, the strongest isolation level, which ensures that readers never see inconsistent data.
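To make the idea of serializable commits concrete, here is a minimal sketch of optimistic concurrency on an ordered commit log. It is a toy in-memory model, not Delta Lake's implementation: the names `ToyTransactionLog` and `ConcurrentModificationError` are hypothetical, and the conflict rule is deliberately stricter than a real system needs (any commit based on a stale version is rejected).

```python
import threading


class ConcurrentModificationError(Exception):
    """Raised when a writer tries to commit on top of a stale table version."""


class ToyTransactionLog:
    """In-memory stand-in for an ordered commit log (hypothetical, illustration only)."""

    def __init__(self):
        self._commits = []              # commit N lives at index N
        self._lock = threading.Lock()   # makes "check version, then append" atomic

    def latest_version(self):
        with self._lock:
            return len(self._commits) - 1   # -1 means the table is empty

    def commit(self, read_version, actions):
        """Append `actions` as the next version only if nobody committed since `read_version`."""
        with self._lock:
            current = len(self._commits) - 1
            if read_version != current:
                raise ConcurrentModificationError(
                    f"read version {read_version}, but log is at {current}"
                )
            self._commits.append(list(actions))
            return len(self._commits) - 1

    def snapshot(self, version=None):
        """Replay commits up to `version` (inclusive) to get the set of live files."""
        with self._lock:
            end = len(self._commits) if version is None else version + 1
            commits = list(self._commits[:end])
        files = set()
        for actions in commits:
            for op, path in actions:
                if op == "add":
                    files.add(path)
                elif op == "remove":
                    files.discard(path)
        return files


log = ToyTransactionLog()
v0 = log.commit(-1, [("add", "part-0.parquet")])

# Two writers both read the table at version 0.
writer_a_read = writer_b_read = log.latest_version()
v1 = log.commit(writer_a_read, [("add", "part-1.parquet")])   # first writer wins

try:
    log.commit(writer_b_read, [("add", "part-2.parquet")])    # stale, so rejected
    conflict_detected = False
except ConcurrentModificationError:
    conflict_detected = True

# Writer B re-reads the latest version and retries.
v2 = log.commit(log.latest_version(), [("add", "part-2.parquet")])
```

Because every commit lands at exactly one position in the log, readers that replay the log up to a given version always see a complete commit or none of it, never a half-written one.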

Scalable Metadata Handling:

In big data, even the metadata itself can be "big data". Delta Lake treats metadata just like data, leveraging Spark's distributed processing power to handle all its metadata. As a result, Delta Lake can handle petabyte-scale tables with billions of partitions and files with ease.

Time Travel (data versioning):

Delta Lake provides snapshots of data, enabling developers to access and revert to earlier versions of data for audits, rollbacks, or to reproduce experiments.
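The snapshot idea can be sketched with a small versioned table. This is a toy model under stated assumptions, not Delta Lake's API: `ToyVersionedTable` and its methods are hypothetical, and each write stores a full copy of the rows, which a real system would avoid.

```python
class ToyVersionedTable:
    """Keeps every written snapshot so older versions stay readable (illustration only)."""

    def __init__(self):
        self._versions = []   # version N lives at index N; snapshots are immutable tuples

    def write(self, rows):
        """Store `rows` as a new snapshot and return its version number."""
        self._versions.append(tuple(rows))
        return len(self._versions) - 1

    def read(self, version=None):
        """Return the rows at `version`, or the latest snapshot when no version is given."""
        if not self._versions:
            raise ValueError("table has no versions yet")
        if version is None:
            version = len(self._versions) - 1
        if not 0 <= version < len(self._versions):
            raise ValueError(f"unknown version {version}")
        return list(self._versions[version])

    def restore(self, version):
        """Roll back by writing an old snapshot as a NEW version, so history is preserved."""
        return self.write(self.read(version))


table = ToyVersionedTable()
v0 = table.write([(1, "draft")])
v1 = table.write([(1, "draft"), (2, "final")])
v2 = table.write([])                     # an accidental overwrite with no rows

audit_rows = table.read(version=v1)      # audit: inspect the table as of version 1
v3 = table.restore(v1)                   # rollback: version 3 has the same rows as version 1
```

Restoring by appending a new version, rather than deleting the bad one, keeps the mistake itself visible for later audits.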

Open Format:

All data in Delta Lake is stored in Apache Parquet format, enabling Delta Lake to leverage the efficient compression and encoding schemes that are native to Parquet.

Unified Batch and Streaming Source and Sink:

A table in Delta Lake is both a batch table and a streaming source and sink. Streaming data ingestion, batch historical backfill, and interactive queries all just work out of the box.

Schema Enforcement:

Delta Lake provides the ability to specify your schema and enforce it. This helps ensure that the data types are correct and required columns are present, preventing bad data from causing data corruption.
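As a rough illustration of enforcement on write, the sketch below validates a whole batch against a declared schema before any row is accepted. It is a toy model, not Delta Lake's code: `enforce_schema`, `write_batch`, and `SchemaMismatchError` are hypothetical names, and the schema is simply a mapping from column name to Python type.

```python
class SchemaMismatchError(Exception):
    """Raised when a batch does not match the table's declared schema."""


def enforce_schema(schema, rows):
    """Check that every row has exactly the declared columns with the declared types."""
    for index, row in enumerate(rows):
        missing = schema.keys() - row.keys()
        extra = row.keys() - schema.keys()
        if missing or extra:
            raise SchemaMismatchError(
                f"row {index}: missing columns {sorted(missing)}, "
                f"unexpected columns {sorted(extra)}"
            )
        for column, expected_type in schema.items():
            if not isinstance(row[column], expected_type):
                raise SchemaMismatchError(
                    f"row {index}: column {column!r} expected {expected_type.__name__}, "
                    f"got {type(row[column]).__name__}"
                )


def write_batch(table_rows, schema, batch):
    """Append the batch only if every row passes validation (all-or-nothing)."""
    enforce_schema(schema, batch)   # raises before anything is appended
    table_rows.extend(batch)


schema = {"id": int, "amount": float}
table_rows = []

write_batch(table_rows, schema, [{"id": 1, "amount": 9.5}, {"id": 2, "amount": 3.0}])

try:
    # The second row carries a string id, so the whole batch is rejected.
    write_batch(table_rows, schema, [{"id": 3, "amount": 1.0}, {"id": "4", "amount": 2.0}])
    rejected = False
except SchemaMismatchError:
    rejected = True
```

Validating before appending is what keeps a single bad row from leaving the table half-updated.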

Schema Evolution:

Big data is continuously changing. Delta Lake enables you to make changes to a table schema that can be applied automatically, without the need for cumbersome DDL.
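One way to picture automatic schema changes is as a schema merge: new columns in an incoming batch are added to the table schema, and older rows read back with a null for columns they never had. The sketch below is a toy model under that assumption, not Delta Lake's implementation; `merge_schema` and `read_with_schema` are hypothetical helpers.

```python
def merge_schema(table_schema, incoming_schema):
    """Return the table schema extended with any new columns from the incoming batch."""
    merged = dict(table_schema)
    for column, column_type in incoming_schema.items():
        if column in merged and merged[column] is not column_type:
            # Changing the type of an existing column is not handled by this sketch.
            raise TypeError(
                f"column {column!r}: cannot change type from "
                f"{merged[column].__name__} to {column_type.__name__}"
            )
        merged.setdefault(column, column_type)
    return merged


def read_with_schema(rows, schema):
    """Project rows onto the schema, filling columns a row never had with None."""
    return [{column: row.get(column) for column in schema} for row in rows]


table_schema = {"id": int, "amount": float}
stored_rows = [{"id": 1, "amount": 9.5}]

# A later batch arrives with an extra "currency" column.
incoming_schema = {"id": int, "amount": float, "currency": str}
incoming_rows = [{"id": 2, "amount": 3.0, "currency": "USD"}]

table_schema = merge_schema(table_schema, incoming_schema)
stored_rows.extend(incoming_rows)

result = read_with_schema(stored_rows, table_schema)
```

Existing rows are never rewritten here; the new column only affects how they are read back.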

100% Compatible with Apache Spark API:

Developers can use Delta Lake with their existing data pipelines with minimal changes, as it is fully compatible with Spark, the commonly used big data processing engine.
