Write-Behind Logging - WBL

Pratik Pandey

Senior Software Engineer at Booking.com | AWS Serverless Community Builder | pratikpandey.substack.com

Published: May 16, 2023

One of my most popular blogs to date is also my first blog on Write-Ahead Logs (WAL). If you remember, WAL lets a database store transaction records sequentially, so it does not have to perform random writes to persist records on disk. That improves the performance of the database while still ensuring durability. However, with the availability of NVM technologies, is the optimization for sequential access really needed?
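As a quick refresher, here is a minimal sketch of the WAL idea in Python. The class name `WriteAheadLog`, the JSON-lines record format and the `wal.log` file name are my own assumptions for illustration, not how any particular database implements it. Every record is appended to the end of a single file and forced to disk before the write is acknowledged.

```python
import json
import os
import tempfile


class WriteAheadLog:
    """Append-only log: every record goes to the end of one file (sequential I/O)."""

    def __init__(self, path):
        self.path = path
        self.file = open(path, "a", encoding="utf-8")

    def append(self, record):
        # Write the record, then force it to stable storage before returning.
        self.file.write(json.dumps(record) + "\n")
        self.file.flush()
        os.fsync(self.file.fileno())

    def replay(self):
        # Read the records back in the order they were written.
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    def close(self):
        self.file.close()


def wal_demo():
    # Hypothetical usage: log two records, then read them back.
    with tempfile.TemporaryDirectory() as tmp:
        wal = WriteAheadLog(os.path.join(tmp, "wal.log"))
        wal.append({"txn": 1, "key": "a", "value": 10})
        wal.append({"txn": 2, "key": "b", "value": 20})
        records = wal.replay()
        wal.close()
        return records
```

The `os.fsync` call on every append is the cost that WAL pays for durability, and it is the part that WBL tries to move off the critical path.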

WBL — Write-Behind Log

Write-behind logging (WBL) is a logging technique used in database management systems (DBMS) to improve performance. Unlike write-ahead logging (WAL), it does not write log records to persistent storage as soon as they are generated. Instead, the log records are buffered in memory and written to persistent storage in the background, so the DBMS can keep processing transactions without waiting for those writes to complete.

But wait, if log records are buffered in memory, wouldn’t that be an issue? That was the entire problem statement for using WAL!

WBL is made possible by advancements in hardware, such as non-volatile memory (NVM). NVM is a type of memory that retains its data even when power is lost. This means that the log records can be buffered in NVM and written to persistent storage later, without the risk of losing data.

How does Write-Behind Logging work?

A user updates one or more records in a database.
The changes are buffered in memory. They are not written to the log until the transaction commits.
The transaction commits. On commit, the database writes the changes to a log in NVM.
The database continues processing other requests.
At a later time, when the system is less busy, the database flushes the log to disk.
The changes to the database are now durable and will not be lost if the system crashes.
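The steps above can be sketched as a toy model in Python. This is only an illustration of the flow as described in this post: the class `WriteBehindLogSketch` and its fields are hypothetical, and plain Python containers stand in for volatile memory, NVM and disk.

```python
class WriteBehindLogSketch:
    """Toy model of the flow above; Python containers stand in for hardware."""

    def __init__(self):
        self.txn_buffers = {}  # volatile memory: uncommitted changes per transaction
        self.nvm_log = []      # stand-in for the log kept in NVM
        self.disk = {}         # stand-in for the on-disk database

    def update(self, txn_id, key, value):
        # A user updates a record: the change is only buffered in memory,
        # nothing is written to the log yet.
        self.txn_buffers.setdefault(txn_id, []).append((key, value))

    def commit(self, txn_id):
        # On commit, the transaction's changes are appended to the NVM log.
        changes = self.txn_buffers.pop(txn_id, [])
        self.nvm_log.extend(changes)
        # Return right away so the database can process other requests.
        return len(changes)

    def flush(self):
        # Later, when the system is less busy, apply the log to disk
        # and empty the log.
        flushed = len(self.nvm_log)
        for key, value in self.nvm_log:
            self.disk[key] = value
        self.nvm_log.clear()
        return flushed
```

Calling `update`, then `commit`, then `flush` moves a change from the transaction buffer to the NVM log and finally to disk. Note that `commit` never touches `disk`, which is where the performance gain comes from.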

Write-behind logging is a trade-off between performance and durability. By delaying the writing of changes to disk, it can improve performance, but it also increases the chance of data loss.

The potential risk of data loss in write-behind logging arises when the changes have not been flushed from the log buffer in NVM to a more persistent storage medium, such as a disk, before a system failure occurs. If a failure happens before the changes are persisted from NVM to disk, the logged modifications that have not been flushed could be lost.

To mitigate this risk, proper recovery mechanisms should be in place. Upon system restart, the write-behind logging system needs to ensure that any logged changes still residing in NVM are replayed and persisted to disk to restore the database to a consistent state. This recovery process involves correctly handling the logged modifications and ensuring that no data loss or inconsistencies occur during the recovery phase.
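A minimal sketch of such a replay step, assuming each log entry is a `(sequence_number, key, value)` tuple and that the on-disk state can be modelled as a dictionary. The function name `recover` and the entry layout are hypothetical.

```python
def recover(nvm_log, disk):
    """Replay log entries that survived in NVM onto the disk state.

    Each entry is (sequence_number, key, value). Entries are applied in
    sequence order, and each one simply overwrites the key, so replaying
    the same entries again after an interrupted recovery ends in the
    same final state.
    """
    replayed = 0
    for _seq, key, value in sorted(nvm_log, key=lambda entry: entry[0]):
        disk[key] = value
        replayed += 1
    # The entries are now on disk, so the log can be truncated.
    nvm_log.clear()
    return replayed
```

Ordering by sequence number matters when the same key was updated more than once: the latest logged value has to win.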

Another approach is to use deferred writes. With deferred writes, the database flushes the log to disk at regular intervals. This reduces the chance of data loss when a failure occurs, but the recovery mechanisms still need to be in place.
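Here is a rough sketch of interval-based flushing. The class `DeferredWriter` and its parameters are assumptions for illustration. To keep the example deterministic, the caller passes in the current time (for example a value from `time.monotonic()`) instead of the class reading a clock or running a background thread.

```python
class DeferredWriter:
    """Flushes the log to disk once at least `interval` seconds have passed."""

    def __init__(self, interval, now=0.0):
        self.interval = interval
        self.last_flush = now
        self.log = []    # stand-in for the log buffer
        self.disk = {}   # stand-in for the on-disk database

    def write(self, key, value, now):
        # Log the change, then check whether a flush is due.
        self.log.append((key, value))
        self.maybe_flush(now)

    def maybe_flush(self, now):
        # Flush only when the interval has elapsed since the last flush.
        if now - self.last_flush < self.interval:
            return False
        for key, value in self.log:
            self.disk[key] = value
        self.log.clear()
        self.last_flush = now
        return True
```

In this sketch a smaller `interval` shortens the window of changes that have not reached disk yet, at the cost of more frequent disk writes.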

[Figure: Write-Behind Log]

Advantages of Write-Behind Logging

Improved performance: Write-behind logging can improve the performance of database systems by allowing them to continue processing requests even if the disk is busy.
Reduced disk I/O: Write-behind logging can reduce disk I/O by deferring disk writes, so that changes are flushed in the background rather than on every commit.
Increased scalability: Write-behind logging can help to improve the scalability of database systems by allowing them to handle more requests without slowing down.
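To get a feel for the disk I/O point, here is a small back-of-the-envelope helper. It assumes, as a simplification, that each flush costs one disk operation no matter how many commits it carries. The function name `disk_writes` and the numbers in the usage note are illustrative only, not measurements.

```python
import math


def disk_writes(num_commits, batch_size):
    """Number of disk flushes needed when commits are flushed in batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return math.ceil(num_commits / batch_size)
```

Under that assumption, `disk_writes(1000, 1)` gives 1000 flushes when every commit goes straight to disk, while `disk_writes(1000, 50)` gives 20 when 50 commits are flushed together.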

Disadvantages of Write-Behind Logging

Increased risk of data loss: Since write-behind logging defers the actual disk write operation, there is a potential risk of data loss in the event of a system failure or crash before the changes are flushed to disk. You need to have proper recovery mechanisms as suggested above to prevent data loss.
Increased Recovery Time: In the event of a system failure, recovering the database using the logged changes can introduce additional recovery time. The system needs to replay the logged modifications and bring the database to a consistent state, which may take longer compared to systems that employ immediate disk writes.
Increased Complexity: Write-behind logging adds an additional layer of complexity to the system architecture. The need to manage and synchronize the in-memory modifications, the log buffer, and the disk writes requires careful design and implementation. This complexity can make the system more prone to bugs, performance issues, and potential data corruption if not handled correctly.

This brings us to the end of this article. We talked about write-behind logging, how it differs from WAL, its advantages and disadvantages, and some caveats to take care of while implementing it. Please post any questions you have in the comments and I will be happy to discuss them!

Thank you for reading! I’ll be posting weekly content on distributed systems & patterns, so please like, share and subscribe to this newsletter for notifications of new posts.

Please comment on the post with your feedback; it will help me improve! :)

Until next time, Keep asking questions & Keep learning!

Distributed Systems Made Easy

Rajat Kanti Bhattacharjee

Engineer@Sharechat | MS @GeorgiaTech

1 year ago

Every strategy just boils down to using a better "disk" at this point. Did not know that this buffering strategy is called WBL. Nice article!
