WAL - Distributed Design Patterns

There are many distributed system patterns that are leveraged widely across different distributed systems. In this series, I'll go over some of these design patterns to help you develop your own distributed systems, or better understand the ones you already use.

Strong durability is a property we assume of all our data stores by default. But have you ever wondered how this durability is achieved alongside high performance/throughput requirements, even in the face of server crashes (power failures, OS failures, hardware failures)? The answer is the WAL.

WAL - Write Ahead Log

WAL isn't strictly a distributed design pattern, but it is a widely used pattern prevalent in most distributed systems that guarantee durability.

WAL has been a common theme across traditional RDBMS systems, where it helps provide the atomicity and durability guarantees (the A and D of ACID). All mutations to a table are first written to the WAL (also called the Transaction Log or Binlog) and then applied to the actual data files of the tables asynchronously.

Sample WAL and WALEntry structure:

type WAL struct {
    dir      string   // directory under which WAL files are stored
    file     *os.File // reference to the current WAL file
    metadata []byte   // metadata recorded at the head of each WAL

    decoder *decoder // decoder to decode records
    encoder *encoder // encoder to encode records

    mutex     sync.Mutex // ensures a single writer updates the WAL at a time
    lastIndex uint64     // index of the last entry saved to the WAL
}

type WALEntry struct {
    lsn       uint64 // unique, monotonically increasing identifier for each log entry
    data      []byte // actual WAL entry in bytes
    crc       uint32 // checksum for data integrity validation
    entryType uint32 // type of WAL record ("type" itself is a reserved word in Go)
}


Life Without WAL

You must be wondering: why do we really need a WAL? Why not flush the changes directly to the actual data files instead?

There are two aspects to it:

  1. A write to disk is never really flushed directly. The data goes through various buffers (RAM, buffer cache, disk cache) before truly being flushed to disk sectors. These caches reduce the number of disk writes, which are expensive, and hence improve performance. The downside is that on restarts or crashes, the data in these intermediate caches is lost, impacting the durability of our data. If we bypass the caches, we'd be flushing to disk on every write, which would hurt the performance and throughput of the system.
  2. As mentioned above, disk writes are slow. Among disk writes, sequential writes are much faster than random writes (this applies to SSDs as well). If we are writing records to multiple tables/entities in our data store, chances are we'll be doing more random writes. Also, any write to a data table might require updates to on-disk structures (e.g. records kept sorted by a clustered index), on-disk indexes, and other auxiliary structures as well, directly impacting throughput and performance.

WAL to the Rescue

The WAL is an append-only log that stores each state change to the data store as a command. Now, instead of flushing the data mutations to disk for the different tables/entities, we just flush the command/operation to the WAL: a single sequential disk operation.

A separate asynchronous process can read the operations from the WAL and then apply the mutations to the actual data files on disk, following the normal flow through the different caches. This greatly improves the write throughput of the data store.

Also, in case of failures, there may be changes that were never applied to the data files. However, since the operations are present in the WAL, we can replay them from the WAL to bring the data store back to a consistent state. Thus, the WAL ensures data integrity and reliability while still allowing high write throughput for our data stores.

Real Life Implementation Considerations

1. Flushing WAL Operations to Disk -

As mentioned before, writes to disk might not be flushed directly. Most file-handling libraries offer ways to force the OS to flush changes all the way to the disk sectors (e.g. fsync). While flushing on each write gives you strong durability guarantees, it can cause performance issues in a write-heavy system. This is where a tradeoff must be made: you can use a flush frequency, micro-batches, or both to flush the changes to disk and help performance. Note that there is a risk of losing the unflushed entries here.

2. Corruption Detection -

We need to ensure that any operation flushed to disk is not corrupted (for example, by a torn write during a crash). For this, each WAL record also contains a CRC value, which is validated when the record is read from the WAL to ensure there is no corruption.

3. Operation Duplication -

Since the WAL is an append-only file, client retries after a communication failure can result in duplicate operations being written to the WAL. Hence, whenever we read the WAL to apply changes to the actual data store, we need to ensure that either the duplicates are ignored, or the application of operations to the data store is idempotent.
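
One common way to ignore duplicates is to track the highest LSN applied so far, since LSNs are monotonically increasing. A minimal sketch, with the `applier` type being my own illustrative construction:

```go
package main

import "fmt"

// applier skips any entry whose lsn is not greater than the last one applied,
// so replaying a WAL that contains client retries (duplicate records with the
// same lsn) leaves the data store unchanged.
type applier struct {
	lastApplied uint64
	state       map[string]string
}

// Apply returns true if the entry was applied, false if it was a duplicate.
func (a *applier) Apply(lsn uint64, key, value string) bool {
	if lsn <= a.lastApplied {
		return false // duplicate or already-applied entry: ignore
	}
	a.state[key] = value
	a.lastApplied = lsn
	return true
}

func main() {
	a := &applier{state: map[string]string{}}
	fmt.Println(a.Apply(1, "A", "1")) // true
	fmt.Println(a.Apply(1, "A", "1")) // false: retried write is skipped
	fmt.Println(a.Apply(2, "A", "2")) // true
	fmt.Println(a.state)              // map[A:2]
}
```

The alternative is to make each operation idempotent by construction (e.g. "set A to 2" rather than "increment A"), in which case applying a duplicate is harmless and no LSN bookkeeping is needed.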

Real Life Usage

1) Most databases, including NoSQL databases like Cassandra (where it is called the commit log), use a WAL to guarantee durability.

2) Kafka's storage layer is itself structured as an append-only commit log, essentially the same idea as a WAL.

3) KV stores like RocksDB and LevelDB, and distributed caches like Apache Ignite, also use a WAL.

Summary

To sum it up, a WAL gives us:

1) Better performance and throughput, as we avoid flushing data to multiple files for every change.

2) Recoverability after restarts, as operations can be replayed from the WAL onto the actual data store.

3) The ability to restore to a point-in-time snapshot, since all operations are present in the WAL.

Thank you for reading! I'll be posting more such content on distributed systems & patterns, so please like, share and subscribe to this newsletter for notifications of new posts.

Please comment on the post with your feedback; it will really help me improve! :)

Until next time, keep asking questions & keep learning!
