登录查看更多内容

How does Postgres persist to disk? What is WAL all about?

Amr Elhewy

Senior Software Engineer

发布日期: 2025年2月10日

Introduction

Hello folks! in this quick article i’m going to be talking about how a database like Postgres actually persists to disk and what happens behind the scenes. What is WAL all about and what does it even stand for? Let’s dive in.

When it comes to I/O operations (in our case writing to disk). Optimizations become a must because I/O operations are very expensive especially writing to disk due to a lot of factors. Postgres tackles this problem in a clever manner.

When a write request is made to the database, You’d think that it would persist it to disk right away, but that’s not what actually happens behind the scenes. Introducing Postgres Shared Buffers

Shared Buffers

When a write is made it isn’t instantly flushed to disk, Postgres actually loads the page getting impacted by the write to memory and adjusts it in memory.

Postgres saves rows in pages and whenever we update a certain row we find the page it’s in, load it to memory and update the page with the new record respectively. Same for writes whether its a new page or an existing one that’s not yet full.

This is called a dirty page where it needs to be written back to disk later. Later on Postgres relies on background processes like the background writer or checkpointsto write dirty pages from shared buffers to disk asynchronously. (We’ll get into that)

But the thing is this approach hugely minimizes I/O by batching the flush operation to disk and not have it going on for each and every write.

On the other hand, when a read request is made Postgres first checks the shared buffers for the pages being requested (whether dirty or not). If found it then proceeds to serve it from there, if not it loads the page from disk and adds it to the shared buffers cache. If the shared buffer is full a victim dirty page will be written to disk and the newly read page will be replaced by it.

领英推荐

Fast Search with PostgreSQL: GIN Index

White Stork 8 个月前

Deadlock & Resolving Deadlocks with “SKIP LOCKED” in…

Sarvaha Systems 1 年前

ClickHouse external tables wizard, join multiple…

Timeplus 10 个月前

Now the question that arises is what happens if you lose power half-way through writing to the data files? Let’s say some write was made to memory and during flushing it to disk a power cut happened causing that data to be inconsistent and weird. The client still thinks he made that write when In reality it never persisted to disk completely. This can cause a lot of data corruption and integrity loss. Hence introducing WAL

WAL (Write Ahead Log)

WAL or Write Ahead Log is a mechanism to ensure the consistency and safety of data. It is a technique where every change to the database is logged before it is applied to the actual data files. This log acts as a journal that records all modifications. It is an append only log that logs everything a user writes to the database while simultaneously updating the data pages in shared buffer as mentioned before. Any newly written data can remain in the shared buffer as long as we have a log that tracked the change. In case anything goes wrong we can reconstruct the state from the log.

The WAL is much smaller than the actual data files, and so we can flush them relatively faster. They are also sequential unlike data pages which are random. Disks love sequential writes, which is another benefit of WAL.

Every WAL entry is first written to a WAL buffer in memory. Then when a certain trigger comes this buffer gets flushed to disk. The trigger can either be when the buffer reaches a certain size or when a certain period passes. These are all configurable from Postgres’s side. Once flushed to disk these entries can be announced committed.

The database can crash after writing WAL entries (before flushing shared buffer to disk), that is fine, as long we know the transaction state belonging to each WAL entry we can discard or omit uncommitted WAL entries upon recovery (to ensure data consistency).

For example if you are in the middle of a transaction and the database crashed, we consider the transaction rolled-back by default. I will do another article explaining how WAL actually writes transactions

When all the data files have been flushed and updated to reflect the information on the WAL This is something called checkpointing. Once this happens a checkpoint recordin the Write-Ahead Log (WAL) is recorded, marking the point up to which all changes have been flushed to disk.

In the event of a crash, the crash recovery procedure looks at the latest checkpoint record to determine the point in the WAL (known as the redo record) from which it should start the REDO operation.

要查看或添加评论，请登录

Amr Elhewy的更多文章

What is PKCE in OAuth 2.0? ??

2025年2月7日

What is PKCE in OAuth 2.0? ??

PKCE stands for Proof Key for Code Exchange. Have you ever had the thought of "Wonder what would happen if someone else…
Always benchmark your input data and don't blindly follow Big O

2025年2月1日

Always benchmark your input data and don't blindly follow Big O

Hello everyone! This is just a quick reminder type article to not always fully commit to big o notation without…

How does Postgres persist to disk? What is WAL all about?

Amr Elhewy

Senior Software Engineer

Introduction

Shared Buffers

领英推荐

WAL (Write Ahead Log)

Amr Elhewy的更多文章

社区洞察

其他会员也浏览了

What Is Auto Failover And How Does It Help With High Availability?

My Hilariously Horrid Database Migration Story: A Tale of Team Growth and Lessons Learned

Howto Validate years of SQL Audit files with a CloudXcellence special!

Database Internals - Write Ahead Log File Structure

PGAdmin4 and psql

Decoding the Backbone: Understanding PostgreSQL Background Processes for Optimal Performance

The Hidden Challenge of PostgreSQL Partitioning

No Vacuum, No Bloat, No Downtime on Failover, No Lock Escalation, No Manual Sharding, No Delays in Cloning or Backup, No Outage for Database Upgrades

How to optimize information storage in iPortal for high concurrency scenarios?

Building Your First Database - Part 2

Introduction

Shared Buffers

领英推荐

WAL (Write Ahead Log)

Amr Elhewy的更多文章

What is PKCE in OAuth 2.0? ??

Always benchmark your input data and don't blindly follow Big O

社区洞察

其他会员也浏览了

What Is Auto Failover And How Does It Help With High Availability?

My Hilariously Horrid Database Migration Story: A Tale of Team Growth and Lessons Learned

Howto Validate years of SQL Audit files with a CloudXcellence special!

Database Internals - Write Ahead Log File Structure

PGAdmin4 and psql

Decoding the Backbone: Understanding PostgreSQL Background Processes for Optimal Performance

The Hidden Challenge of PostgreSQL Partitioning

No Vacuum, No Bloat, No Downtime on Failover, No Lock Escalation, No Manual Sharding, No Delays in Cloning or Backup, No Outage for Database Upgrades

How to optimize information storage in iPortal for high concurrency scenarios?

Building Your First Database - Part 2