Database Internals - Write Ahead Log File Structure

Database Internals - Write Ahead Log File Structure

Write Ahead logging is a technique that says: changes to the data files(tables and indexes) must be written after those changes have been logged i.e. records describing those changes have been flushed to permanent storage.

Refer here for my detailed explanation and benefits of WAL: Database Internals: Write Ahead Logging(WAL)

Thanks for reading Curious Engineer! Subscribe for free to receive new posts and support my work.

In this article, I deep-dived into the PostgreSQL WAL(write-ahead log) files and tried to understand the file structure. Here are the steps if you want to follow along, otherwise skip the installation steps and jump to Analyzing the file structure.

Note: I am working on macOS 14.3 and PostgreSQL 14 versions. If you’re using another Operating system, please search for corresponding commands.

Step 1: Installing PostgreSQL

brew install postgresql

Step 2: Starting the server

brew services start postgresql

Step 3: Installing libpq

This step involved installing the libpq library from the homebrew which installs some other commands required alongside the PostgreSQL for analyzing the WAL file.

brew install libpq

After this step, you should have your “pg_waldump“ command installed and if you run “which pg_waldump“, it should print the location of your executable file for this command. This is the output in my case: “/opt/homebrew/bin/pg_waldump “

Step 4: Locating WAL files

At this point, I located the pg_wal folder which contains the WAL files of the postgres. They are under the following directory:

“/opt/homebrew/var/postgresql@14/pg_wal/“

Step 5: Connect to Postgres

psql -U postgres

Step 6: Create a table “employee“

CREATE TABLE employee (
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    date_of_birth DATE,
    hire_date DATE,
    salary DECIMAL(10, 2)
);        

Please note that this is just an example so that I can easily insert the data and analyze the WAL file. The real table `employee` might have more columns like id (primary key), department, etc.

Step 7: Adding some records to the employee table

I used the following command to repeatedly add some records to the employee table.

INSERT INTO employee (first_name, last_name, date_of_birth, hire_date, salary)

Step 8: Analyzing the WAL file

After I hit around 20 INSERT operations, I used the following command to analyze the file:

“pg_waldump /opt/homebrew/var/postgresql@14/pg_wal/000000010000000000000001”

This command generated a big output of the WAL file but here’s a short exact context of that file.

Thanks for reading Curious Engineer! Subscribe for free to receive new posts and support my work.


rmgr: Transaction len (rec/tot):   34/  34, tx:    771, lsn: 0/01787188, prev 0/01787138, desc: COMMIT 2024-02-11 13:28:49.944018 IST
rmgr: Heap    len (rec/tot):   80/  80, tx:    772, lsn: 0/017871B0, prev 0/01787188, desc: INSERT off 29 flags 0x00, blkref #0: rel 1663/16385/16401 blk 0
rmgr: Transaction len (rec/tot):   34/  34, tx:    772, lsn: 0/01787200, prev 0/017871B0, desc: COMMIT 2024-02-11 13:28:50.233002 IST
rmgr: Heap    len (rec/tot):   80/  80, tx:    773, lsn: 0/01787228, prev 0/01787200, desc: INSERT off 30 flags 0x00, blkref #0: rel 1663/16385/16401 blk 0
rmgr: Transaction len (rec/tot):   34/  34, tx:    773, lsn: 0/01787278, prev 0/01787228, desc: COMMIT 2024-02-11 13:28:50.562946 IST
rmgr: Standby   len (rec/tot):   50/  50, tx:     0, lsn: 0/017872A0, prev 0/01787278, desc: RUNNING_XACTS nextXid 774 latestCompletedXid 773 oldestRunningXid 774
rmgr: Heap    len (rec/tot):   80/  80, tx:    774, lsn: 0/017872D8, prev 0/017872A0, desc: INSERT off 31 flags 0x00, blkref #0: rel 1663/16385/16401 blk 0
rmgr: Transaction len (rec/tot):   34/  34, tx:    774, lsn: 0/01787328, prev 0/017872D8, desc: COMMIT 2024-02-11 13:28:50.861983 IST
rmgr: Heap    len (rec/tot):   80/  80, tx:    775, lsn: 0/01787350, prev 0/01787328, desc: INSERT off 32 flags 0x00, blkref #0: rel 1663/16385/16401 blk 0
rmgr: Transaction len (rec/tot):   34/  34, tx:    775, lsn: 0/017873A0, prev 0/01787350, desc: COMMIT 2024-02-11 13:28:51.197795 IST
rmgr: Heap    len (rec/tot):   80/  80, tx:    776, lsn: 0/017873C8, prev 0/017873A0, desc: INSERT off 33 flags 0x00, blkref #0: rel 1663/16385/16401 blk 0
rmgr: Transaction len (rec/tot):   34/  34, tx:    776, lsn: 0/01787418, prev 0/017873C8, desc: COMMIT 2024-02-11 13:28:51.537681 IST
rmgr: Heap    len (rec/tot):   80/  80, tx:    777, lsn: 0/01787440, prev 0/01787418, desc: INSERT off 34 flags 0x00, blkref #0: rel 1663/16385/16401 blk 0        

The above PostgreSQL WAL file log shows entries for several insert operations that happened between 13:28:49 IST and 13:28:51 IST on February 11, 2024. Here's a breakdown of the two most used commands (rmgr: transaction) and (rmgr: Heap)

  1. rmgr: Transaction: This indicates that a database transaction is being performed. The description contains what kind of operation it is on the transaction level like COMMIT, ROLLBACK, etc.len (rec/tot): 34/34: The length of the record is 34 bytes.tx: 771: This is the transaction ID, which helps track changes and ensure consistency.lsn: 0/01787188: This is the Log Sequence Number, which is a unique identifier for each record in the WAL (Write-Ahead Logging) file.prev 0/01787138: This is the previous log sequence number, indicating the order of records. The first part (0) indicates the segment number within the WAL file. The second part (0/01787138) is the relative offset within that segment.desc: COMMIT 2024-02-11 13:28:49.944018 IST: This indicates that a commit action occurred at the specified timestamp and timezone.
  2. rmgr: Heap: This indicates an operation on a table. The description contains what kind of operation it is on the table level like INSERT, UPDATE, DELETE, etc.len (rec/tot): 80/80: The length of the record is 80 bytes.tx: 772: Transaction ID.lsn: 0/017871B0: Log Sequence Number.prev 0/01787188: This is the previous log sequence numberdesc: INSERT off 29 flags 0x00, blkref #0: rel 1663/16385/16401 blk 0: This indicates an INSERT operation at offset 29, with some additional details about the table and block reference.
  3. Similar actions follow, including more transactions (rmgr: Transaction) and heap operations (rmgr: Heap).
  4. The last line indicates the insertion of data into the table (rmgr: Heap) with transaction ID 777.

In summary, the log shows that several rows were inserted into a table (relation ID 1663) between 13:28:49 and 13:28:51 IST. Each insert was part of a separate transaction (tx ID 771 to 777). The rows are not yet visible ("off") as the transactions haven't been fully committed. Once the transaction commits successfully, the changes become visible, and the "off" flag becomes irrelevant

Also, after analyzing the WAL file, I asked myself “Where is the actual data which has been inserted into the database?“ I think the answer to this question is the fact that WAL files are actually binary files and not human-readable. So, the command “pg_waldump” might be omitting the true data and printing some information only.

Thus, when I tried to use the following command "cat /opt/homebrew/var/postgresql@14/pg_wal/000000010000000000000001", I was able to get the following output which contained “John“ as the keyword and it ensured that the WAL file had the true data.

Key Takeaways

  1. WAL files are append-only files which means you can only append at the end of the file and cannot modify anything existing already.
  2. WAL file does not mean it’s a very big single file but it can consist of multiple files and are stored under the directory pg_wal. Each file is called a segment which is normally 16 MB in size. Each segment of 16 MB consists of multiple pages (or blocks whatever you like) where each page consists of 8 KB. (you can modify the segment size or the page size if you know what you’re doing)
  3. Segment files are given ever-increasing numbers as names, starting at 000000010000000000000001. Whenever any entry goes into the WAL file, a unique entry is assigned to each record which is called the Log Sequence Number (LSN). This log sequence number is actually the byte offset of the entry in the WAL file.
  4. Since LSNs are sequential, searching for specific WAL records becomes efficient. You can directly jump to the desired record based on its LSN without needing to scan the entire file from the beginning.


That’s it, folks for this edition of the newsletter. Please consider liking and sharing with your friends as it motivates me to bring you good content for free. If you think I am doing a decent job, share this article in a nice summary with your LinkedIn network and schedule a free 30-minute 1:1 with me:) Ping me on Linkedin!

Thanks for reading Curious Engineer! Subscribe for free to receive new posts and support my work.

Resources

WAL Internals: Postgres official documentation

Transaction Log by Wikipedia

Write Ahead Logging by Arpit

Ivan Nikulin

Backend Developer (Java), Web3 enthusiast, mountain skiing lover

4 个月

Thanks so much for this! Was a bit hard to find this exact info. What would happen if I batch insert large amount of data (exceeding 16mb)? Would the insert data be split among multiple 16mb files?

回复
Pankaj Kapoor

Senior Principal Engineer, Cadence| Ex- Meta, Qualcomm/Microsoft/Adobe

9 个月

WAL-E ??

要查看或添加评论,请登录

Vivek Bansal的更多文章

  • 1 year to Curious Engineer ??

    1 year to Curious Engineer ??

    If you like the free content I put out, consider subscribing to my newsletter on substack to get a well-researched…

  • Message Queues vs Message Brokers

    Message Queues vs Message Brokers

    If you like the free content I put out, consider subscribing to my newsletter on substack to get a well-researched…

    3 条评论
  • Introduction to gRPC

    Introduction to gRPC

    If you like the free content I put out, consider subscribing to my newsletter on substack to get a well-researched…

    7 条评论
  • Non-Functional Requirements

    Non-Functional Requirements

    Brief Introduction Let’s say you are building a website that allows users to book flight tickets. The requirements for…

    4 条评论
  • QuadTrees

    QuadTrees

    If you like the free content I put out, consider subscribing to my newsletter on substack to get a well-researched…

    2 条评论
  • Text Based Search: ElasticSearch

    Text Based Search: ElasticSearch

    If you like the free content I put out, consider subscribing to my newsletter on substack to get a well-researched…

    3 条评论
  • Sharding vs Partitioning

    Sharding vs Partitioning

    If you like the free content I put out, consider subscribing to my newsletter on substack to get a well-researched…

    5 条评论
  • SkipList: A probabilistic data structure

    SkipList: A probabilistic data structure

    If you like the free content I put out, consider subscribing to my newsletter on substack to get a well-researched…

    9 条评论
  • Discovering WebSockets using some Coding

    Discovering WebSockets using some Coding

    If you like the free content I put out, consider subscribing to my newsletter on substack to get a well-researched…

    1 条评论
  • Onboarding a new video in a Content Platform

    Onboarding a new video in a Content Platform

    If you like the free content I put out, consider subscribing to my newsletter on substack to get a well-researched…

    3 条评论

社区洞察

其他会员也浏览了