How Transaction Id wraparound in PostgreSQL can cause a database outage

How Transaction Id wraparound in PostgreSQL can cause a database outage

Problem Statement - Transaction Id Wraparound

Transaction ID wraparound can cause a significant issue in PostgreSQL database that can lead to data corruption if not managed properly.

In PostgreSQL, every transaction is assigned a unique transaction ID (XID), which is a 32-bit unsigned integer. This means the XID can take on values from 0 to 4,294,967,295 (2^32 - 1), roughly 4 billion. After all the ID(s) are exhausted, the counter for transactionId wraps around back to 0. This is known as transaction ID wraparound.


How Wraparound can cause an Issue in PostgreSQL

When the transaction ID counter wraps around, older transaction IDs can be mistaken for newer ones due to their numerical value being lower after the wraparound. For example, if the XID counter wraps from 4,294,967,295 back to 0, transactions that were assigned XIDs before the wraparound (e.g., 4,294,967,290) can appear newer than transactions assigned XIDs immediately after the wraparound (e.g., 10). This can lead to serious issues with data visibility and consistency.

Data Corruption Risk

If PostgreSQL reuses transaction IDs without proper handling, old data might appear as new. This can corrupt the database by making very old rows look like they were just inserted.

Prevention

To prevent data corruption, PostgreSQL relies on the VACUUM process to "freeze" old transaction IDs. Freezing means marking old tuples with a special transaction ID that indicates it is very old and should not be considered new even if the transaction ID counter wraps around. Let's discuss this shortly.


What's the problem then ?

The process called VACUUM which keeps in check that these Older Id's are available again ahead of time for the new transactions so that when upper limit of 4.2 billion is reached we still have enough runway once the wraparound happens.

Several activities and conditions in PostgreSQL can delay or ignore the VACUUM process and potentially halt database operations for reads and writes. Understanding these scenarios is crucial for maintaining database performance and ensuring the smooth execution of VACUUM operations.

  • Transactions that run for an extended period can prevent VACUUM from reclaiming space or freezing tuples. This is because VACUUM must wait for these transactions to complete to ensure data consistency.
  • Operations that acquire exclusive locks on tables (such as ALTER TABLE, DROP TABLE, or certain UPDATE/DELETE statements) can block VACUUM from running.
  • If the autovacuum process is disabled (autovacuum = off), regular maintenance tasks such as VACUUM and ANALYZE will not run automatically.
  • Misconfiguration of settings like vacuum_cost_delay and vacuum_cost_delay can delay VACUUM process.
  • During a checkpoint, the database writes all dirty pages to disk. If a checkpoint is in progress, VACUUM might be delayed to avoid increasing the I/O load.


VACUUM

Vacuum processing is a maintenance process of PostgreSQL. Its two main tasks are?removing dead tuples?and the?freezing transaction ids. Dead or unused tuples should be removed from the pages, this process of defragmentation is required to reclaim disk space in order to provide a contiguous space for the new rows/tuples.

Dead Tuples

In PostgreSQL, dead tuples are rows that have been deleted or updated but have not yet been physically removed from the table. These tuples are created because PostgreSQL uses a Multi version Concurrency Control (MVCC) system to handle concurrent transactions without blocking other transaction reading the same record at the same time, allowing for efficient concurrent reads and writes. So PostgreSQL does not support in-place updates of the rows, instead it creates a new version of the row with the updated data and this is how it supports MVCC.

Operations contributing to dead tuples :

  • Update: The old row is marked as dead, and a new version of the row is inserted (because of MVCC).
  • Delete: The row is marked as dead but is not immediately removed.


How PostgreSQL handles transactionId wraparound issue


Postgres Tuple layout and the role of transaction Ids

Before we dive into the working of VACUUM and how it handles transaction wraparound. Let's discuss the PostgreSQL's heap tuple(row) structure.


Postgres record structure

Let's focus on the tuple header for a while, the description of the header fields in given in the table below :


Postgres record header fields


The?xmin?and?xmax?values are hidden from daily operations, but we can still fetch them :

SELECT *, xmin, xmax FROM <table>;        


The image below shows how xmin and xmax values are assigned to the tuple while creating and updating a tuple respectively.



When you use SELECT ... FOR UPDATE, PostgreSQL acquires an exclusive lock on the rows that are returned by the query. So that no other transaction can update these rows until the lock is released by the current transaction.


Process for Transaction Id wraparound

As Postgres keep on creating new versions of the tuples enforcing MVCC, it keep on adding more and more rows to the heap table, which generates a lot of dead tuples and other records that are intentionally deleted. This concurrency control mechanism requires some maintenance process in place to defragment the disk space and reutilising the transaction Ids.

This maintenance process is called VACUUM. It has the following responsibilities :

1. Removing dead tuples from the index and heap pages.

2. Update statistics for the transactions and query planner.

3. Freezing Old Transaction Ids

The scope of this document is to cover the 3rd point Freezing Old Transaction Ids, which helps in smooth operation of the transaction Id wraparound and avoiding wraparound failures.

Circular Transaction ID (XID) Space

  • PostgreSQL uses 32-bit unsigned integers for transaction IDs (XIDs), which means they range from 0 to 4,294,967,295.
  • The XID space is circular: when the XID reaches 4,294,967,295, the next XID is 0.
  • For the current transactionId(XID), Postgres uses previous half of the key space as the PAST range and the next half as FUTURE range.


Transaction id(s) on circular space with wraparound

Example with Current XID = 100

Understanding Older and Newer Ranges

1. Current XID: 100

2. Halfway Point: 2^31 = 2,147,483,648


Step-by-Step Calculation

Given current_xid = 100:

Older Range Calculation

Older range starts from current_xid - 2^31:

  • older_start = (100 - 2,147,483,648 + 4,294,967,296) % 4,294,967,296 (modulo operation to avoid getting -ve range)
  • older_start = 2,147,483,748

Older range ends at current_xid - 1:

  • older_end = 100 - 1
  • older_end = 99

Newer Range Calculation

Newer range starts from current_xid + 1:

  • newer_start = 100 + 1
  • newer_start = 101

Newer range ends at current_xid + 2^31:

  • newer_end = (100 + 2,147,483,648) % 4,294,967,296
  • newer_end = 2,147,483,748

Now since you have got the idea that for the current XID what is considered past/older transaction Ids and what is considered as new range of transaction Ids as we are moving around the circular space of the 32-bit transaction Id. Let's see how the past transaction Ids in the tuples are marked FROZEN so that they are available for future use again.

- In PostgreSQL a special reserved transaction Id (2) is used as a frozen_txid. When we need to mark the older rows as frozen, the x_min of the record is updated with the frozen_txid (2)

- Marking the record's t_xmin with 2 signifies that this XID is always older than all other txids. Simply put, the frozen txid is always inactive and visible to all the transactions.

Freeze processing

The freeze process is a part of VACUUM. Whenever VACUUM is triggered it scans all table files and goes through all the pages where the records need to he frozen and rewrites the t_xmin of tuples to the frozen id (2), if the existing t_xmin is older than the current txid minus the `vacuum_freeze_min_age` (default is 50 million).

In versions 9.4 or later, the XMIN_FROZEN bit is set to the t_infomask field of tuples rather than rewriting the t_xmin of tuples to the frozen txid.

Freeze processing is done in the following ways:

1. At the time of concurrent VACUUM

2. At the time when certain criteria is met

3. While running manual VACUUM

At the time of concurrent VACUUM

With concurrent VACUUM, pages which has only dead tuples are processed. To make the process run faster, VM (visibility map) of the target table is used. By checking the bit against a page in this mapping, it can identify whether the page has dead tuples or not.

An id called freezeLimit_txid is calculated. Every records having t_xmin less than freezeLimit_txid is freezed.

freezeLimit_txid = (OldestXmin - vacuum_freeze_min_age)

OldestXmin is the oldest transaction among the currently running transactions at the time this freeze processing got triggered.

`vacuum_freeze_min_age` is a configuration parameter and the default value is 50,000,000.

At the time when certain criteria is met

This condition relies on config parameter `vacuum_freeze_table_age`. This parameter sets the age at which the entire table is forcibly vacuumed to ensure tuples are frozen. Default value for vacuum_freeze_table_age is 150 million transactions.

When the age of the oldest non-frozen XID in a table exceeds vacuum_freeze_table_age, PostgreSQL will force a vacuum operation on that table to ensure tuples are frozen. This helps prevent wraparound issues by ensuring old tuples are marked as frozen before they become problematic.

Manual Freezing

You can explicitly run VACUUM FREEZE to freeze all tuples in a table, regardless of their age. When running a manual VACUUM in PostgreSQL, the age of each tuple (or row) is determined by the transaction IDs (XIDs) associated with the tuple and the current transaction. This age is then compared to the vacuum_freeze_min_age parameter to decide if the tuple should be frozen.

VACUUM FREEZE my_table;        

Why can't the XID be converted to 64-bit INT to have a higher range

There are multiple challenges associated with this, it needs a bigger change across the database, may be in future we might see a bigger datatype for XID :

  • Based on above tuple structure, every tuple contains at least 3 transaction Id fields, bumping up the size of the datatype will impact the overall size of the database, as every record will be increased by minimum of (32x3) -> 96 bits or 12 bytes. Not to forget that transaction Ids are also stored for other internal working of database.
  • Having MVCC allows transaction manager to create multiple versions/copies of the same record and need more and more transaction Ids even if the older versions are cleaned up during vacuum (assuming we are upgrading to 64-bit in order to avoid wraparound).

How to avoid issues with transaction Id wraparound

  • We must keep track of the age of your database's oldest transaction ID (XID). We can get this value from pg_class catalog table.
  • Have some monitoring and alert in place for tracing the transaction age and delayed vacuum process.
  • Make your system is getting VACUUMED time to time in correct manner. Adjust config values according to the use-case.
  • Understand the transaction id exhaustion rate of your database and configure the database accordingly.


Case Studies : Outage because of failure of wraparound

https://blog.sentry.io/transaction-id-wraparound-in-postgres/

https://duffel.com/blog/understanding-outage-concurrency-vacuum-postgresql



References :

https://www.postgresql.org/docs/current/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND

https://www.timescale.com/blog/how-to-fix-transaction-id-wraparound/

https://www.crunchydata.com/blog/managing-transaction-id-wraparound-in-postgresql

https://blog.sentry.io/transaction-id-wraparound-in-postgres/

https://duffel.com/blog/understanding-outage-concurrency-vacuum-postgresql

https://habr.com/en/companies/postgrespro/articles/487590/


I'm passionate about database internals, constantly learning and compiling my insights to deepen my understanding. I enjoy sharing my knowledge with others to help them make aware of these concepts.

Subscribe my newsletter and Join me on this journey of uncovering the fascinating world of databases !






要查看或添加评论,请登录

Manish Pokhriyal的更多文章

  • Anomalies in Database Transactions

    Anomalies in Database Transactions

    Isolation levels defines how different transactions which are running concurrently interact with each other in the…

    2 条评论
  • Comparison of B-Tree based Relational vs Graph database

    Comparison of B-Tree based Relational vs Graph database

    In the world of data management, databases play a crucial role in storing, retrieving, and managing data efficiently…

  • Page - A basic unit of data transfer in databases

    Page - A basic unit of data transfer in databases

    In my recent article we discussed about how the data is organised on disk at higer level where we saw how the data is…

    1 条评论
  • How data is organized on Disk in database systems

    How data is organized on Disk in database systems

    One of the most important job of a database system is to store data and also allow quick access to the data stored in…

    1 条评论
  • Understanding Memory allocation in Java

    Understanding Memory allocation in Java

    In this article, we will explore the fundamentals of memory allocation in Java programs. We'll demystify the different…

    2 条评论

社区洞察

其他会员也浏览了