[Postgres] How to deal with long-running transactions
Nikolay Samokhvalov
Let's make *your* Postgres healthy and awesome: [email protected] // I stand with Ukraine // Postgres.AI founder; PostgreSQL contributor; Postgres.FM co-host // Freediver
I post a new PostgreSQL "howto" article every day. Join me in this journey – subscribe here or on X, provide feedback, share!
Why long-running transactions can be a problem
In the OLTP context (e.g., mobile and web apps), long-running transactions are often harmful for two reasons: they can hold locks that block other sessions, and they hold back the xmin horizon, preventing autovacuum from removing dead tuples produced while they run – which leads to bloat.
"Long-running" is a relative term, and, of course, its meaning depends on particular situation. Usually, in heavily-loaded systems – say ~10^5 TPS including RO queries and ~10^3 of XID-consuming TPS (writes) – we consider transactions running longer than 30-60 seconds to be long. This can be translated to 30-60k dead tuples accumulated in a table in the worst case – in the case when all transactions during that time frame produced 1 dead tuple. Of course, this is a very, very rough assumption, but this can give an idea about the scale and helps define "threshold" to support the meaning of the "long-running transaction" term.
How to prevent long-running transactions
In some cases, we might decide to prevent long-running transactions from happening at a global level, to be protected from the negative effects described above – in this case, we decide that interrupting a long-running transaction (and sending an error to one user) is better than letting its negative side effects hit many users.
How to completely prevent long-running transactions from happening? The short answer: using just Postgres settings, you cannot.
As of PG16 / 2023, Postgres doesn't provide a way to limit transaction duration (although there is a patch proposed, implementing transaction_timeout – help test and improve it if you can).
There are two limitation settings that can help reduce the chances that a long-running transaction occurs, but they do not eliminate the risk completely: statement_timeout, which limits the duration of any single statement, and idle_in_transaction_session_timeout, which limits how long a session can stay idle inside an open transaction.
Even if both of these options are set to low values, long-running transactions are not fully prevented. For example, if we set both of them to 30s, we might still have a transaction running for hours:
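For illustration (a sketch with arbitrary values), consider a session that keeps every statement short and never pauses inside the transaction for too long:
set statement_timeout to '30s';
set idle_in_transaction_session_timeout to '30s';

begin;
select pg_sleep(20); -- each statement finishes well under statement_timeout
-- the client then pauses for ~10-20 seconds, below idle_in_transaction_session_timeout
select pg_sleep(20);
-- ...and the same pattern is repeated again and again
commit; -- potentially hours later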
– in this case, neither of the two thresholds is reached, yet the transaction can last for hours or even days.
While there is no such setting as transaction_timeout yet, we can consider alternative options to fully prevent long-running transactions from happening – for example, a periodic (cron-like) job that finds and terminates sessions whose transactions exceed a threshold, as sketched below.
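A minimal sketch of such a job's core query (the 30-minute threshold and the backend_type filter are illustrative assumptions; a production version would need allowlisting for backups, replication, and other legitimate long sessions, plus logging):
select
  pg_terminate_backend(pid) as terminated,
  pid,
  clock_timestamp() - xact_start as xact_duration,
  left(query, 50) as query_fragment
from pg_stat_activity
where
  clock_timestamp() - xact_start > interval '30 minutes'
  and backend_type = 'client backend' -- skip autovacuum workers, walsenders, etc.
  and pid <> pg_backend_pid();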
How to analyze long-running transactions
Getting the list of all long-running transactions is straightforward:
select clock_timestamp() - xact_start, *
from pg_stat_activity
where clock_timestamp() - xact_start > interval '1 minute'
order by clock_timestamp() - xact_start desc;
But for the situation described above – both statement_timeout and idle_in_transaction_session_timeout are set very low, and we still have a long-running transaction – we usually want to start sampling the state of the session running that transaction, to understand which queries it consists of. Without such sampling, we don't have a good source of data: the individual queries are fast (usually below log_min_duration_statement), so we don't see them in the logs.
In this case, we can apply the method described in #PostgresMarathon Day 11: Ad-hoc monitoring and sample long (> 1min) transactions every 1 second (might be worth increasing the frequency here):
while sleep 1; do
  psql -XAtc "
    copy (
      with samples as (
        select
          clock_timestamp(),
          clock_timestamp() - xact_start as xact_duration,
          *
        from pg_stat_activity
      )
      select *
      from samples
      where xact_duration > interval '1 minute'
      order by xact_duration desc
    ) to stdout delimiter ',' csv
  " 2>&1 \
  | tee -a long_tx_$(date +%Y%m%d).log.csv
done
Marathon progress: ▓░░░░░░░░░░░░░░░░░░░ 8.22%
This series is also available in Markdown format: https://gitlab.com/postgres-ai/postgresql-consulting/postgres-howtos