Testing DBtune, showing PostgreSQL double buffering, and some thoughts about automated database tuning for SQL databases
Franck Pachot
Developer Advocate at MongoDB | AWS Data Hero | PostgreSQL & YugabyteDB | Oracle Certified Master
DBtune
At PG Day France, I attended Luigi Nardi's workshop on DBtune. I have always been skeptical of automatic tuning methods because they appear to be based on trial and error, and I prefer the scientific approach of analyzing and understanding the rationale before making changes.
As a consultant, I didn't randomly adjust parameters until I achieved positive results. This method wouldn't have been beneficial to my clients, and the issue would likely have resurfaced soon, leaving them with a potentially more serious problem. When I thoroughly understand the problem and solution, I am able to provide the client with a reliable system and a sustainable solution and document it to ensure that the problem doesn't reoccur.
However, machine learning functions differently. It can explore more possibilities in a shorter time frame and consider parameters that I would never have considered. It also continuously improves the system and adapts to different workloads, never leaving the customer unsupported. Once you acknowledge that everything can change on your system, and it actually does, as opposed to tuning once with a conservative approach, you can explore more possibilities for diverse workloads.
Let's consider a simple example. The first part is how I installed and configured DBtune, using my one-month free trial. The second part shows some interesting results about optimal memory usage in PostgreSQL and a better understanding of the main resources DBtune is showing: RAM, CPU, IOPS, and Disk.
Configure the lab
I've been running my database on Oracle Linux 8, which ships with Python 3.8; the DBtune client needs a newer version, so I installed Python 3.11:
sudo dnf update -y
sudo dnf groupinstall -y 'Development Tools'
sudo dnf install -y python3.11 python3.11-pip gcc python3.11-devel
I installed PostgreSQL 16, keeping all default settings:
sudo dnf module enable -y postgresql:16
sudo dnf install postgresql-server postgresql-devel postgresql-contrib
I started a database with the pg_stat_statements extension installed (required by DBtune to measure SQL execution throughput and response time):
initdb -D /var/lib/pgsql/data
echo "shared_preload_libraries = 'pg_stat_statements'" >> /var/lib/pgsql/data/postgresql.conf
pg_ctl start -D /var/lib/pgsql/data
psql <<'SQL'
create extension if not exists pg_stat_statements;
create database ybio;
SQL
I initialized YBIO, which I use to run a simple workload with either logical reads (when the whole working set fits in memory) or physical reads:
git clone https://github.com/FranckPachot/ybio
cd ybio
psql -f ybio.sql
I created a 24GB table:
psql -c "
call setup(tab_prefix=>'bench'
,tab_num=>1
,tab_rows=>5e8::int
,batch_size=>10000
);
" -c "
vacuum analyze bench0001;
" -c "
select pg_size_pretty(pg_table_size('bench0001'));
"
The table is 24GB and the primary key index is 19GB - my server has 22GB RAM:
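These sizes can be checked with the standard size functions (a quick sketch; the table name comes from the YBIO setup above):

```sql
select pg_size_pretty(pg_table_size('bench0001'))          as table_size,
       pg_size_pretty(pg_indexes_size('bench0001'))        as index_size,
       pg_size_pretty(pg_total_relation_size('bench0001')) as total_size;
```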
I'm ready to run the runit() function of YBIO to add some load to my database. With tab_rows=>5e8, I'll experience physical reads as it reads randomly from the 24GB working set, but running with tab_rows=>1e8 will do logical reads as it can fit in physical memory, even two times with double buffering.
First, I set up DBtune for this database.
I downloaded the DBtune client (through app.dbtune.com); the full URL is provided by the web console:
sudo su -
wget -O dbtune_client.tar "https://dbtune-eu-client-package-prod.s3.amazonaws.com/clients/61...........................
tar -xf dbtune_client.tar
cd dbtune_client
I'm not using systemctl in this lab, so I changed the restart method and also configured the connection string:
cat > dbtune_config.yml <<'YML'
postgresql:
  connection_string: user=postgres dbname=ybio
  restart_command: su postgres <<<"pg_ctl restart -D /var/lib/pgsql/data"
YML
I'm ready to start the client, but first, let's run my SQL load. I'll call the YBIO runit() function from psql. I run single iterations (run_duration=>interval '0 seconds') to get short response times, and repeat them with \watch. As DBtune will restart the database, I enclose it in a loop.
Logical reads from one session
I run a read-only workload with a 5GB working set that could fit multiple times in RAM:
while true ; do
psql -e <<'SQL'
call runit(tab_prefix=>'bench'
,tab_num=>1
,tab_rows=>1e8::int
,batch_size=>1000000
,run_duration=>interval '0 seconds'
,pct_update=>0
);
\watch i=0.001 c=100
SQL
done | tee runit.log
I start the DBtune client and the tuning session
sudo python3.11 dbtune.py
Here is the DBtune session's result:
DBtune has restarted the system multiple times, using different parameters each time, and has finally identified the best settings for improving response time. The response time was improved by a factor of 2.6x when the system used 50% of the available RAM (calculated as total memory minus available memory), with 7.2 GB for shared_buffers, 24 MB for work_mem, and 2 parallel workers. Interestingly, during the last run, the response time remained acceptable at 1200ms (instead of the previous 1000ms) with considerably less memory usage: 1.1 GB for shared_buffers, 4KB for work_mem, and 4 parallel workers.
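If you prefer to pin the winning settings yourself rather than keep DBtune active, ALTER SYSTEM can persist them (a sketch using the values reported above; shared_buffers only takes effect after a restart):

```sql
alter system set shared_buffers = '7200MB';            -- requires a restart
alter system set work_mem = '24MB';
alter system set max_parallel_workers_per_gather = 2;
select pg_reload_conf();                               -- applies the reloadable ones
```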
As a human DBA, I prefer using less memory as long as the response time is acceptable, so that an unexpected peak of activity does not saturate the system. PostgreSQL does not use direct I/O and must keep a large amount of memory available because it is used by the filesystem cache. This available memory can shrink when there are more connections, or executions that require large work areas. However, as a machine, DBtune detected that there was only one session running, reading a large number of rows, and maximized the system memory usage for that session. From what I've seen, I even suspect that it never tries to allocate more than 50% of physical memory to shared buffers, in line with the usual recommendations of PostgreSQL experts.
The two perspectives are both valid. From a human perspective, database monitoring is not constant, and it takes time to analyze issues and find solutions. It is undesirable to have a database system suddenly run out of available memory and be called to troubleshoot during the night, potentially needing to restart the database. On the other hand, DBtune can quickly detect changes in workload and adjust settings accordingly, and when using it, you make your application resilient to restarts. In this scenario, it is appropriate to be more aggressive in allocating memory and parallel workers when only one session is active and to reduce these resources when multiple sessions are detected.
DBtune adjusts the random_page_cost and seq_page_cost parameters. I found that the best response time for this workload occurred when these values were similar, leading to a preference for Parallel Index Scan over Parallel Bitmap Heap Scan. I've warned in the past about setting random_page_cost close to seq_page_cost but I'm in a special case where the working set fits in memory, so reading pages sequentially or randomly doesn't matter.
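You can observe the plan switch yourself by changing the cost settings at session level and comparing plans (a sketch; the column names and predicate are illustrative, not the exact query shape generated by runit()):

```sql
set random_page_cost = 4;    -- the default, which penalizes random access
explain (costs off)
  select count(*) from bench0001 where id between 1 and 1000000;

set random_page_cost = 1.1;  -- close to seq_page_cost, favoring index scans
explain (costs off)
  select count(*) from bench0001 where id between 1 and 1000000;
```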
As a human DBA, I generally prefer stable execution plans that use fewer resources. However, in the case of this single-session workload, DBtune favors changing plans to use more resources. The low response time corresponds to the highest CPU usage:
Logical reads from many sessions
The run just before the one selected as the best uses a bit more memory but much less CPU, with only one parallel worker instead of two. Let's run the same workload, but with ten sessions in parallel:
for I in {1..10}
do
while true ; do
psql -e <<'SQL'
call runit(tab_prefix=>'bench'
,tab_num=>1
,tab_rows=>1e8::int
,batch_size=>1000000
,run_duration=>interval '0 seconds'
,pct_update=>0
);
\watch i=0.001 c=100
SQL
done &
done
The response time is higher, and the best configuration uses higher memory settings, with 11 GB shared_buffers and 83 MB work_mem, but no parallel workers, still totaling 50% of the RAM.
It makes sense not to use parallel queries, as there are many sessions. A parallel query is like asking for all server resources when you know that you are alone.
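When you know that many sessions will compete, parallelism can be capped per session rather than cluster-wide (a sketch):

```sql
-- disable parallel query for the current session only
set max_parallel_workers_per_gather = 0;

-- or cap the total number of parallel workers for the whole instance
alter system set max_parallel_workers = 4;
select pg_reload_conf();
```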
What is more problematic is the constantly saturated CPU usage:
There are no magic parameters for that. I'm running 10 sessions on a 4 vCPU machine, and these sessions are always utilizing CPU because the working set of 5GB fits within the available memory, which is 50% of 22GB. PostgreSQL does not have a resource manager and allows all connections to run on the CPU (this is why YugabyteDB has implemented a connection manager that does some database-side connection pooling).
Buffered reads from one session
I run a read-only workload with a working set that is two times larger. With 10GB plus indexes, it can fit in memory, but only once, and PostgreSQL does double buffering because it lacks Direct I/O.
I ran the same single session as at the beginning, but with a working set of 2e8 rows instead of 1e8 and, because I expected slower response times, a batch size of ten thousand rows instead of one million.
while true ; do
psql -e <<'SQL'
call runit(tab_prefix=>'bench'
,tab_num=>1
,tab_rows=>2e8::int
,batch_size=>10000
,run_duration=>interval '0 seconds'
,pct_update=>0
);
\watch i=0.001 c=100
SQL
done | tee runit.log
Response time and RAM usage
The response time coincides with low RAM usage:
This is not a surprise when you know that PostgreSQL does buffered reads (as opposed to Direct I/O). The used memory is the PostgreSQL shared buffers. The remaining available memory is not free but is used by the kernel for file buffers. It is important not to oversize the shared buffers (and work_mem) and to keep enough available memory to reduce the number of physical reads.
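With the pg_buffercache extension, you can see how much of the shared buffers each relation occupies, and compare that with the kernel cache reported by free (a sketch, assuming the extension package is installed and the default 8kB block size):

```sql
create extension if not exists pg_buffercache;

select c.relname,
       pg_size_pretty(count(*) * 8192) as buffered
  from pg_buffercache b
  join pg_class c on b.relfilenode = pg_relation_filenode(c.oid)
 group by c.relname
 order by count(*) desc
 limit 5;
```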
There's an interesting pattern when 50% of RAM is allocated to shared_buffers: the used memory grows with a low response time, then the response time jumps before getting back to normal. Linux is lazy about memory allocation: when you restart the instance with a given shared_buffers setting, the memory is not allocated immediately but only as the pages are used. This explains the growing usage in the first minutes. In this scenario, shared_buffers starts with a cold cache, but the kernel cache is warm from the previous executions, which is why the response time is low. As the used shared_buffers grow and the available memory shrinks, the kernel cache decreases, and at some point physical reads happen and the response time increases, until the whole data set is in the shared buffers and no longer requires logical reads from the kernel.
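To skip that cold-cache phase after a restart, pg_prewarm can load a relation into shared buffers explicitly (a sketch):

```sql
create extension if not exists pg_prewarm;
-- reads the table's blocks into shared buffers and returns the block count
select pg_prewarm('bench0001');
```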
Response time and IOPS
Looking at reads, the best response time coincides with the maximum IOPS:
Again, it is not a surprise when we know what is measured. Those are not physical reads but are served by the filesystem cache from the available memory. My working set fits in available memory, so those IOPS are all cache hits, except when the working set doesn't fit in one of the two caches, and this is where response time increases.
Response time and CPU
Because there are no physical reads, CPU usage is 100%:
This CPU usage includes not only the CPU used by PostgreSQL ('user' CPU) but also the CPU used by the Linux kernel ('system' CPU) when the kernel reads from its filesystem cache.
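You can split that CPU usage into its 'user' and 'system' parts by sampling /proc/stat (the same counters that sar and vmstat report); here is a minimal shell sketch:

```shell
#!/bin/sh
# Sample the aggregate "cpu" line of /proc/stat twice, one second apart.
# Fields after "cpu": user nice system idle iowait irq softirq ...
read -r _ u1 n1 s1 i1 w1 _ < /proc/stat
sleep 1
read -r _ u2 n2 s2 i2 w2 _ < /proc/stat
total=$(( (u2-u1) + (n2-n1) + (s2-s1) + (i2-i1) + (w2-w1) ))
user_pct=$(( 100 * (u2-u1) / total ))
sys_pct=$((  100 * (s2-s1) / total ))
echo "user%:   $user_pct"
echo "system%: $sys_pct"
```

A high system% here would confirm that a large share of the CPU time goes to the kernel serving reads from its filesystem cache rather than to PostgreSQL itself.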
Physical reads from one session
My last test runs on a working set that is larger than the available memory: the whole 5e8 rows table, which is 24GB. As the reads are random, they will have to hit the disk. This is where sizing the PostgreSQL shared buffers and leaving enough available memory for the filesystem buffers is tricky. Note that the workload is read-only, and it can be even more difficult with writes that involve checkpoints.
In the previous runs, I observed that DBtune's default 5-minute runs are not sufficient to warm up the cache, so I added the following to dbtune_config.yml, multiplying the defaults by ten:
tuning_session:
  warmup_time: 1200
  measurement_time: 1800
I run this from one session:
while true ; do
psql -e <<'SQL'
call runit(tab_prefix=>'bench'
,tab_num=>1
,tab_rows=>5e8::int
,batch_size=>10000
,run_duration=>interval '0 seconds'
,pct_update=>0
);
\watch i=0.001 c=100
SQL
done | tee runit.log
Again, the best response time is when the memory used by shared_buffers is low:
Should shared_buffers be set to 1GB on a 22GB system? It makes sense to do so when the working set cannot fit in memory. Keeping the index branches in the PostgreSQL shared buffers and having a portion of other pages in the filesystem cache can be helpful. While you may consider reducing the size of the instance to cut costs, remember that unused memory still helps reduce physical I/O. Running with low RAM usage is what guarantees a more predictable response time.
Sizing PostgreSQL parameters, especially memory allocation, isn't easy, and many databases use very low defaults. Using DBtune can help find the right settings by replicating a production workload. You can then decide to keep these settings, be conservative, or run DBtune regularly to adapt to changing workloads. This may be particularly useful for SaaS providers where activity is unpredictable, or for applications for which you have good load tests that reproduce the production workload at peaks.
Automatic tuning doesn't mean that you don't have to look at it and understand it. Machine learning can do the boring stuff, and open new eyes on parameters you would not have considered in a manual process, but understanding the workload and how the system works always helps.