Testing DBtune, showing PostgreSQL double buffering, and some thoughts about automated database tuning for SQL databases
Franck Pachot
Developer Advocate at MongoDB | AWS Data Hero | PostgreSQL & YugabyteDB | Oracle Certified Master
DBtune
At PG Day France, I attended Luigi Nardi's workshop on DBtune. I have always been skeptical of automatic tuning methods because they appear to be based on trial and error, and I prefer the scientific approach of analyzing and understanding the rationale before making changes.
As a consultant, I didn't randomly adjust parameters until I achieved positive results. This method wouldn't have been beneficial to my clients, and the issue would likely have resurfaced soon, leaving them with a potentially more serious problem. When I thoroughly understand the problem and solution, I am able to provide the client with a reliable system and a sustainable solution and document it to ensure that the problem doesn't reoccur.
However, machine learning functions differently. It can explore more possibilities in a shorter time frame and consider parameters that I would never have considered. It also continuously improves the system and adapts to different workloads, never leaving the customer unsupported. Once you acknowledge that everything can change on your system, and it actually does, as opposed to tuning once with a conservative approach, you can explore more possibilities for diverse workloads.
Let's consider a simple example. The first part is how I installed and configured DBtune, using my one-month free trial. The second part shows some interesting results about optimal memory usage in PostgreSQL and a better understanding of the main resources DBtune is showing: RAM, CPU, IOPS, and Disk.
Configure the lab
I've been running my database on Oracle Linux 8, which ships with Python 3.8; the DBtune client needs a newer version, so I installed Python 3.11:
sudo dnf update -y
sudo dnf groupinstall -y 'Development Tools'
sudo dnf install -y python3.11 python3.11-pip gcc python3.11-devel
I installed PostgreSQL 16, keeping all default settings:
sudo dnf module enable -y postgresql:16
sudo dnf install postgresql-server postgresql-devel postgresql-contrib
I started a database with the pg_stat_statements extension installed (required by DBtune to measure SQL execution throughput and response time):
initdb -D /var/lib/pgsql/data
echo "shared_preload_libraries = 'pg_stat_statements'" >> /var/lib/pgsql/data/postgresql.conf
pg_ctl start -D /var/lib/pgsql/data
psql <<'SQL'
create extension if not exists pg_stat_statements;
create database ybio;
SQL
I initialized YBIO, which I use to run a simple workload with either logical reads (when the whole working set fits in memory) or physical reads:
git clone https://github.com/FranckPachot/ybio
cd ybio
psql -f ybio.sql
I created a 24GB table:
psql -c "
call setup(tab_prefix=>'bench'
,tab_num=>1
,tab_rows=>5e8::int
,batch_size=>10000
);
" -c "
vacuum analyze bench0001;
" -c "
select pg_size_pretty(pg_table_size('bench0001'));
"
The table is 24GB and the primary key index is 19GB - my server has 22GB RAM:
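These sizes can be checked with the standard size functions (a quick sketch; the table name comes from the YBIO setup above):

```sql
select pg_size_pretty(pg_table_size('bench0001'))          as table_size,
       pg_size_pretty(pg_indexes_size('bench0001'))        as index_size,
       pg_size_pretty(pg_total_relation_size('bench0001')) as total_size;
```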
I'm ready to run the runit() function of YBIO to add some load to my database. With tab_rows=>5e8, I'll experience physical reads as it reads randomly from the 24GB working set, but running with tab_rows=>1e8 will do logical reads as it can fit in physical memory, even two times with double buffering.
First, I set up DBtune for this database.
I downloaded the DBtune client (through app.dbtune.com); the full URL is provided by the web console:
sudo su -
wget -O dbtune_client.tar "https://dbtune-eu-client-package-prod.s3.amazonaws.com/clients/61...........................
tar -xf dbtune_client.tar
cd dbtune_client
I'm not using systemctl in this lab, so I changed the restart method and also configured the connection string:
cat > dbtune_config.yml <<'YML'
postgresql:
  connection_string: user=postgres dbname=ybio
  restart_command: su postgres <<<"pg_ctl restart -D /var/lib/pgsql/data"
YML
I'm ready to start the client, but first, let's run my SQL load. I'll call the YBIO runit() function from psql. I run single iterations (run_duration=>interval '0 seconds') to get short response times, and repeat them with \watch. As DBtune will restart the database, I enclose it in a loop.
Logical reads from one session
I run a read-only workload with a 5GB working set that could fit multiple times in RAM:
while true ; do
psql -e <<'SQL'
call runit(tab_prefix=>'bench'
,tab_num=>1
,tab_rows=>1e8::int
,batch_size=>1000000
,run_duration=>interval '0 seconds'
,pct_update=>0
);
\watch i=0.001 c=100
SQL
done | tee runit.log
I start the DBtune client and the tuning session
sudo python3.11 dbtune.py
Here is the DBtune session's result:
DBtune has restarted the system multiple times, using different parameters each time, and has finally identified the best settings for improving response time. The response time was improved by a factor of 2.6x when the system used 50% of the available RAM (calculated as total memory minus available memory), with 7.2 GB for shared_buffers, 24 MB for work_mem, and 2 parallel workers. Interestingly, during the last run, the response time remained acceptable at 1200ms (instead of the previous 1000ms) with considerably less memory usage: 1.1 GB for shared_buffers, 4KB for work_mem, and 4 parallel workers.
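If you prefer to pin the winning settings yourself rather than keep DBtune active, ALTER SYSTEM can persist them (a sketch using the values reported above; shared_buffers only takes effect after a restart):

```sql
alter system set shared_buffers = '7200MB';            -- requires a restart
alter system set work_mem = '24MB';
alter system set max_parallel_workers_per_gather = 2;
select pg_reload_conf();                               -- applies the reloadable ones
```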
As a human DBA, I prefer using less memory as long as the response time is acceptable, so that an unexpected peak of activity does not saturate the system. PostgreSQL does not use direct I/O and must keep a large amount of memory available because it is used by the filesystem cache. This available memory can shrink when there are more connections, or executions that require large work areas. However, as a machine, DBtune detected that there was only one session running, reading a large number of rows, and maximized the system memory usage for that session. From what I've seen, I even suspect that it never tries to allocate more than 50% of physical memory to shared buffers, in line with the usual recommendations of PostgreSQL experts.
The two perspectives are both valid. From a human perspective, database monitoring is not constant, and it takes time to analyze issues and find solutions. It is undesirable to have a database system suddenly run out of available memory and be called to troubleshoot during the night, potentially needing to restart the database. On the other hand, DBtune can quickly detect changes in workload and adjust settings accordingly, and when using it, you make your application resilient to restarts. In this scenario, it is appropriate to be more aggressive in allocating memory and parallel workers when only one session is active and to reduce these resources when multiple sessions are detected.
DBtune adjusts the random_page_cost and seq_page_cost parameters. I found that the best response time for this workload occurred when these values were similar, leading to a preference for Parallel Index Scan over Parallel Bitmap Heap Scan. I've warned in the past about setting random_page_cost close to seq_page_cost but I'm in a special case where the working set fits in memory, so reading pages sequentially or randomly doesn't matter.
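You can observe the plan switch yourself by changing the cost settings at session level and comparing plans (a sketch; the column names and predicate are illustrative, not the exact query shape generated by runit()):

```sql
set random_page_cost = 4;    -- the default, which penalizes random access
explain (costs off)
  select count(*) from bench0001 where id between 1 and 1000000;

set random_page_cost = 1.1;  -- close to seq_page_cost, favoring index scans
explain (costs off)
  select count(*) from bench0001 where id between 1 and 1000000;
```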
As a human DBA, I generally prefer stable execution plans that use fewer resources. However, in the case of this single-session workload, DBtune favors changing plans to use more resources. The low response time corresponds to the highest CPU usage:
Logical reads from many sessions
The run just before the one selected as the best uses a bit more memory but much less CPU, with only one parallel worker instead of two. Let's run the same workload, but with ten sessions in parallel:
for I in {1..10}
do
while true ; do
psql -e <<'SQL'
call runit(tab_prefix=>'bench'
,tab_num=>1
,tab_rows=>1e8::int
,batch_size=>1000000
,run_duration=>interval '0 seconds'
,pct_update=>0
);
\watch i=0.001 c=100
SQL
done &
done
The response time is higher, and the best configuration uses higher memory settings, with 11 GB shared_buffers and 83 MB work_mem, but no parallel workers, still totaling 50% of the RAM.
It makes sense not to use parallel queries, as there are many sessions. A parallel query is like asking for all server resources when you know that you are alone.
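When you know that many sessions will compete, parallelism can be capped per session rather than cluster-wide (a sketch):

```sql
-- disable parallel query for the current session only
set max_parallel_workers_per_gather = 0;

-- or cap the total number of parallel workers for the whole instance
alter system set max_parallel_workers = 4;
select pg_reload_conf();
```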
What is more problematic is the constantly saturated CPU usage:
There are no magic parameters for that. I'm running 10 sessions on a 4 vCPU machine, and these sessions are always utilizing CPU because the working set of 5GB fits within the available memory, which is 50% of 22GB. PostgreSQL does not have a resource manager and allows all connections to run on the CPU (this is why YugabyteDB has implemented a connection manager that does some database-side connection pooling).
Buffered reads from one session
I run a read-only workload with a working set that is two times larger. With 10GB plus indexes, it can fit in memory, but only once, and PostgreSQL does double buffering because it lacks Direct I/O.
I ran the same single session as at the beginning, but with a working set of 2e8 rows instead of 1e8 and, because I expected slower response times, a batch size of ten thousand rows instead of one million.
while true ; do
psql -e <<'SQL'
call runit(tab_prefix=>'bench'
,tab_num=>1
,tab_rows=>2e8::int
,batch_size=>10000
,run_duration=>interval '0 seconds'
,pct_update=>0
);
\watch i=0.001 c=100
SQL
done | tee runit.log
Response time and RAM usage
The response time coincides with low RAM usage:
This is not a surprise when you know that PostgreSQL does buffered reads (as opposed to Direct I/O). The used memory is the PostgreSQL shared buffers. The remaining available memory is not free but is used by the kernel for file buffers. It is important not to oversize the shared buffers (and work_mem) and to keep enough available memory to reduce the number of physical reads.
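With the pg_buffercache extension, you can see how much of the shared buffers each relation occupies, and compare that with the kernel cache reported by free (a sketch, assuming the extension package is installed and the default 8kB block size):

```sql
create extension if not exists pg_buffercache;

select c.relname,
       pg_size_pretty(count(*) * 8192) as buffered
  from pg_buffercache b
  join pg_class c on b.relfilenode = pg_relation_filenode(c.oid)
 group by c.relname
 order by count(*) desc
 limit 5;
```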
There's an interesting pattern when 50% of RAM is allocated to shared_buffers: the used memory grows with a low response time, then the response time jumps before getting back to normal. Linux is lazy about memory allocation: when you restart the instance with a given shared_buffers setting, the memory is not allocated immediately but only as the pages are used. This explains the growing usage in the first minutes. In this scenario, shared_buffers starts with a cold cache, but the kernel cache is warm from the previous executions, which is why the response time is low. As the used shared_buffers grow and the available memory shrinks, the kernel cache decreases, and at some point physical reads happen and the response time increases, until the whole data set is in the shared buffers and no longer requires logical reads from the kernel.
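To skip that cold-cache phase after a restart, pg_prewarm can load a relation into shared buffers explicitly (a sketch):

```sql
create extension if not exists pg_prewarm;
-- reads the table's blocks into shared buffers and returns the block count
select pg_prewarm('bench0001');
```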
Response time and IOPS
Looking at reads, the best response time coincides with the maximum IOPS:
Again, it is not a surprise when we know what is measured. Those are not physical reads but are served by the filesystem cache from the available memory. My working set fits in available memory, so those IOPS are all cache hits, except when the working set doesn't fit in one of the two caches, and this is where response time increases.
Response time and CPU
Because there are no physical reads, CPU usage is 100%:
This CPU usage includes not only the CPU used by PostgreSQL ('user' CPU) but also the CPU used by the Linux kernel ('system' CPU) when the kernel reads from its filesystem cache.
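You can split that CPU usage into its 'user' and 'system' parts by sampling /proc/stat (the same counters that sar and vmstat report); here is a minimal shell sketch:

```shell
#!/bin/sh
# Sample the aggregate "cpu" line of /proc/stat twice, one second apart.
# Fields after "cpu": user nice system idle iowait irq softirq ...
read -r _ u1 n1 s1 i1 w1 _ < /proc/stat
sleep 1
read -r _ u2 n2 s2 i2 w2 _ < /proc/stat
total=$(( (u2-u1) + (n2-n1) + (s2-s1) + (i2-i1) + (w2-w1) ))
user_pct=$(( 100 * (u2-u1) / total ))
sys_pct=$((  100 * (s2-s1) / total ))
echo "user%:   $user_pct"
echo "system%: $sys_pct"
```

A high system% here would confirm that a large share of the CPU time goes to the kernel serving reads from its filesystem cache rather than to PostgreSQL itself.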
Physical reads from one session
My last test runs on a working set that is larger than the available memory: the whole 5e8 rows table, which is 24GB. As the reads are random, they will have to hit the disk. This is where sizing the PostgreSQL shared buffers and leaving enough available memory for the filesystem buffers is tricky. Note that the workload is read-only, and it can be even more difficult with writes that involve checkpoints.
In the previous runs, I observed that DBtune's default 5-minute runs are not sufficient to warm up the cache, so I added the following to dbtune_config.yml, multiplying the defaults by ten:
tuning_session:
  warmup_time: 1200
  measurement_time: 1800
I run this from one session:
while true ; do
psql -e <<'SQL'
call runit(tab_prefix=>'bench'
,tab_num=>1
,tab_rows=>5e8::int
,batch_size=>10000
,run_duration=>interval '0 seconds'
,pct_update=>0
);
\watch i=0.001 c=100
SQL
done | tee runit.log
Again, the best response time is when the memory used by shared_buffers is low:
Should shared_buffers be set to 1GB on a 22GB system? It makes sense to do so when the working set cannot fit in memory. Keeping the index branches in the PostgreSQL shared buffers and having a portion of other pages in the filesystem cache can be helpful. While you may consider reducing the size of the instance to cut costs, remember that unused memory still helps reduce physical I/O. Running with low RAM usage is what guarantees a more predictable response time.
Sizing PostgreSQL parameters, especially memory allocation, isn't easy, and many databases use very low defaults. Using DBtune can help find the right settings by replicating a production workload. You can then decide to keep these settings, be conservative, or run DBtune regularly to adapt to changing workloads. This may be particularly useful for SaaS providers where activity is unpredictable, or for applications for which you have good load tests that reproduce the production workload at peaks.
Automatic tuning doesn't mean that you don't have to look at it and understand it. Machine learning can do the boring stuff, and open new eyes on parameters you would not have considered in a manual process, but understanding the workload and how the system works always helps.