PGSQL Phriday #010: My Go-To Solution for Postgres Issues - pgBadger
Welcome to another installment of the 'PGSQL Phriday' challenge! Today, we'll be diving deep into the fantastic world of pgBadger, a powerful tool that has transformed my approach to Postgres troubleshooting.
My introduction to pgBadger happened quite casually at my very first IT job, which was at a bank in Poland. Every time one of our Postgres instances threw a tantrum (which, by the way, you could totally tell by the flurry of nervous activity in our massive open office), we'd whip up a pgBadger report using the logs from the troubled instance. This handy tool quickly became one of my go-to problem-solvers, and I've carried it with me through every job and company since.
This was particularly true when I embarked on my consulting journey, working with businesses using both on-prem servers and operating in cloud environments. Among the initial things I'd request were:
1. Statistics snapshots, e.g. psql -Atc "SELECT now(), * FROM pg_stat_bgwriter" >> pg_stat_bgwriter.csv, and the same for pg_stat_user_tables, pg_stat_activity, etc. (see the sketch after this list).
2. OS-level metrics, such as CPU, memory, IOPS usage, and more.
3. Basic instance data: version, SKU (in cloud-speak, a simple count of CPUs, memory, and storage), database and object sizes, and parameter settings (I typically use the output of SHOW ALL, since I'm too lazy to recalculate units).
4. And of course, the pgBadger report itself.
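For the statistics snapshots in point 1, a simple loop run from cron or a screen session is usually enough. Here's a minimal sketch, assuming default connection settings; the interval, file names, and list of views are placeholders you'd adapt:

# snapshot the cumulative statistics views once a minute
while true; do
  psql -Atc "SELECT now(), * FROM pg_stat_bgwriter" >> pg_stat_bgwriter.csv
  psql -Atc "SELECT now(), * FROM pg_stat_user_tables" >> pg_stat_user_tables.csv
  psql -Atc "SELECT now(), * FROM pg_stat_activity" >> pg_stat_activity.csv
  sleep 60
done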
Armed with these four components, I can efficiently troubleshoot 90% of the cases that come my way.
Diving into pgBadger
This blog post is specifically about pgBadger, so let's dive into the sections of the report that have proven most helpful in a majority of my cases, starting with the ones that pique my interest the most.
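If you've never generated a report before, the basic invocation is refreshingly simple. A minimal example, assuming your log files live under a Debian-style path and use a format pgBadger can auto-detect (both are placeholders here):

# parse the Postgres logs and produce an HTML report, using 4 parallel jobs
pgbadger -j 4 -o pgbadger_report.html /var/log/postgresql/postgresql-*.log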
VACUUMS Tab
As the name indicates, this tab contains information about autovacuum actions, both ANALYZE and VACUUM. (Manually executed VACUUMs can be found in the "Top" tab.)
My usual approach is to zoom straight into the "Vacuums per table" chart and spot any unusual behavior. If a single table dominates the pie chart (as shown in the picture below), I proceed to the "Tuples removed per table" section and switch to the table view.
From the figures in the table, I can quickly calculate the average number of tuples (rows) removed in each round. For instance, in the case of the nightly.public.gpu table, it would be 33452 divided by 67, equating to approximately 500 tuples per run. Not too shabby, right?
However, it's unfortunate to often see instances where a single table has undergone thousands of VACUUM operations per day, with only a few rows being removed each time, while other tables are left untouched.
In such situations, it's essential to inspect the size and the transactions performed on this heavily vacuumed table. More often than not, you'll find it's a relatively small table that sees frequent updates, leading autovacuum to prioritize it. But given its small size - typically just a few MBs - there's no real necessity for such frequent vacuuming.
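For that kind of inspection, pg_stat_user_tables plus a size function usually tells the whole story. A quick query along these lines (gpu simply stands in for whatever table dominates your report):

-- how big is the table, how much write traffic does it see, and how often is it vacuumed?
SELECT relname,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
       n_tup_upd, n_tup_del, n_dead_tup,
       vacuum_count, autovacuum_count
FROM pg_stat_user_tables
WHERE relname = 'gpu';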
At this point, you can tweak the autovacuum VACUUM settings specifically for this table as follows:
ALTER TABLE gpu SET (autovacuum_vacuum_scale_factor = 0);
ALTER TABLE gpu SET (autovacuum_vacuum_threshold = 10000);
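If you want to double-check that the overrides took effect, the per-table settings end up in pg_class:

-- per-table autovacuum overrides are stored as storage options
SELECT relname, reloptions FROM pg_class WHERE relname = 'gpu';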
With the scale factor at 0 and the threshold at 10000, autovacuum will now kick in only after roughly 10000 dead tuples have accumulated, regardless of the table's size. Having made these changes, you then take a step back for about a week and revisit the charts to see if the table is getting the break it deserves. Also, be on the lookout for any other tables that might have risen to the top of the list.
Temp Files Tab
Just as before, we'd scroll down to the "Queries generating the most temporary files (N)" list, paying particular attention to the average sizes at the top. Suppose you come across something like this in a sample report:
If most of the queries are generating temporary files around 12MB in size, it's a no-brainer to adjust your work_mem to approximately 20MB (just enough to be slightly larger than the temp file size). I should caution you though: remember to consider your total memory and the number of queries that can simultaneously utilize work_mem. That said, in this day and age, 20MB per operation shouldn't pose much of an issue, right? (Of course, when setting work_mem to higher values, a bit more caution is advised).
There's a quick way to verify whether 20MB will do the trick. Simply set the value in your session first:
SET work_mem = '20MB';
Then, run the top query with EXPLAIN ANALYZE. If the plan reports "Sort Method: quicksort" with a memory figure (rather than "external merge" writing to disk), you're good to go!
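Once the plan confirms the sort fits in memory, you can persist the setting. On a self-managed instance, something along these lines does it (on managed cloud services you'd change the parameter through the provider's console instead):

-- make the new default visible to all future sessions and reload the configuration
ALTER SYSTEM SET work_mem = '20MB';
SELECT pg_reload_conf();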
"Top" Tab
The "Top" tab, of course! Simply by glancing at the "Time consuming queries (N)" table, you can resolve a good deal of customer issues. I won't dive into the nitty-gritty of query tuning here, as this blog post is already long enough, and there are plenty of excellent resources available on the topic. However, this section more or less serves as a visual counterpart to the pg_stat_statements extension, wouldn't you agree?
"Events" Tab
This is where you want to be vigilant about any errors, fatal messages, or warnings that your instance might be producing. Here, you can quickly view these issues in a cleanly aggregated format, saving you from trawling through often massive text files. This tab also gives you visibility into locks, application issues, and even nefarious activities like attempted breaches.
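Keep in mind that this tab (like the rest of the report) can only show what the instance actually logs. Before collecting logs for pgBadger, I usually make sure a few parameters are enabled; a sketch, with values you'd tune to your own log volume:

# postgresql.conf settings that feed the vacuum, temp file, lock, and error sections
log_min_duration_statement = 1000   # log statements running longer than 1 second
log_autovacuum_min_duration = 0     # log every autovacuum run
log_temp_files = 0                  # log every temporary file created
log_lock_waits = on                 # log waits longer than deadlock_timeout
log_checkpoints = on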
And with that, I'd like to extend a heartfelt thanks to Gilles Darold for creating such an extraordinary tool. pgBadger has been instrumental in helping both me and my clients resolve numerous outages and fine-tune our databases.