Do you still need pgBadger if you’re using Grafana?

The short answer is yes - you still do!

While Grafana is a powerful tool for monitoring PostgreSQL databases, it doesn't entirely replace pgBadger. Some of the data in pgBadger reports, particularly the parts drawn from PostgreSQL's text logs (not to be confused with the WAL), isn't easily accessible through Grafana alone.

Disclaimer: I'm not a Grafana expert, but I’m aware of newer extensions like Grafana Alloy and Loki that might help bridge this gap. If any of you have experience with these tools, I’d love to hear your thoughts.

Why This Article?

Pankaj R. Borwar asked me a thought-provoking question:

My organization uses Grafana for monitoring and troubleshooting PostgreSQL databases. Given that both pgBadger and Grafana offer graphical interfaces, I’m finding it hard to pinpoint their key differences.

This is a crucial question because many organizations standardize on Grafana for monitoring, and rightly so - Grafana is an excellent tool. However, understanding where each tool excels is vital for effective database management.

Comparing pgBadger and Grafana: Tab by Tab

To understand the differences and overlaps between pgBadger and Grafana, let's walk through the tabs in a typical pgBadger report, comparing what each offers versus what Grafana can do. Referencing the visual layout can be helpful, so here's a screenshot showcasing the various tabs available in pgBadger reports:


For a hands-on look, you can explore a sample pgBadger report yourself. This example provides a comprehensive view of all the available tabs and the detailed information each contains, allowing you to follow along with the comparisons below.

Events Tab

This tab is crucial for monitoring and visibility into any incidents that occur. Standard Grafana setups using PostgreSQL statistics won't capture these events. pgBadger’s ability to aggregate and present these incidents is a significant advantage over plain text logs, saving you hours or even days of manual log parsing.

From what I've seen (though I haven't verified it myself), there's a relatively new tool called Grafana Alloy that can capture logs as well. You can find more information here: Grafana Alloy Logs Capture. There's also a great blog post showing how to configure Loki for PostgreSQL: Setting up Loki and Promtail.

It seems that the end result will be plain logs displayed on the Grafana dashboard, without the useful aggregations that pgBadger offers, as shown here: Processed Log Lines in Grafana. You can see a screenshot example here:


Example Grafana/Loki logs

So, you’d still need to spend some time making it more useful, for instance, by using pipelines: Grafana Pipelines. This means essentially duplicating the work pgBadger already does, like writing regular expressions to parse the logs.

Top Tab

You can replicate this functionality in Grafana by using the pg_stat_statements extension. However, without log data, you lose the ability to track Process IDs (PIDs), which can be critical for deeper analysis, as described later in this article.
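
For illustration, here is a minimal sketch of the kind of query a Grafana panel could run against pg_stat_statements to approximate the Top tab (column names assume PostgreSQL 13+; on older versions total_time and mean_time are used instead):

-- Top 10 statements by total execution time (requires the pg_stat_statements extension)
SELECT queryid,
       calls,
       round(total_exec_time::numeric, 2) AS total_exec_ms,
       round(mean_exec_time::numeric, 2)  AS mean_exec_ms,
       rows,
       left(query, 80)                    AS query_sample
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;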

Queries Tab

This can also be visualized in Grafana, similar to the Top Tab.

Locks Tab

You can track locks in Grafana. For instance, by using the query from this PostgreSQL wiki page, you can see which query is blocking a particular statement. This can actually be more useful than what pgBadger offers, as pgBadger only shows you which statements are locked, not which ones are causing the block.
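
The wiki query is more detailed, but as a minimal sketch, something along these lines (using pg_blocking_pids(), available since PostgreSQL 9.6) already shows both sides of a lock conflict:

-- Show each blocked session together with the session(s) blocking it
SELECT blocked.pid    AS blocked_pid,
       blocked.query  AS blocked_query,
       blocking.pid   AS blocking_pid,
       blocking.query AS blocking_query
FROM pg_stat_activity AS blocked
JOIN pg_stat_activity AS blocking
  ON blocking.pid = ANY (pg_blocking_pids(blocked.pid));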

Vacuums Tab

pgBadger definitely wins here. The only way to extract how many tuples and pages were removed by a specific ANALYZE or VACUUM run is from the logs, which pgBadger processes efficiently. To understand why this is so important, you can check out this YouTube video and this LinkedIn article.
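
For pgBadger to build this tab, the corresponding information has to land in the text logs in the first place; a minimal sketch of the relevant setting (assuming sufficient privileges) looks like this:

-- Log every autovacuum/autoanalyze run, including tuples and pages removed
ALTER SYSTEM SET log_autovacuum_min_duration = 0;
SELECT pg_reload_conf();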

Temp Files Tab

pgBadger wins here again. Sure, you can check for temp files in the pg_stat_database view using a query like this:

SELECT datname, temp_files, temp_bytes FROM pg_stat_database;        

But this only gives you basic information and doesn't really help you take action. pgBadger, on the other hand, shows you exactly which query generated the temp files and how large they are. The query above just gives you the overall numbers, maybe letting you calculate an average, but that's it. It doesn't let you, for example, refactor a query with an unnecessary ORDER BY clause, because you can't see which query generated the large temp files. You could achieve this with pg_stat_statements, but it would require more effort.
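
As a sketch of that pg_stat_statements approach, the extension exposes temp_blks_written per statement, which at least points to the offending queries (the block size is typically 8 kB):

-- Statements that wrote the most temporary-file blocks
SELECT queryid,
       calls,
       temp_blks_written,
       pg_size_pretty(temp_blks_written * 8192::bigint) AS temp_written,
       left(query, 80) AS query_sample
FROM pg_stat_statements
WHERE temp_blks_written > 0
ORDER BY temp_blks_written DESC
LIMIT 10;

Even then, this aggregates per normalized statement rather than per execution, so the per-file detail that pgBadger extracts from the logs is still missing.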

Checkpoints Tab

You can monitor checkpoint causes in Grafana, but you'll miss out on the deeper insights that pgBadger can offer.
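
As a rough sketch, a Grafana panel usually polls the cumulative checkpoint counters, which tell you how often checkpoints were timed versus requested, but not much more (on PostgreSQL 17+ these columns moved to pg_stat_checkpointer):

-- Cumulative checkpoint counters (PostgreSQL 16 and earlier)
SELECT checkpoints_timed,
       checkpoints_req,
       checkpoint_write_time,
       checkpoint_sync_time,
       buffers_checkpoint
FROM pg_stat_bgwriter;

pgBadger, by contrast, reads the detail that log_checkpoints = on writes to the logs, such as buffers written, duration, and WAL file activity per checkpoint.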


Checkpoint activity from pgBadger

Sessions and Connections Tab

This can be handled in Grafana without the need for additional extensions.
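
For example, a minimal sketch of a panel query over pg_stat_activity:

-- Connection counts per database and state
SELECT datname,
       state,
       count(*) AS connections
FROM pg_stat_activity
WHERE datname IS NOT NULL
GROUP BY datname, state
ORDER BY connections DESC;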

Why PID Tracking is So Important

In many cases, when PostgreSQL emits an ERROR message, it includes the Process ID (PID) of the process that caused the issue. This PID is crucial for identifying and troubleshooting problems, especially if you have a detailed logging configuration set up (you can find more information about logging levels in the PostgreSQL documentation).

For example, with proper logging and certain kinds of errors, you might see additional details that help identify the query running under a particular PID, as in this instance:

2023-09-14 09:35:37.490 CST [155588] LOG: server process (PID 155588) was terminated by signal 11: Segmentation fault 
2023-09-14 09:35:37.490 CST [155588] DETAIL: Failed process was running: merge into XXXX d using( ****** ) s WHEN NOT MATCHED THEN INSERT (******) VALUES (******) WHEN MATCHED AND (******) THEN UPDATE SET ******        

You can see the example shown above here.

However, often the statement is not printed, and you only receive the PID information, like in this example:

server process (PID 26818) was terminated by signal X: Killed/Segmentation fault etc.        

In such situations, the most reasonable thing to do is figure out what was running at that time on that particular PID - 26818 in this example. Logs are the most natural resource to look into (assuming you have properly configured your log_line_prefix and have decent logging configuration). You can quickly search (e.g., using grep) for this PID and solve the mystery. Without this, you would be lost, as Grafana typically does not provide this type of detailed information.
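
As a minimal sketch (the exact format is up to you), a log_line_prefix that includes %p puts the PID into every log line, so a later search for the PID from the error message finds the statements that backend was running:

-- %m = timestamp, %p = process ID, %u = user, %d = database (illustrative values)
ALTER SYSTEM SET log_line_prefix = '%m [%p] %u@%d ';
SELECT pg_reload_conf();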

Aggregation Table: pgBadger vs. Grafana


Feature comparison

Conclusions

While Grafana is a powerful tool for monitoring PostgreSQL databases, it doesn’t entirely replace pgBadger. Grafana, especially with extensions like Grafana Alloy and Loki, can replicate some of the functionalities of pgBadger, but often requires significant setup and customization.

pgBadger excels in areas that involve detailed log analysis, such as tracking Events, Vacuums, and Temp Files. These are aspects where Grafana either falls short or requires additional work to achieve similar results.

Grafana can track locks more effectively than pgBadger, offering more detailed insights, such as identifying which query is blocking a statement. However, it still lacks the ability to track PIDs effectively without logs, a crucial aspect for troubleshooting specific database issues.

In summary, while Grafana is an excellent tool for real-time monitoring and visualizing PostgreSQL performance, pgBadger remains indispensable for deep log analysis and understanding specific database behaviors that aren’t easily captured in Grafana alone. Therefore, using both tools in tandem can provide the most comprehensive insights into your PostgreSQL databases.


Pankaj R. Borwar

Database Administrator | AWS | Linux | MongoDB | PostgreSQL

6 months ago

Alicja Kucharczyk Hi, thank you for the article you wrote in response to my question. Your insights were incredibly helpful and provided exactly what I needed. I truly appreciate the effort you put into it.

Roman Novikov

Addressing Business Challenges with Technical Solutions

7 months ago

Alicja Kucharczyk, this is an excellent article, but its title is a little confusing, as you are comparing apples and ... bread ) Grafana is a powerful tool that can be used for various purposes. However, it needs to be combined with a data source to utilize its full potential. This data source could be your PostgreSQL server, enabling real-time data monitoring, or it can be used to gather logs, as you mentioned. Typically, the most common approach is to use a time-series storage solution such as VictoriaMetrics or Prometheus. This will provide users with historical data and the ability to extend the collected information. And yes, in PMM, we have already combined everything and created Query Analytics for detailed query information.

Kamil Króliszewski

Knowledge is key to succeed in any endeavor.

7 months ago

Interesting article. I didn’t know that pgBadger offers such capabilities. On my end, I recommend Percona Monitoring and Management. It's essentially a very extensive set of dashboards on a highly customized Grafana that provides plenty of monitoring options thanks to the `pg_stat_monitor` extension. You can connect any number of PostgreSQL clusters there. One of the cooler features is QAN (Query Analytics). Additionally, PostgreSQL has supported JSON logging for some time, so basically, any log file can be conveniently sent, for example, to Elasticsearch and then visualized in Grafana as desired.
