5 tips to improve your PostgreSQL Queries Performance
Luis Mejia ??
Senior Backend Engineer who ?? PMs and QAs - Ruby on Rails | System Design
Something really great about PostgreSQL is its documentation , you can always go there for excellent references.
Nevertheless I would like to share some great resources to improve performance if you would like to improve your PostgreSQL Queries Performance.
1. Explain Command
If we talk about #performance #improvement in #PostgreSQL, the Explain command cannot be missed.
This command displays the execution plan that PostgreSQL generates for the provided query. The most critical part of the display is the estimated statement execution cost, how the tables will be scanned and what JOIN algorithms will be used.
If you find plain sequential scan in a really big table, you may need to consider refactoring the query to use existing indexes and in some scenarios even creating a new index if required.
If [EXPLAIN]?ANALYZE?is used, the query will be executed and the real run time statistics are added to the display.
2. Partial Indexes
Imagine you have a billing system sending 20 million transactions per day, but only 1% of those transactions have a successful status.
How can you quickly access that information?
Partial indexes allow you to create indexes based on a subset of the table, defined by a condition.
In the example above, you can create the partial index and then PostgreSQL planner will use that index whenever you use that condition.
CREATE INDEX ...WHERE status = 'successful';
3. Materialized Views
Now let's talk about views, imagine we created a view by joining HUGE tables, and we did such a good job the view became very popular and is used in many reports.
But querying a huge amount of data in different tables may take many minutes to process.
领英推荐
How could we improve that #performance?
Materialized Views work like Views, but persist the results in a table-like form.
Accessing to data stored in a materialized view is often much faster than using a view to query data directly from tables. However, the data will not be always up to date; yet sometimes current data is not needed.
For instance, If people want to be able to quickly graph historical data, they might want to summarize and may not care about the incomplete data for the current date.
You can always include current data by using REFRESH command for the Materialized View. Usually you can add a periodic task or cronjob in your server to execute REFRESH in a pre-defined time interval.
4. CTE (Common Table Expressions)
Common Table Expressions are queries that use WITH statement. They can be considered as the definition of temporary tables that exist only for a query.
I prefer CTEs over sub-queries, as they are easier to read and has less redundancy. CTE is reusable by design. Instead of having to declare the same subquery each time you need it, you can define CTE at the top of the query once, name it as concise as possible and refer to it whenever you need it on your query.
5. VALUES over ARRAY
Your SQL query can be 100x faster in Postgres by changing only ONE LINE.
You may just need to change
WHERE c.key = ANY (ARRAY[...])
for
ANY (VALUES (15368196))
Can you spot the difference??ARRAY?has been changed to?VALUES.
By using?ARRAY, the query?list all the primary keys to look up while VALUES lets the optimizer make full use of any indexes you may have for the key column.
You can find more details about this example here .
Thanks for reading until the end, and I hope you liked the content and read the references links for more context, don't forget to share and follow me.
?Backend Developer Nodejs | Database Optimization | Software Solution Architect | OWASP | AWS SAA at C2C Techhub?
3 个月great post ??