How to optimize SELECT queries?
Optimize SELECT SQL queries by understanding the query execution plan
How:
First, you must understand the EXECUTION PLAN. All modern databases, like MySQL and PostgreSQL, define an optimal query execution plan based on the cardinality of the various tables involved and the auxiliary data structures available, like indexes or partitions. Both MySQL and PostgreSQL provide a command called EXPLAIN to show the execution plan of a statement. From the execution plan, you can understand how tables are joined, whether an index is used, whether a partition is pruned, and many other aspects of the query execution that could alter the performance. The plan also gives hints about the cost of each operation and can flag when an index is not being used.
To get the execution plan of a query, prefix the query with EXPLAIN like the following:
EXPLAIN SELECT id
FROM orders
WHERE
    order_timestamp BETWEEN '2024-02-01 00:00:00' AND '2024-02-03 00:00:00'
    OR status = 'NEW';
The database returns the plan which, in this example, shows the usage of two indexes, idx_order_date and idx_order_status, and a BitmapOr combining the two results.
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on orders (cost=687.35..7149.18 rows=60333 width=4)
Recheck Cond: (((order_timestamp >= '2024-02-01 00:00:00'::timestamp without time zone) AND (order_timestamp <= '2024-02-03 00:00:00'::timestamp without time zone)) OR (status = 'NEW'::text))
-> BitmapOr (cost=687.35..687.35 rows=60333 width=0)
-> Bitmap Index Scan on idx_order_date (cost=0.00..655.75 rows=60333 width=0)
Index Cond: ((order_timestamp >= '2024-02-01 00:00:00'::timestamp without time zone) AND (order_timestamp <= '2024-02-03 00:00:00'::timestamp without time zone))
-> Bitmap Index Scan on idx_order_status (cost=0.00..1.43 rows=1 width=0)
Index Cond: (status = 'NEW'::text)
(7 rows)
Warning:
The database optimizer generates an execution plan based on cardinality estimates. The closer these estimates are to the actual data in the table, the better the database will be able to define the optimal plan. A plan can also change over time; therefore, when the performance of a query suddenly degrades, a change in the plan is a possible cause. The plan also depends on the table and on additional supporting structures, like indexes, created in the database. In the following section we'll analyze how additional structures can impact performance.
Golden rule:
Spend time understanding the query execution plan to find potential performance bottlenecks. Update the statistics, automatically or manually using the ANALYZE command, to provide the database with up-to-date information about the data in the various tables.
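For example, a manual statistics refresh can be triggered per table; the table name below is illustrative:
-- PostgreSQL: refresh the planner statistics for the orders table
ANALYZE orders;
-- MySQL equivalent
ANALYZE TABLE orders;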
Optimize SELECT SQL queries by using indexes
How:
Several of the options defined above suggested the usage of indexes, but how should you use them? Indexes are a key performance booster for filtering, joining, and ordering. It is therefore crucial to understand the query patterns and create indexes that cover the correct clauses. Both PostgreSQL and MySQL offer a variety of index types, each with its own characteristics and each well suited to particular use cases.
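As a sketch, assuming the orders table from the EXPLAIN example above, indexes covering the two filtered columns could be created as follows (matching the idx_order_date and idx_order_status indexes seen in the plan):
CREATE INDEX idx_order_date ON orders (order_timestamp);
CREATE INDEX idx_order_status ON orders (status);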
Warning:
As mentioned in the DELETE and INSERT sections, any index added to a table slows down write operations. Moreover, indexes occupy disk space and require maintenance. Therefore, think carefully about which workloads you want to optimize and which set of indexes could give you the best results.
Golden rule:
Unlike database tables, indexes can be dropped and recreated without losing data. It is therefore important to periodically evaluate the set of indexes and their status. To check index usage, you can rely on the database system tables, like PostgreSQL's pg_stat_user_indexes or MySQL's table_io_waits_summary_by_index_usage, which provide up-to-date statistics about the queries hitting the indexes. Once you identify used and unused indexes, evaluate whether they need restructuring when the workload changes.
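For instance, a minimal sketch to spot PostgreSQL indexes that have never been scanned since the last statistics reset:
-- List user indexes with zero recorded scans
SELECT schemaname, relname AS table_name, indexrelname AS index_name
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY schemaname, relname;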
Pro tip:
Indexes can be built on single columns, multiple columns, or even functions. If you're filtering data using, for example, an upper(name) function, you can index the output of the function for better performance.
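A minimal sketch, assuming a users table with a name column:
-- Index the output of upper(name) so filters on that expression can use it
CREATE INDEX idx_users_upper_name ON users (upper(name));
-- This filter can now leverage the index instead of scanning the table
SELECT * FROM users WHERE upper(name) = 'ALICE';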
Optimize SELECT SQL queries by selecting inner joins
How:
Both MySQL and PostgreSQL offer a variety of join types, allowing you to precisely define the set of rows to retrieve from each side of the join. Each of them is useful for one reason or another, but they do not all perform equally. An INNER JOIN, which retrieves only the rows present on both sides of the join, offers optimal performance. LEFT, RIGHT, and OUTER joins, on the other hand, need to perform additional work compared to an INNER JOIN and should therefore be used only if really necessary.
Warning:
Double-check your queries; sometimes a query like the following seems legit:
SELECT * FROM ORDERS
LEFT JOIN USERS ON ORDERS.NAME = USERS.NAME
WHERE USERS.NAME IS NOT NULL;
The above uses a LEFT JOIN to retrieve all the rows from ORDERS, but then filters for rows having USERS.NAME IS NOT NULL. It is therefore equivalent to an INNER JOIN.
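The query is better expressed as an INNER JOIN, making the intent explicit:
SELECT * FROM ORDERS
INNER JOIN USERS ON ORDERS.NAME = USERS.NAME;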
Golden rule:
Evaluate the exact requirements for the JOIN statement, and analyze the existing WHERE condition. If not strictly necessary, prefer an INNER JOIN.
Pro Tip:
Also check whether you can avoid a join altogether. If, for example, we are joining the data only to verify the presence of a row in another table, a subquery using EXISTS might be way faster than a join.
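A minimal sketch, assuming we only need the orders whose user exists in the users table (table and column names are illustrative):
-- Check for existence instead of joining; the subquery can stop at the first match
SELECT o.id
FROM orders o
WHERE EXISTS (
    SELECT 1 FROM users u WHERE u.id = o.user_id
);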
Optimize SELECT SQL queries by using the same column type for joins
How:
When joining two tables, ensure that the columns in the join condition are of the same type. Joining an integer id column in one table with a customerId column defined as VARCHAR in another table forces the database to convert each id to a string before comparing, slowing down performance.
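As an illustration, assuming a customers table with an integer id and an orders table with a VARCHAR customer_id (both names are assumptions), the join below pays a per-row conversion cost; PostgreSQL requires the explicit cast, while MySQL casts implicitly:
SELECT o.id
FROM orders o
JOIN customers c ON o.customer_id = c.id::varchar;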
Warning:
You can’t change the source field type at query time, but you can surface the data type inconsistency and fix it in the database table. When analyzing whether the CustomerId field can be migrated from VARCHAR to INT, check that all the values in the column are indeed integers. If some of the values are not, you have a potential data quality problem.
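A quick sanity check before migrating, shown for PostgreSQL (table and column names are assumptions; MySQL offers NOT REGEXP for the same purpose):
-- Find values that are not purely digits and would break a VARCHAR -> INT migration
SELECT customer_id
FROM orders
WHERE customer_id !~ '^[0-9]+$';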
Pro Tip:
When in doubt, prefer more compact representations for your joining keys. If what you’re storing can be unambiguously defined as a number (e.g. a product code like 1234-678-234), prefer the numeric representation, since it will occupy less space and compare faster.
However, beware of things that look like numbers but don't quite behave like them - for instance, telephone numbers like 015555555 where the leading zero is significant.
Optimize SELECT SQL queries by avoiding functions in joins
How:
Similarly to the previous section, avoid unnecessary function usage in joins. Functions can prevent the database from applying performance optimizations, like leveraging indexes. Consider the following query:
SELECT * FROM users
JOIN orders ON UPPER(users.user_name) = orders.user_name;
The above uses a function to transform the user_name field to upper case. However, this could be a signal of poor data quality (and of a missing foreign key) in the orders table that should be solved at the source.
Warning:
Queries like the one above show a data quality problem being solved at query time, which is only a short-term solution. Proper handling of data types and quality constraints should be a priority when designing data backend systems.
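A sketch of a source-level fix for the query above, assuming orders.user_name stores the upper-cased values and each user name is unique (PostgreSQL syntax):
-- One-off repair: re-case orders.user_name to the canonical value from users
UPDATE orders o
SET user_name = u.user_name
FROM users u
WHERE UPPER(u.user_name) = o.user_name;
-- Enforce consistency going forward
ALTER TABLE users ADD CONSTRAINT uq_users_name UNIQUE (user_name);
ALTER TABLE orders ADD CONSTRAINT fk_orders_user
    FOREIGN KEY (user_name) REFERENCES users (user_name);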
Golden rule:
In a relational database, the joins between tables should be doable using the keys and foreign keys, without any additional functions. If you find yourself needing a function, fix the data quality problem in the tables. In some edge cases, using a function in conjunction with an index can help speed up the comparison of complex or lengthy data types. For example, checking the equality of two long strings could be accelerated by initially comparing only the first 50 characters, using the joining function UPPER(SUBSTR(users.user_name, 1, 50)) and an index on the same expression.
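A sketch of such an expression index, matching the join function above (table and column names as in the earlier examples):
CREATE INDEX idx_users_name_prefix ON users (UPPER(SUBSTR(user_name, 1, 50)));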
Optimize SELECT SQL queries by avoiding joins
How:
Queries can be built over time by different people and accumulate a lot of sequential steps in the shape of CTEs (common table expressions). It can therefore be difficult to understand the actual needs in terms of data inputs and outputs. When writing a query, it is tempting to add an extra field “just in case it’s necessary” at a later stage. However, this can have a tremendous effect on performance if the field comes from a new table requiring a join, as in the example below. Always evaluate the strict data needs of the query and include only the columns and tables that contain this information.
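As an illustration (table and column names are assumptions), the first query below joins users even though no users column is ever read; the slimmer second query avoids the join:
-- Join added "just in case", but no users column is actually selected
SELECT o.id, o.status
FROM orders o
JOIN users u ON u.id = o.user_id;
-- Slimmer version without the join
SELECT o.id, o.status
FROM orders o;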
Warning:
Double-check whether the join is needed to filter rows existing in both tables. In the example above, we could end up with incorrect results if there are user_id values in the orders table that are not stored in the id column of the users table.
Golden rule:
Remove unnecessary joins. It is far more performant to run a slimmer query to retrieve the overall dataset and then perform a lookup for more information only when necessary.
Pro Tip:
The example explained above is just one case of JOIN overuse. Another example is joining the data only to verify the presence of a row in another table. In such cases, a subquery using EXISTS (see the earlier example) might be way faster than a join.