How I Achieved Over 900x Improvement in SQL Query Performance

I know, it sounds like clickbait, but trust me: this isn’t a magic trick. Just some basic yet overlooked optimizations that sent my query speed to Mach 10!

Yes, I measure execution speed by the speed of sound. Because if it’s not breaking the sound barrier, is it even optimized?

Plus, I stumbled upon another optimization while switching if-else statements to switch—and it turned out to be a game-changer. But let's start with the SQL optimization first...

CTEs: From 16+ Minutes to Under a Second

Before I dive into how I rewrote the query to optimize its execution time, let me give you some context about the challenge I was tackling.

In our system, we work with two tables. One contains the “objects”, which we’ll call Table E, and the other holds related data for those objects, which we’ll call Table A.

Table A has over 1.5 million rows (and this is a lower environment), many of which are duplicates. To process the data, I needed a JOIN to combine it with Table E. Since the duplicates were irrelevant to my task, I also had to apply DISTINCT to filter them out.

On top of that, the object ID had to be passed into a user-defined function to fetch additional data.

This was the state of the application when I began working on the optimization.

Query without CTE
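The exact query lives in the screenshot above, but its overall shape was roughly this (the table, column, and function names below are illustrative stand-ins, not the real ones):

```sql
-- Rough sketch of the original query shape (illustrative names only).
-- dbo.fn_GetObjectDetails stands in for the scalar UDF, which runs once per row.
-- @SomeParameter stands in for a parameter of the surrounding stored procedure.
SELECT DISTINCT
    e.ObjectId,
    e.ObjectName,
    dbo.fn_GetObjectDetails(e.ObjectId) AS ObjectDetails
FROM dbo.TableE AS e
INNER JOIN dbo.TableA AS a
    ON a.ObjectId = e.ObjectId        -- Table A: 1.5M+ rows, many of them duplicates
WHERE a.SomeFilterColumn = @SomeParameter;
```

Because DISTINCT sits at the top of the statement, deduplication happens after the join, and the UDF has already been evaluated for every joined row.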

The client code that relies on the stored procedure containing this query began taking significantly longer to return results. Initially, this wasn’t an issue when Table A had only a small amount of data. However, as users started adding more data, the execution time increased rapidly, almost exponentially.

Analyzing the system

The bug ticket was raised with the title: “Feature not working: Facing timeout when trying to access the feature page.”

Now, in our project, we have a culture where developers chime in with their wisdom, ideas, and sometimes unsolicited advice (which is, of course, appreciated!). Two developers had already pointed out that the issue was likely tied to the slow-running stored procedure (SP). This was a blessing because it meant I could skip the whole “misleading ticket title” phase and avoid hours of debugging.

To add to the complexity, our system’s SPs have a tendency to be like Russian nesting dolls, with one calling another, which then calls another. The SP I was working on was itself called from a chain of two other SPs. Thanks to the developers’ comments, I was able to narrow the issue down to the exact SP. However, that SP was far from simple: it contained over 150 queries, each with multiple branches.

Identifying the problem

Just as I was starting to lose hope at the thought of analyzing hundreds of query plans to pinpoint the issue, one particular query using a user-defined function (UDF) caught my attention. I decided to dig deeper into the tables it was referencing: Tables A and E.

That’s when I realized that Table A contained duplicates, which weren’t even relevant to the SP’s context. This led me to the obvious solution: rewrite the UDF as a Table-Valued Function (TVF) to get the distinct objects in a result set. The UDF itself didn’t have much logic, and returning a result set would be far more efficient than processing each row individually. It was a perfect example of the classic set-based versus row-by-row operation, and I’d heard plenty of advice on why set-based operations are the way to go.

It was getting close to my sign-off time, and I really didn’t want to leave with a half-baked TVF sitting in my head. My curiosity was eating me up, and I knew I’d be itching to see it in action. But at the same time, I realized that rewriting the UDF, creating test stored procedures, updating the codebase to call the test SPs, setting up data, and then testing it all in the UI was… well, a bit of a mountain of work, one that couldn’t possibly be conquered in an hour.

So, I needed to find a solution that could actually be done within the time I had left.

That's when I decided to split the query into two blocks: one to retrieve the distinct data first and another to call the UDF. This way, I could avoid calling the UDF for duplicate inputs, which would return the same data anyway.

Baseline performance

To establish a baseline, I ran the SP manually and saw it take 1.5 minutes to return the first 12 rows. Curious, I pushed it further and tried fetching the first 20 rows. To my horror, the query ran for more than 16 minutes, at which point I had to stop it. I wasn’t about to be featured as the person with the longest-running query of the last five years and risk an awkward conversation with the DB admin.

Query plan before optimizing

Implementing the fix

Then I made the change using a CTE and reran the SP. Drum roll... Can you guess the execution time after this change?

Well, you’ve seen the title, so yes: it ran in under a second. Going from 16+ minutes (over 960 seconds) to under one second is how the 900x figure works out.

Query after implementing CTE
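Again, the real query is in the screenshot; conceptually, the rewrite looked something like this (same illustrative names as the earlier sketch):

```sql
-- Rough sketch of the CTE rewrite (illustrative names only).
-- Block 1: deduplicate first, so only distinct objects survive.
WITH DistinctObjects AS
(
    SELECT DISTINCT
        e.ObjectId,
        e.ObjectName
    FROM dbo.TableE AS e
    INNER JOIN dbo.TableA AS a
        ON a.ObjectId = e.ObjectId
    WHERE a.SomeFilterColumn = @SomeParameter
)
-- Block 2: call the UDF once per distinct object instead of once per duplicate row.
SELECT
    d.ObjectId,
    d.ObjectName,
    dbo.fn_GetObjectDetails(d.ObjectId) AS ObjectDetails
FROM DistinctObjects AS d;
```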

It felt too good to be true, but I went ahead and created test SPs, updated the codebase, set up the data, and tested the entire flow from the UI. When I saw the data render in the UI in just a second, I almost cried.

Query plan after implementing CTE

Comparing the plans

  1. Efficient Row Reduction with Early Deduplication: The original query handled duplicates in Table A late in the process, increasing CPU and memory usage by unnecessarily processing extra rows. The CTE approach applies DISTINCT early, reducing the rows passed to subsequent operations and cutting out redundant work. This early row reduction significantly lowered resource usage.
  2. Simplified Joins and Aggregations: The original plan used a resource-intensive Hash Match join and applied aggregation late, increasing the workload. The CTE approach reduces the row count before the join, which let the optimizer choose an Adaptive Join instead. Applying DISTINCT (effectively a GROUP BY) earlier minimized the rows processed in every subsequent step.

Retrospective

While the performance was impressive (way better than expected), using a TVF might have given even better results, since the main query is still doing a row-by-row operation through the scalar UDF. However, given the time constraints and the number of tickets piling up on my plate, we decided to treat this as the fix and move on. I’m holding onto the hope that one day I’ll get a chance to try out the TVF in this query and see how much more it can improve things.
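For the curious, the TVF idea would look roughly like this: convert the scalar UDF into an inline table-valued function and CROSS APPLY it to the already-deduplicated set. Everything below is a hypothetical sketch with illustrative names, not code from the actual system:

```sql
-- Hypothetical inline TVF replacing the scalar UDF (illustrative names only).
CREATE OR ALTER FUNCTION dbo.tvf_GetObjectDetails (@ObjectId INT)
RETURNS TABLE
AS
RETURN
(
    SELECT d.DetailValue
    FROM dbo.ObjectDetails AS d
    WHERE d.ObjectId = @ObjectId
);
GO

-- The optimizer can inline the TVF into the main plan, keeping the whole
-- operation set-based instead of row-by-row.
WITH DistinctObjects AS
(
    SELECT DISTINCT e.ObjectId, e.ObjectName
    FROM dbo.TableE AS e
    INNER JOIN dbo.TableA AS a ON a.ObjectId = e.ObjectId
)
SELECT d.ObjectId, d.ObjectName, x.DetailValue
FROM DistinctObjects AS d
CROSS APPLY dbo.tvf_GetObjectDetails(d.ObjectId) AS x;
```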

Optimization 2: Switching to switch

I switched the IF conditions to a SWITCH not for performance, but because my work laptop screen is small. With 15 conditions, I had to scroll every time to check which report type instance was being triggered.

I was debugging a report package, where each condition generated a different report. In the worst case, I had to go through all 15 conditions to find the one I needed. And let’s be honest, I could’ve executed the code multiple times to check each branch, but who does that? I’m a programmer. I’m always looking for the laziest, most efficient way to make my life easier, even if it doesn’t exactly make sense.

So, I converted the 60 lines of IF...else into just 18 lines of SWITCH (brace formatting in C# is still a mystery, though). No more scrolling—just a quick glance at the conditions. Life suddenly became much simpler.
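To give a feel for the change, here’s a trimmed-down sketch; the report types and return values are invented, since the real code has 15 cases and its own report generators:

```csharp
using System;

// Illustrative only: stand-ins for the real report types and generators.
public enum ReportType
{
    Sales = 0,
    Inventory = 1,
    Payroll = 2
    // ...12 more, with sequentially incremented values
}

public static class ReportRunner
{
    public static string GenerateReport(ReportType type)
    {
        // Before: ~60 lines of if / else if, comparing 'type' one branch at a time.
        // After: a single switch, so every case is visible at a glance.
        switch (type)
        {
            case ReportType.Sales:     return "Sales report";
            case ReportType.Inventory: return "Inventory report";
            case ReportType.Payroll:   return "Payroll report";
            default: throw new ArgumentOutOfRangeException(nameof(type));
        }
    }
}
```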

But then something odd happened. When stepping through the code, execution jumped straight to the relevant case without scanning the others. It was weird, but I didn’t think much of it at first. As I tried to fall asleep, though, it bugged me. I tried to convince myself it was just the debugger doing some early evaluation, or maybe the compiler had some internal hash map optimizing the condition checks (which was actually pretty close to what was happening). Still, I couldn’t shake the feeling that there was more to it.

Unable to let it go, I dove into research and discovered why a switch can be faster than a chain of if statements. Apparently, it all boils down to jump tables, which optimize execution by reducing the number of condition checks. I was so close to figuring it out myself!

Anyway, I couldn’t just take these articles at face value; I needed to see for myself if the claims were true. So, I decided to dive into the IL of a test .NET application using ILSpy.

If you’re just here to learn how I optimized the code, feel free to stop reading now. From here on, I’m going to talk about how I verified whether the .NET compiler was indeed creating a jump table, purely out of curiosity.
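If you want to reproduce the experiment, a minimal test pair like the one below is enough: build it in Release mode, open the compiled assembly in ILSpy, and switch to the IL view. (This is illustrative code, not the exact methods behind the screenshot.)

```csharp
public static class SwitchVsIf
{
    // An if / else if chain compiles to sequential compare-and-branch IL.
    public static int WithIf(int value)
    {
        if (value == 0) return 10;
        else if (value == 1) return 20;
        else if (value == 2) return 30;
        else if (value == 3) return 40;
        else return -1;
    }

    // Dense, sequential cases: the compiler can emit a single 'switch' IL
    // instruction backed by a jump table.
    public static int GoodSwitch(int value)
    {
        switch (value)
        {
            case 0: return 10;
            case 1: return 20;
            case 2: return 30;
            case 3: return 40;
            default: return -1;
        }
    }
}
```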
IL code of if…else conditions and switch statement

The IL code might look similar to assembly, but don’t worry: you don’t need to know all the details. Here’s the key information you need to understand:

  • ldarg.0: Load the first method argument (the input value) onto the evaluation stack for processing.
  • brtrue.s: Branch to a label if the value on the stack is non-zero (true).
  • ldc.i4.<number>: Load a 4-byte integer constant (e.g., 1, 2, 3) onto the evaluation stack.
  • ret: Return from the method, using the value on top of the stack as the return value.
  • bne.un.s: Branch to a label if the two values on the stack are not equal (unsigned comparison).
  • switch: Jump to one of several labels based on an integer value taken from the stack.
  • br.s: Unconditionally jump to a specified label, typically for a default or fallback case.

With this information, you could dry-run the IL code yourself. But here’s the highlight: apart from having fewer instructions overall, the GoodSwitch method contains one key IL instruction that the if…else version doesn’t: switch.

The switch Instruction and the Jump Table

In IL, the switch instruction implements a jump table: a data structure that maps case values to code addresses, allowing execution to jump directly to the right location (a rough code analogy follows the list below).

Why a Jump Table Is Better:

  • Faster Lookups: Jump tables provide constant-time lookups, bypassing the need for sequential checks.
  • Improved Performance: By avoiding step-by-step condition checks, the execution is more efficient, especially for large numbers of cases.
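Conceptually, a jump table behaves like an array indexed by the case value: one lookup, one jump. Here’s a rough C# analogy of the idea (this is not what the compiler actually emits, just an illustration of why the lookup is constant-time):

```csharp
using System;

public static class JumpTableAnalogy
{
    // An if/else chain needs up to N comparisons to find the right branch.
    // A jump table indexes straight into an array of targets instead.
    private static readonly Func<string>[] Handlers =
    {
        () => "case 0 handled",
        () => "case 1 handled",
        () => "case 2 handled"
    };

    public static string Dispatch(int value)
    {
        // One bounds check, one array lookup, one call: constant time regardless of N.
        if (value >= 0 && value < Handlers.Length)
            return Handlers[value]();

        return "default case";
    }
}
```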


This doesn’t mean you should replace all your if-else blocks with switch statements. Even with switch, if the case values aren’t dense, the compiler may fall back to less efficient IL. In our scenario, the report types were enums with sequentially incremented values, which allowed the compiler to generate an efficient jump table.
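As a rough illustration (the values are invented; for your own code, the only way to know for sure is to check the IL in ILSpy):

```csharp
public static class CaseDensity
{
    // Dense, sequential case values: a good candidate for a 'switch' IL jump table.
    public static int Dense(int v)
    {
        switch (v)
        {
            case 0: return 1;
            case 1: return 2;
            case 2: return 3;
            case 3: return 4;
            default: return -1;
        }
    }

    // Sparse case values: a jump table would be mostly empty, so the compiler
    // typically falls back to compare-and-branch IL instead.
    public static int Sparse(int v)
    {
        switch (v)
        {
            case 5: return 1;
            case 500: return 2;
            case 50000: return 3;
            default: return -1;
        }
    }
}
```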

Write code for humans to read first; optimize for machines only when it truly matters.

What started as a simple tweak to make debugging easier turned into a performance boost, thanks to switch statements and jump tables. Sometimes small changes lead to big wins.

Stay Curious. Adios!

This article was originally written on Medium.
