ELI5: Windows Functions
Manutej Mulaveesala
Generative AI Specialist and Educator | Prompt Engineer | AI Strategist | | Writer | Speaker
What are Window Functions?
Navigating the intricate landscape of database management and data analysis can sometimes feel like being an explorer in uncharted territories. There’s a certain thrill in unearthing hidden patterns and drawing meaningful insights from raw data.
At first glance, SQL Window Functions might seem like just another set of commands in the extensive SQL repertoire. However, they hold a secret power, a kind of hidden brush, that turns rows of data into a canvas of possibilities. Today, we set out to demystify these powerful functions, peel back the layers, and reveal the artistry and efficiency they bring to data analysis.?
As we venture through the intricacies of Window Functions, we’ll uncover their ability to not just answer our queries, but to tell a story with our data, to find the rhythm in the rows and the melody in the numbers. Whether you’re a data novice with a curious mind or a seasoned SQL maestro, this guide is crafted to guide your steps through the intricate dance of Window Functions and to open your eyes to the symphony of data that awaits.
When Should I Use Window Functions (ELI5)
Imagine that you have some building blocks, and each building block represents some data. Your task requires you to look at certain groups of blocks or to make new blocks depending on the existing blocks that you have.
1. You Want to Compare Blocks Without Mixing Them?Up
Imagine you want to see if one block is taller than the blocks right next to it. A Window Function lets you look at each block and its neighbors without mixing them all up, so you can easily compare them.
2. You Want to Count or Add Up Blocks in a?Row
If you want to count how many blocks TOTAL you have in a column or add up their numbers, a Window Function can do that for you, looking at each block one by one and keeping a running total. It can help you find a running average of those blocks as well!
3. You Want to Find the Biggest or Smallest Block in a?Section
Let’s say you have your blocks sorted in rows by color, and you want to find the biggest block in each row. A Window Function helps you look at each row separately and pick out the biggest block in each one.
4. You Want to Give Blocks a Score or a?Rank
If you want to give each block a score or a rank based on its size or color, a Window Function can do that too. It looks at all the blocks, sorts them how you want, and then gives each one a number to show its rank in the overall set of blocks.
5. You Want to See How Blocks Compare to Their?Friends
Maybe you want to see if a block is taller than the average height of the blocks around it. A Window Function can look at a block and its buddies, calculate the average height, and then tell you how that block compares.
Window functions are SQL operations that perform a calculation across a set of rows that are related to the current row. Unlike (typical) aggregate functions, they do not cause rows to become grouped into a single output row?—?the rows retain their separate identities. Window functions can perform a calculation across a set of rows that are related to the current row. They are called window functions because they perform a calculation across a “window” of rows. For instance, you might want to calculate a running total of sales or find out the highest score in a group.
See the image below to see how they compare to aggregation functions.?
Anatomy of a Window?Function
A lot of people have very complicated ways to go about explaining Window Functions, so my goal is to make it super simple for you!
Imagine you’re on a sightseeing bus, and you’re looking out the window. You see things one after the other, right? SQL Window Functions work a bit like that. They look at your data row by row, but (and here’s the cool part) they remember what they’ve seen before and what’s coming up next. It’s like having a photographic memory while sightseeing!
Let’s look at this in actual SQL Code:
SELECT column_name,
WINDOW_FUNCTION(column_name) OVER (
PARTITION BY column_name
ORDER BY column_name
RANGE/ROWS BETWEEN ... AND ...
)
FROM table_name;
There you have it! You have a high-level bird’s eye view of Window Functions work. We’ll of course want to look at some basic examples to tie this all together, so we’ll do that next.?
Examples To Make it?Stick
Let’s imagine that we have some simple Sales Data and line items for this sales data.?
1. Running?Totals
SELECT
SaleID,
Salesperson,
SaleAmount,
SaleDate,
SUM(SaleAmount) OVER (ORDER BY SaleDate) AS RunningTotal
FROM Sales;
This will calculate the running total of SaleAmount for each row ordered by SaleDate. The results are shown below. Notice the new column called RunningTotal! There you have it, we just created a new column! You may have seen this elsewhere as a “calculated field.”
2. Cumulative Totals (By SalesPerson)
Now what if we wanted to see how each member of the Sales team was evolving over time? It is quite important to keep track of numbers (AKA Quotas) in a Sales team, so we may have a different requirement of actually calculating something like a Running Total not for the whole dataset, but rather for each person on the team. How could we approach this??
Let’s check out the code and results first, and it will all become clearer. But first, see if you can spot what changes in this code compared to the last example.
SELECT
SaleID,
Salesperson,
SaleAmount,
SaleDate,
SUM(SaleAmount) OVER (PARTITION BY Salesperson ORDER BY SaleDate) AS CumulativeSalePerPerson
FROM Sales;
If we study the new field “CumulativeSalePerPerson” we see that the pattern is a little harder to spot, but once we get to the third row it becomes a lot clearer. “Alice” had a first sale in Row 1 of “300” then she had another sale of “200” in the third row, so her cumulative sale at that point was then “500.” Similarly, Bob had sales represented in the 2nd and 5th rows, which is why he does not reach “450” until the 5th row, where he scores a sale of “300” to add to his previous “150.” It’s that simple! Imagine trying to rack your brain on how we could do this with a regular SQL query, it would be IMPOSSIBLE!?
3. Ranking Sales by SalesAmount
Now imagine if we had a sales competition going on to see which Salesperson can get the biggest wins (catch the largest fish).
?Obviously, we would want to have an easy way to rank. Normally, with a regular query, you may be tempted to simply ORDER BY SaleAmount DESC but then we would lose the existing order of the rest of the data. This is where the RANK() function comes in handy!
As we can see floating down in the 9th row, Alice caught the biggest catch of “450” placing her at the top spot! She also had 3rd, 5th, 10th, 12th, and 14th “rank” catches.?
4. Moving Average (3-Day) of SalesAmount
As a busy Sales team, it is important to look for the overall trends that a team may be progressing towards, in order to meet sales quotas. If you’re looking for trends rather than totals, the 3-day moving average smooths out the daily fluctuations and highlights the overall direction of sales. It’s like stepping back from a painting to see the big picture, rather than focusing on each individual brushstroke.?
For the simplicity of this example, we’ve used a 3-day WINDOW (3-day Moving Average), but it could just as easily have been a 7-day (Weekly MA), 30-day (Monthly), or any period of time you decide to look at! (Note: these window functions get quite long in a single line, so make sure to break them up with white space for proper code styling).
SELECT SaleID, SaleDate, Salesperson, SaleAmount,
AVG(SaleAmount) OVER (
ORDER BY SaleDate
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
) AS MovingAverage
FROM Sales;
We used ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING to look at the day before and the day after each row (AKA our “Window”). This is probably the most relevant reason why anyone even thought to call these functions by this name!
Now that we have done a couple of laps with window functions: the all-important question you may be asking yourself…
“Why do these things matter, anyway?”
Why Window Functions Matter
The main difference between Window Functions and GROUP BY Aggregate Functions is that while an Aggregate Function returns a single result per group of rows (like the SUM or AVG of a group), a Window Function will return a result for every row, often in relation to other rows in the window (like the running total at each row). Among my students, this is often been the KEY to understanding how Window functions work, and moreover WHY THEY ACTUALLY MATTER.
Troubleshooting Window Functions
If your window function isn’t working as expected, consider the following:
Mastering SQL window functions is akin to adding a powerful tool to your data manipulation toolkit. They provide advanced capabilities for complex data analysis and reporting, enabling you to draw insights and make informed decisions. Whether it’s calculating running totals, ranking results, or comparing individual rows to aggregated dataset metrics, window functions are INDISPENSABLE. Embrace them in your SQL journey, and you’ll find your queries reaching new heights of efficiency and clarity in no time!
In order to see the complete article and comprehensive guide, please check out my post on Medium here. This article is part of an SQL Mastery series. You can also check out the other comprehensive guide to Mastering SQL Subqueries.
We’ve navigated quite an intricate terrain of SQL window functions, uncovering their ability to transform complex data analysis into more manageable tasks. These advanced functions not only streamline our queries but also open up a world of possibilities for data exploration and reporting. As you continue to incorporate window functions into your SQL repertoire, remember that the key to mastery is practice and experimentation. So, dive in, explore, and let window functions be your guide in the realm of advanced SQL querying.?
Happy WINDOW-ing.?