ELI5: Windows Functions
Taming the SQL Beast requires advanced skills.

ELI5: Windows Functions

What are Window Functions?

Navigating the intricate landscape of database management and data analysis can sometimes feel like being an explorer in uncharted territories. There’s a certain thrill in unearthing hidden patterns and drawing meaningful insights from raw data.

At first glance, SQL Window Functions might seem like just another set of commands in the extensive SQL repertoire. However, they hold a secret power, a kind of hidden brush, that turns rows of data into a canvas of possibilities. Today, we set out to demystify these powerful functions, peel back the layers, and reveal the artistry and efficiency they bring to data analysis.?

As we venture through the intricacies of Window Functions, we’ll uncover their ability to not just answer our queries, but to tell a story with our data, to find the rhythm in the rows and the melody in the numbers. Whether you’re a data novice with a curious mind or a seasoned SQL maestro, this guide is crafted to guide your steps through the intricate dance of Window Functions and to open your eyes to the symphony of data that awaits.

When Should I Use Window Functions (ELI5)

Photo by

Imagine that you have some building blocks, and each building block represents some data. Your task requires you to look at certain groups of blocks or to make new blocks depending on the existing blocks that you have.

1. You Want to Compare Blocks Without Mixing Them?Up

Imagine you want to see if one block is taller than the blocks right next to it. A Window Function lets you look at each block and its neighbors without mixing them all up, so you can easily compare them.

2. You Want to Count or Add Up Blocks in a?Row

If you want to count how many blocks TOTAL you have in a column or add up their numbers, a Window Function can do that for you, looking at each block one by one and keeping a running total. It can help you find a running average of those blocks as well!

3. You Want to Find the Biggest or Smallest Block in a?Section

Let’s say you have your blocks sorted in rows by color, and you want to find the biggest block in each row. A Window Function helps you look at each row separately and pick out the biggest block in each one.

4. You Want to Give Blocks a Score or a?Rank

If you want to give each block a score or a rank based on its size or color, a Window Function can do that too. It looks at all the blocks, sorts them how you want, and then gives each one a number to show its rank in the overall set of blocks.

5. You Want to See How Blocks Compare to Their?Friends

Maybe you want to see if a block is taller than the average height of the blocks around it. A Window Function can look at a block and its buddies, calculate the average height, and then tell you how that block compares.

Window functions are SQL operations that perform a calculation across a set of rows that are related to the current row. Unlike (typical) aggregate functions, they do not cause rows to become grouped into a single output row?—?the rows retain their separate identities. Window functions can perform a calculation across a set of rows that are related to the current row. They are called window functions because they perform a calculation across a “window” of rows. For instance, you might want to calculate a running total of sales or find out the highest score in a group.

See the image below to see how they compare to aggregation functions.?

The difference between Aggregation Functions and Window Functions (explained simply).

Anatomy of a Window?Function

A lot of people have very complicated ways to go about explaining Window Functions, so my goal is to make it super simple for you!

Imagine you’re on a sightseeing bus, and you’re looking out the window. You see things one after the other, right? SQL Window Functions work a bit like that. They look at your data row by row, but (and here’s the cool part) they remember what they’ve seen before and what’s coming up next. It’s like having a photographic memory while sightseeing!

The hypothetical view from inside a Window Function (Created using DALLE3)

  • The Function: This could be a SUM, AVG, MAX, or any other function you need. It’s generally the heart of the (mathematical) operation that you want to perform! These are similar to regular aggregate functions but do not reduce the number of rows returned.
  • OVER(): This part lets SQL know that we’re about to do something special, setting the stage for our Window Function. OVER()is the cornerstone of window functions in SQL. This clause empowers us to designate a “window” or a subset of data that the function will process.
  • PARTITION BY: (Optional) If you want to perform your calculations on specific chunks (groups) of your data, this is how you tell SQL to divide things up. If no PARTITION BY is specified, the function treats all rows of the query result set as a single partition. It works similarly to the GROUP BY clause, but while GROUP BY aggregates the data, PARTITION BY doesn’t, it just groups the data for the purpose of the window function.?
  • ORDER BY: (Optional) This orders the rows within each partition. If no ORDER BY is specified, the function treats all rows of the partition as a single group.

The little helpers inside of a Window Function. Created by DALLE3

Let’s look at this in actual SQL Code:

SELECT column_name, 
       WINDOW_FUNCTION(column_name) OVER (
           PARTITION BY column_name 
           ORDER BY column_name 
           RANGE/ROWS BETWEEN ... AND ...
       ) 
FROM table_name;        

There you have it! You have a high-level bird’s eye view of Window Functions work. We’ll of course want to look at some basic examples to tie this all together, so we’ll do that next.?

Examples To Make it?Stick

Let’s imagine that we have some simple Sales Data and line items for this sales data.?

Some Sample Sales Data

1. Running?Totals

SELECT 
  SaleID, 
  Salesperson, 
  SaleAmount, 
  SaleDate, 
  SUM(SaleAmount) OVER (ORDER BY SaleDate) AS RunningTotal
FROM Sales;        

This will calculate the running total of SaleAmount for each row ordered by SaleDate. The results are shown below. Notice the new column called RunningTotal! There you have it, we just created a new column! You may have seen this elsewhere as a “calculated field.”


The Result of our RunningTotal query adds a new column to our original data "RunningTotal"

2. Cumulative Totals (By SalesPerson)

Now what if we wanted to see how each member of the Sales team was evolving over time? It is quite important to keep track of numbers (AKA Quotas) in a Sales team, so we may have a different requirement of actually calculating something like a Running Total not for the whole dataset, but rather for each person on the team. How could we approach this??

Let’s check out the code and results first, and it will all become clearer. But first, see if you can spot what changes in this code compared to the last example.

SELECT 
  SaleID, 
  Salesperson, 
  SaleAmount, 
  SaleDate, 
  SUM(SaleAmount) OVER (PARTITION BY Salesperson ORDER BY SaleDate) AS CumulativeSalePerPerson
FROM Sales;        

If we study the new field “CumulativeSalePerPerson” we see that the pattern is a little harder to spot, but once we get to the third row it becomes a lot clearer. “Alice” had a first sale in Row 1 of “300” then she had another sale of “200” in the third row, so her cumulative sale at that point was then “500.” Similarly, Bob had sales represented in the 2nd and 5th rows, which is why he does not reach “450” until the 5th row, where he scores a sale of “300” to add to his previous “150.” It’s that simple! Imagine trying to rack your brain on how we could do this with a regular SQL query, it would be IMPOSSIBLE!?

Cumulative Sales Performance Over Time of each SalesPerson in our sample data set. (Note: This graph was not generated with SQL)
Cumulative Sales Performance Over Time of each SalesPerson in our sample data set. (Note: This graph was not generated with SQL)


3. Ranking Sales by SalesAmount

Now imagine if we had a sales competition going on to see which Salesperson can get the biggest wins (catch the largest fish).

Funny illustration in an office setting where salespersons, with briefcases and formal attire, are in a heated discussion over the size of the fish they're holding. Some use magnifying glasses, while others use rulers, all in the quest to claim bragging rights for the largest catch.
Salesman measuring who has the bigger "catch"

?Obviously, we would want to have an easy way to rank. Normally, with a regular query, you may be tempted to simply ORDER BY SaleAmount DESC but then we would lose the existing order of the rest of the data. This is where the RANK() function comes in handy!

As we can see floating down in the 9th row, Alice caught the biggest catch of “450” placing her at the top spot! She also had 3rd, 5th, 10th, 12th, and 14th “rank” catches.?

4. Moving Average (3-Day) of SalesAmount

As a busy Sales team, it is important to look for the overall trends that a team may be progressing towards, in order to meet sales quotas. If you’re looking for trends rather than totals, the 3-day moving average smooths out the daily fluctuations and highlights the overall direction of sales. It’s like stepping back from a painting to see the big picture, rather than focusing on each individual brushstroke.?

Comic scene in a corporate boardroom where salespersons of different descents proudly display their fish on the table. Some have large fish, while others have small ones, but all are trying to exaggerate the size of their catch. A sales chart in the background humorously plots fish sizes.
Where is our Sales going?

For the simplicity of this example, we’ve used a 3-day WINDOW (3-day Moving Average), but it could just as easily have been a 7-day (Weekly MA), 30-day (Monthly), or any period of time you decide to look at! (Note: these window functions get quite long in a single line, so make sure to break them up with white space for proper code styling).

SELECT SaleID, SaleDate, Salesperson, SaleAmount, 
       AVG(SaleAmount) OVER (
                    ORDER BY SaleDate 
                    ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
                            ) AS MovingAverage
FROM Sales;        
A resultant SQL Table showing the results of calculating a 3-day moving average with window functions

We used ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING to look at the day before and the day after each row (AKA our “Window”). This is probably the most relevant reason why anyone even thought to call these functions by this name!

Now that we have done a couple of laps with window functions: the all-important question you may be asking yourself…

“Why do these things matter, anyway?”

Why Window Functions Matter

The main difference between Window Functions and GROUP BY Aggregate Functions is that while an Aggregate Function returns a single result per group of rows (like the SUM or AVG of a group), a Window Function will return a result for every row, often in relation to other rows in the window (like the running total at each row). Among my students, this is often been the KEY to understanding how Window functions work, and moreover WHY THEY ACTUALLY MATTER.

Troubleshooting Window Functions

If your window function isn’t working as expected, consider the following:

  • Check your OVER clause: The OVER clause determines how the window function behaves. Make sure that you've specified the PARTITION BY and ORDER BY clauses correctly.
  • Review your function’s syntax: Each window function has its own syntax. Be sure to review the syntax of the function you’re using to ensure it’s correct.
  • Examine the data types: Make sure the data types you’re using in the function are compatible. For example, you can’t perform a SUM operation on a text field (or a column with a hidden string value)

Mastering SQL window functions is akin to adding a powerful tool to your data manipulation toolkit. They provide advanced capabilities for complex data analysis and reporting, enabling you to draw insights and make informed decisions. Whether it’s calculating running totals, ranking results, or comparing individual rows to aggregated dataset metrics, window functions are INDISPENSABLE. Embrace them in your SQL journey, and you’ll find your queries reaching new heights of efficiency and clarity in no time!

In order to see the complete article and comprehensive guide, please check out my post on Medium here. This article is part of an SQL Mastery series. You can also check out the other comprehensive guide to Mastering SQL Subqueries.

Illustration of a towering mountain labeled 'SQL Summit'. Climbers of diverse gender and descent are scaling its steep sides, using tools like query ropes and data-driven pickaxes. At the summit, a flag with the SQL logo flutters, symbolizing the achievement of mastering SQL.
Climbing (and conquering) the SQL Summit. In real life, this is a moving target!


We’ve navigated quite an intricate terrain of SQL window functions, uncovering their ability to transform complex data analysis into more manageable tasks. These advanced functions not only streamline our queries but also open up a world of possibilities for data exploration and reporting. As you continue to incorporate window functions into your SQL repertoire, remember that the key to mastery is practice and experimentation. So, dive in, explore, and let window functions be your guide in the realm of advanced SQL querying.?

Happy WINDOW-ing.?


要查看或添加评论,请登录

社区洞察