Debunking Window Functions

Debunking Window Functions

A retail company have a employee_sales table which logs the sales of every employee in every city he is working for every department and for each month.


The sales head wants to get a couple of reports:

  1. Total sales for each department in the month of January
  2. Performance of the best employee of each department for the month of January.

For the first report, one just has to group each department and then calculate the sum of sales_amount where the month is January.

SELECT dept, SUM(sales_amount) AS total_sales
FROM employee_sales
WHERE month = 'January'
GROUP BY dept;        

But the second report is a bit complex. The boss wants to know per department who has made the highest sales.

In that case:

  1. We have to group each department into an imaginary WINDOW.
  2. Then sort each window w.r.t. the sales_amount in descending order.
  3. From this result, we have to get the row that is in the first position. That's it!


In the above table, we can see what happens to the original table when we window(group) it by dept and the sort the resulting window in descending order by sales_amount. In this case, we use ROW_NUMBER() window function to number each resulting row starting from 1 and the row with highest rank is the best performer. Yay!

WITH january_sales AS (
SELECT emp_id, dept, sales_amount,
ROW_NUMBER() OVER (PARTITION BY dept ORDER BY SUM(sales) DESC) as rank
FROM employee_sales
WHERE month = 'January')
SELECT emp_id, dept, sales_amount
FROM january_sales
WHERE rank = 1;        

While for generating the first report, aggregate function is applied on the full table, whereas, for the second report, firs the table is grouped into windows and then the highest from each of the groups is selected.

There are several types of window functions in SQL, typically falling into the following categories:

  1. Ranking Functions: These functions return a ranking value for each row in a window. They can be used to rank rows, find the top N rows, etc. Examples include RANK(), DENSE_RANK(), and ROW_NUMBER().
  2. Aggregate Functions: These functions perform calculations across a set of rows and return a single output row. Examples include SUM(), AVG(), MIN(), MAX(), and COUNT().
  3. Analytic Functions: These are similar to aggregate functions, but they return a group of rows that can be further analyzed. Examples include FIRST_VALUE(), LAST_VALUE(), LAG(), and LEAD().
  4. Cumulative Distribution and Percent Rank Functions: These functions provide a sort of ranking, where the rank is represented as a percentage. Examples include PERCENT_RANK(), CUME_DIST(), NTILE().


要查看或添加评论,请登录

Parijat Bose的更多文章

社区洞察

其他会员也浏览了