Day - 11 of SQL for Data Science
Mrityunjay Pathak
Using Statistics and Machine Learning to solve Business Problems!
SQL Window Function
SQL window functions are a specialized feature in SQL that allows you to perform calculations and aggregations over a "window" of rows related to the current row. Unlike regular aggregate functions, which collapse rows into a single result, window functions maintain the individual rows and calculate values based on a specified set of rows within the result set.
The basic syntax for a window function is as follows :
SELECT column1, column2, ..., window_function() OVER (PARTITION BY partition_column ORDER BY order_column)
FROM table_name;
Let's break down the components of the syntax :
Use the ERD Diagram to write and understand queries :
Note : A running total, also known as a cumulative sum, is a common use case for SQL window functions. It calculates the sum of a column up to the current row, considering the order specified in the query. Here's how you can use window functions to calculate a running total in SQL.
Example
Create a running total of?standard_amt_usd?(in the?orders?table) over order time with no date truncation. Your final table should have two columns : one with the amount being added for each new row, and a second with the running total.
SELECT standard_amt_usd,
SUM(standard_amt_usd) OVER (ORDER BY occurred_at) AS running_total
FROM orders;
Note : ROW_NUMBER() is a window function that assigns a unique integer value to each row within the partition. It starts from 1 for the first row in the partition and increments by 1 for each subsequent row in the order specified.
Example for ROW_NUMBER()
Suppose we have a table called "employees" with columns "employee_name," "department," and "salary," representing employee information and their respective departments and salaries.
领英推荐
SELECT employee_name, department, salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num
FROM employees;
Output
employee_name | department | salary | row_num
----------------------------------------------
Alice? ? ? ? ?| HR? ? ? ? ?| 60000? | 1
Bob? ? ? ? ? ?| HR? ? ? ? ?| 55000? | 2
Claire? ? ? ? | HR? ? ? ? ?| 50000? | 3
David? ? ? ? ?| IT? ? ? ? ?| 70000? | 1
Emma? ? ? ? ? | IT? ? ? ? ?| 65000? | 2
Frank? ? ? ? ?| Finance? ? | 80000? | 1
Explanation
Note : RANK() is a window function that assigns a unique rank to rows within the partition based on the order specified. It assigns the same rank to rows with the same values in the ORDER BY clause, leaving gaps in ranking for ties.
For example, if two rows have the same value and are ranked as 1, the next row is assigned rank 3 (skipping rank 2).
Example for RANK()
Let's use the same "employees" table and apply the RANK() window function.
SELECT employee_name, department, salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rank
FROM employees;
Output
employee_name | department | salary | rank
-------------------------------------------
Alice? ? ? ? ?| HR? ? ? ? ?| 60000? | 1
Bob? ? ? ? ? ?| HR? ? ? ? ?| 55000? | 2
Claire? ? ? ? | HR? ? ? ? ?| 55000? | 2
David? ? ? ? ?| IT? ? ? ? ?| 70000? | 1
Emma? ? ? ? ? | IT? ? ? ? ?| 65000? | 2
Frank? ? ? ? ?| Finance? ? | 80000? | 1
Explanation
Similar to the ROW_NUMBER() example, the OVER clause with PARTITION BY department ORDER BY salary DESC divides the employees into partitions based on their departments and orders them by salary in descending order within each partition.