Day - 11 of SQL for Data Science

Day - 11 of SQL for Data Science

SQL Window Function

SQL window functions are a specialized feature in SQL that allows you to perform calculations and aggregations over a "window" of rows related to the current row. Unlike regular aggregate functions, which collapse rows into a single result, window functions maintain the individual rows and calculate values based on a specified set of rows within the result set.

The basic syntax for a window function is as follows :

SELECT column1, column2, ..., window_function() OVER (PARTITION BY partition_column ORDER BY order_column) 
FROM table_name;         

Let's break down the components of the syntax :

  • window_function() : This is the window function itself, such as SUM, AVG, ROW_NUMBER, RANK, DENSE_RANK, etc. It is the function that will be applied to the data within the window.
  • OVER : This clause specifies the window definition. It consists of two parts:

  1. PARTITION BY: Optional clause that divides the result set into partitions or groups based on the specified column(s). The window function will operate independently within each partition.
  2. ORDER BY: Specifies the column(s) used to order the rows within each partition. The window function will take the ordering into account when calculating the results.

Use the ERD Diagram to write and understand queries :

No alt text provided for this image

Note : A running total, also known as a cumulative sum, is a common use case for SQL window functions. It calculates the sum of a column up to the current row, considering the order specified in the query. Here's how you can use window functions to calculate a running total in SQL.

Example

Create a running total of?standard_amt_usd?(in the?orders?table) over order time with no date truncation. Your final table should have two columns : one with the amount being added for each new row, and a second with the running total.

SELECT standard_amt_usd,
       SUM(standard_amt_usd) OVER (ORDER BY occurred_at) AS running_total
FROM orders;        

Note : ROW_NUMBER() is a window function that assigns a unique integer value to each row within the partition. It starts from 1 for the first row in the partition and increments by 1 for each subsequent row in the order specified.

Example for ROW_NUMBER()

Suppose we have a table called "employees" with columns "employee_name," "department," and "salary," representing employee information and their respective departments and salaries.

SELECT employee_name, department, salary, 
       ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num 
FROM employees;        

Output

employee_name | department | salary | row_num
----------------------------------------------
Alice? ? ? ? ?| HR? ? ? ? ?| 60000? | 1
Bob? ? ? ? ? ?| HR? ? ? ? ?| 55000? | 2
Claire? ? ? ? | HR? ? ? ? ?| 50000? | 3
David? ? ? ? ?| IT? ? ? ? ?| 70000? | 1
Emma? ? ? ? ? | IT? ? ? ? ?| 65000? | 2
Frank? ? ? ? ?| Finance? ? | 80000? | 1        

Explanation

  • The OVER clause with PARTITION BY department ORDER BY salary DESC divides the employees into partitions based on their departments and orders them by salary in descending order within each partition.
  • For the "HR" department, we have three employees with different salaries. Alice has the highest salary (60000) and gets assigned row_num 1. Bob has the second-highest salary (55000) and gets assigned row_num 2. Claire has the lowest salary (50000) and gets assigned row_num 3.
  • For the "IT" department, there are two employees. David has the highest salary (70000) and gets assigned row_num 1. Emma has the second-highest salary (65000) and gets assigned row_num 2.
  • For the "Finance" department, we have only one employee, Frank, with the highest salary (80000), so he is assigned row_num 1.

Note : RANK() is a window function that assigns a unique rank to rows within the partition based on the order specified. It assigns the same rank to rows with the same values in the ORDER BY clause, leaving gaps in ranking for ties.

For example, if two rows have the same value and are ranked as 1, the next row is assigned rank 3 (skipping rank 2).

Example for RANK()

Let's use the same "employees" table and apply the RANK() window function.

SELECT employee_name, department, salary, 
       RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rank 
FROM employees;         

Output

employee_name | department | salary | rank
-------------------------------------------
Alice? ? ? ? ?| HR? ? ? ? ?| 60000? | 1
Bob? ? ? ? ? ?| HR? ? ? ? ?| 55000? | 2
Claire? ? ? ? | HR? ? ? ? ?| 55000? | 2
David? ? ? ? ?| IT? ? ? ? ?| 70000? | 1
Emma? ? ? ? ? | IT? ? ? ? ?| 65000? | 2
Frank? ? ? ? ?| Finance? ? | 80000? | 1        

Explanation

Similar to the ROW_NUMBER() example, the OVER clause with PARTITION BY department ORDER BY salary DESC divides the employees into partitions based on their departments and orders them by salary in descending order within each partition.

  • For the "HR" department, we have three employees with different salaries. Alice has the highest salary (60000) and gets assigned rank 1. Both Bob and Claire have the same salary (55000), so they get assigned the same rank 2, and there is no rank 3.
  • For the "IT" department, David has the highest salary (70000) and gets assigned rank 1, while Emma has the second-highest salary (65000) and gets assigned rank 2.
  • For the "Finance" department, Frank has the highest salary (80000), so he is assigned rank 1.



要查看或添加评论,请登录

Mrityunjay Pathak的更多文章

  • Bias and Variance and Its Trade Off

    Bias and Variance and Its Trade Off

    There are various ways to evaluate a machine-learning model. Bias and Variance are one such way to help us in parameter…

  • Machine Learning Mathematics??

    Machine Learning Mathematics??

    Machine Learning is the field of study that gives computers the capability to learn without being explicitly…

  • How to Modify your GitHub Profile Readme File as your Portfolio

    How to Modify your GitHub Profile Readme File as your Portfolio

    What if you don't have a personal portfolio website? No worries! You can transform your GitHub README.md into a…

    4 条评论
  • Data Science Resources

    Data Science Resources

    Are you starting your journey into the world of Data Science? Here's a curated list of top resources to master various…

  • 25 Python Sets Questions with Solution

    25 Python Sets Questions with Solution

    25 Python Sets Coding Questions along with Explanations for each. Let's get started ↓ Question 1: Write a Python…

  • 25 Python Tuple Questions with Solution

    25 Python Tuple Questions with Solution

    25 Python Tuple Coding Questions along with Explanations for each. Let's get started ↓ Question 1: Find the length of a…

  • 25 Python Dictionary Questions and Solutions

    25 Python Dictionary Questions and Solutions

    25 Python Dictionary Coding Questions along with Explanations for each. Let's get started ↓ Question 1: Create an empty…

  • 25 Python List Questions with Solution

    25 Python List Questions with Solution

    25 Python List Coding Questions along with Explanations for each. Let's get started ↓ Question: Given a list nums, find…

    2 条评论
  • 25 Python String Questions with Solution

    25 Python String Questions with Solution

    25 Python Strings Coding Questions along with Explanations for each. Let's get started ↓ Write a Python program to…

    3 条评论
  • 25 Python Loop Coding Questions

    25 Python Loop Coding Questions

    25 Python Loop Coding Questions along with Explanations for each. Let's get started ↓ Print numbers from 1 to 10 using…

    3 条评论

社区洞察

其他会员也浏览了