SQL Window Function

SQL Window Function

Window functions in SQL are a type of analytical function that perform calculations across a set of rows that are related to the current row, called a "window".

A window function calculates a value for each row in the result set based on a subset of the rows that are defined by a window specification.

The window specification is defined using the OVER() clause in SQL, which specifies the partitioning and ordering of the rows that the window function will operate on. The partitioning divides the rows into groups based on a specific column or expression, while the ordering defines the order in which the rows are processed within each group.

Window functions applies aggregate and ranking functions over a particular window (set of rows). OVER clause is used with window functions to define that window. OVER clause does two things :?

  • Partitions rows into form set of rows. (PARTITION BY clause is used)?
  • Orders rows within those partitions into a particular order. (ORDER BY clause is used)?

Note:?If partitions aren’t done, then ORDER BY orders all rows of table.?

Syntax:??

SELECT coulmn_name1, 
 window_function(cloumn_name2)
 OVER([PARTITION BY column_name1] [ORDER BY column_name3]) AS new_column
FROM table_name;
 
 
window_function= any aggregate or ranking function    
column_name1= column to be selected
coulmn_name2= column on which window function is to be applied
column_name3= column on whose basis partition of rows is to be done
new_column= Name of new column
table_name= Name of table        

Window functions can only be used in the?SELECT?or?ORDER BY?clauses of a SQL query. In a window function, all the rows returned by the?SELECT?are divided into windows (or frames). The criteria for deciding which rows belong to which window is defined by putting a?PARTITION BY?subclause inside the window function invocation. The window function then?computes?one result based on all the rows?in one window.

List of Window Functions

  • Ranking Functions
  • row_number()
  • rank()
  • dense_rank()
  • Distribution Functions
  • percent_rank()
  • cume_dist()
  • Analytic Functions
  • lead()
  • lag()
  • ntile()
  • first_value()
  • last_value()
  • nth_value()
  • Aggregate Functions
  • avg()
  • count()
  • max()
  • min()
  • sum()


No alt text provided for this image
No alt text provided for this image


Let's create a table marks:

CREATE DATABASE testdb;
use testdb;


CREATE TABLE marks (
?student_id INTEGER PRIMARY KEY AUTO_INCREMENT,
? ? name VARCHAR(255),
? ? branch VARCHAR(255),
? ? obtained_mark  INTEGER
);


INSERT INTO marks (name,branch,obtained_mark)VALUES?
('Nitish','EEE',82),
('Rishabh','EEE',91),
('Anukant','EEE',69),
('Rupesh','EEE',55),
('Shubham','CSE',78),
('Ved','CSE',43),
('Deepak','CSE',98),
('Arpan','CSE',95),
('Vinay','ECE',95),
('Ankit','ECE',88),
('Anand','ECE',81),
('Rohit','ECE',95),
('Prashant','MECH',75),
('Amit','MECH',69),
('Sunny','MECH',39),
('Gautam','MECH',51);

SELECT * FROM marks;        
No alt text provided for this image

Now find the average marks by branch, we can get it by applying group by

SELECT
? ? branch, AVG(obtained_mark) AS branch_avg
FROM
? ? marks
GROUP BY branch;?        

Output:

The same result we can be obtained by using the window function :

select *,
? ? ? ?avg(obtained_mark) over(partition by branch) 'branch_avg'?
from marks
order by student_id;?        

Output:

No alt text provided for this image
No alt text provided for this image

GROUP BY vs Window Functions

When comparing window functions and?GROUP BY, it’s essential to remember that?GROUP BY?collapses the individual records into groups; after using GROUP BY, you cannot refer to any individual field because it is collapsed.?

Window functions are a powerful feature of SQL. They allow us to apply functions like?AVG,?COUNT,?MAX, and?MIN?on a group of records while still leaving the individual records accessible. Since the individual records are not collapsed, we can create queries showing data from the individual record together with the result of the window function. This is what makes window functions so powerful.

Suppose we want to obtain a list of student_id, student names, branch, and calculate the average of the "obtained_mark" column across all rows in the "marks" table.

select *,
? ? ? ?avg(obtained_mark) over() 'overall_avg'?
from marks;        
No alt text provided for this image

Explanation:

This SQL query uses a window function to calculate the average of the "obtained_mark" column across all rows in the "marks" table. Let's break down the query step by step:

  1. SELECT *: This part of the query selects all columns from the "marks" table.
  2. avg(obtained_mark) over() 'branch_avg': This is where the window function is used. Here's what each part means:

  • avg(obtained_mark): This calculates the average of the "obtained_mark" column for each row.
  • over(): This is the window specification. The empty parentheses () indicate that the window frame includes all rows in the partition (the whole result set). This is why the average is calculated across all rows.
  • 'overall_avg': This assigns an alias name "overall_avg" to the calculated average value.

So, the combined query retrieves all columns from the "marks" table along with an additional column that contains the average of the "obtained_mark" column across all rows. This average value will be repeated for every row because there's no specific partitioning or ordering specified for the window function.

Let's say want to retrieve all columns from the "marks" table and include an additional column showing the average obtained marks for each branch.

select *,
	avg(obtained_mark) over(partition by branch) 'branch_avg'?
from marks;?        

Explanation:

  • SELECT *: This part of the query selects all columns from the "marks" table.
  • AVG(obtained_mark) OVER(PARTITION BY branch) AS 'branch_avg': This is where a window function is used with partitioning. Here's what each part means:
  • AVG(obtained_mark): This calculates the average of the "obtained_mark" column for each row.
  • OVER(PARTITION BY branch): This defines a window frame that partitions the result set into separate groups based on the "branch" column. The window function, in this case, the average calculation, will be performed separately for each branch.
  • AS 'branch_avg': This assigns an alias name "branch_avg" to the calculated average value for each branch.

Output:

No alt text provided for this image

Let's say want to retrieve all columns from the "marks" table and include two additional columns: one showing the minimum obtained marks across all students and another showing the maximum obtained marks across all students.

select *,
min(obtained_mark) over() min_marks,
max(obtained_mark) over() max_marks
from marks;        

Explanation:

  • SELECT *: This part of the query selects all columns from the "marks" table.
  • MIN(obtained_mark) OVER() AS min_marks: This uses the MIN() window function to calculate the minimum value of the "obtained_mark" column across all rows.
  • MAX(obtained_mark) OVER() AS max_marks: This uses the MAX() window function to calculate the maximum value of the "obtained_mark" column across all rows.
  • OVER(): This indicates that the window frame includes all rows in the result set. Since no partitioning or ordering is specified, the minimum and maximum are calculated for the entire result set.
  • AS min_marks and AS max_marks: These assign alias names "min_marks" and "max_marks" to the calculated minimum and maximum values, respectively.

Output:

No alt text provided for this image

Let's say we want to retrieve all columns from the "marks" table and include two additional columns for each row: one showing the minimum obtained marks within the same branch and another showing the maximum obtained marks within the same branch.

SELECT *,
? ? ? ?MIN(obtained_mark) OVER(PARTITION BY branch) AS branch_min_marks,
? ? ? ?MAX(obtained_mark) OVER(PARTITION BY branch) AS branch_max_marks
FROM marks;        

Explanation:

  • SELECT *: This part of the query selects all columns from the "marks" table.
  • MIN(obtained_mark) OVER(PARTITION BY branch) AS branch_min_marks: This uses the MIN() window function to calculate the minimum value of the "obtained_mark" column within the same branch.
  • MAX(obtained_mark) OVER(PARTITION BY branch) AS branch_max_marks: This uses the MAX() window function to calculate the maximum value of the "obtained_mark" column within the same branch.
  • OVER(PARTITION BY branch): This indicates that the window frame is partitioned by the "branch" column. The minimum and maximum are calculated separately for each branch.
  • AS branch_min_marks and AS branch_max_marks: These assign alias names "branch_min_marks" and "branch_max_marks" to the calculated minimum and maximum values within the same branch, respectively.

Output:

No alt text provided for this image

You want to retrieve all columns from the "marks" table and include several additional columns for each row: overall average obtained marks, average obtained marks within the same branch, minimum obtained marks across all students, maximum obtained marks across all students, minimum obtained marks within the same branch, and maximum obtained marks within the same branch. The result set should be ordered by student ID.

SQL Query:



SELECT  *, 
        AVG(obtained_mark) OVER() AS overall_avg, 
        AVG(obtained_mark) OVER(PARTITION BY branch) AS branch_avg, 
        MIN(obtained_mark) OVER() AS overall_min_marks, 
        MAX(obtained_mark) OVER() AS overall_max_marks, 
        MIN(obtained_mark) OVER(PARTITION BY branch) AS branch_min_marks, 
        MAX(obtained_mark) OVER(PARTITION BY branch) AS branch_max_marks 
FROM marks 
ORDER BY student_id;         

Explanation:

  • SELECT *: This part of the query selects all columns from the "marks" table.
  • AVG(obtained_mark) OVER() AS overall_avg: This calculates the average of the "obtained_mark" column across all rows and assigns it to the "overall_avg" column for each row.
  • AVG(obtained_mark) OVER(PARTITION BY branch) AS branch_avg: This calculates the average of the "obtained_mark" column within the same branch and assigns it to the "branch_avg" column for each row.
  • MIN(obtained_mark) OVER() AS overall_min_marks: This calculates the minimum value of the "obtained_mark" column across all rows and assigns it to the "min_marks" column for each row.
  • MAX(obtained_mark) OVER() AS overall_max_marks: This calculates the maximum value of the "obtained_mark" column across all rows and assigns it to the "max_marks" column for each row.
  • MIN(obtained_mark) OVER(PARTITION BY branch) AS branch_min_marks: This calculates the minimum value of the "obtained_mark" column within the same branch and assigns it to the "branch_min_marks" column for each row.
  • MAX(obtained_mark) OVER(PARTITION BY branch) AS branch_max_marks: This calculates the maximum value of the "obtained_mark" column within the same branch and assigns it to the "branch_max_marks" column for each row.
  • ORDER BY student_id: This orders the result set by the "student_id" column.

The resulting dataset will include the original columns from the "marks" table along with the calculated columns for overall and branch-specific averages, minimum and maximum marks for both the entire dataset and within each branch. The result set will be ordered by student ID.

Output:

No alt text provided for this image

Aggregate Window Function :?

Various aggregate functions such as SUM(), COUNT(), AVERAGE(), MAX(), MIN() applied over a particular window (set of rows) are called aggregate window functions.?

No alt text provided for this image
No alt text provided for this image

Aggregate Function with OVER()

Find all the students who have marks higher than the avg marks of

their respective branch

SELECT *
FROM (
? SELECT *,
? ? ? ? ?AVG(obtained_mark) OVER(PARTITION BY branch) AS branch_avg
? FROM marks
) AS t
WHERE obtained_mark > t.branch_avg
ORDER BY student_id;        

Explanation:

  1. The inner subquery (enclosed in parentheses) calculates the average obtained marks for each branch using the window function AVG(obtained_mark) OVER(PARTITION BY branch) AS branch_avg. This adds a new column branch_avg to the result set, which contains the average obtained marks for each student's branch.
  2. The outer query then retrieves all columns from the subquery result (SELECT * FROM ...). This includes both the original columns from the "marks" table and the calculated branch_avg column.
  3. The WHERE clause filters the results, selecting only the rows where the obtained_mark is greater than the corresponding branch_avg. This effectively filters out students who have marks lower than or equal to the average marks of their branch.

Overall, the query effectively finds and lists all students whose marks are higher than the average marks of their respective branches. It combines the power of window functions and subqueries to achieve this filtering based on branch averages.

Output:

No alt text provided for this image

Find all the students who have marks lower than the avg marks of

their respective branch

-- Find all the students who have marks lower than the avg marks of? their respective branch


select * from
(select *,
avg(obtained_mark) over(partition by branch) as branch_avg
from marks) as t
where obtained_mark < t.branch_avg
order by student_id;
?        



No alt text provided for this image

Example –?

Find average salary of employees for each department and order employees within a department by age.?

SELECT Name, Age, Department, Salary,
 AVERAGE(Salary) OVER( PARTITION BY Department) AS Avg_Salary
 FROM employee;

OR

SELECT *,
 AVERAGE(Salary) OVER( PARTITION BY Department) AS Avg_Salary
 FROM employee;        
No alt text provided for this image

Let’s consider another case:?

SELECT Name, Age, Department, Salary,
 AVERAGE(Salary) OVER( PARTITION BY Department ORDER BY Age) AS Avg_Salary
 FROM employee         

Here we also order the records within the partition as per age values and hence the average values change as per the sorted order.

The output of above query will be :??

No alt text provided for this image

Ranking Window Functions :?

Ranking functions are, RANK(), DENSE_RANK(), ROW_NUMBER()?

  • RANK() –?
  • As the name suggests, the rank function assigns rank to all the rows within every partition. Rank is assigned such that rank 1 given to the first row and rows having same value are assigned same rank. For the next rank after two same rank values, one rank value will be skipped.?
  • DENSE_RANK() –?
  • It assigns rank to each row within partition. Just like rank function first row is assigned rank 1 and rows having same value have same rank. The difference between RANK() and DENSE_RANK() is that in DENSE_RANK(), for the next rank after two same rank, consecutive integer is used, no rank is skipped.?
  • ROW_NUMBER() –?
  • It assigns consecutive integers to all the rows within partition. Within a partition, no two rows can have same row number.?

Note –?

ORDER BY() should be specified compulsorily while using rank window functions.?

Example –?

Calculate row no., rank, dense rank of employees is employee table according to salary within each department.?

SELECT 
ROW_NUMBER() OVER (PARTITION BY Department ORDER BY Salary DESC) 
AS emp_row_no, Name,  Department, Salary,
RANK() OVER(PARTITION BY Department 
ORDER BY Salary DESC) AS emp_rank,
DENSE_RANK() OVER(PARTITION BY Department 
                  ORDER BY Salary DESC) 
                  AS emp_dense_rank,
FROM employee         

The output of above query will be :??

No alt text provided for this image

Write a SQL query to calculate the rank of each student based on their obtained marks in descending order.


SELECT *,
? ? ? ?RANK() OVER(ORDER BY obtained_mark DESC) AS rank
FROM marks;        

Explanation:

  • SELECT *: This part of the query selects all columns from the "marks" table.
  • RANK() OVER(ORDER BY obtained_mark DESC) AS rank: This uses the RANK() window function to assign a rank to each row based on the order of obtained marks in descending order.
  • RANK() assigns the same rank to rows with the same obtained marks, leaving gaps in ranking if there are ties.
  • ORDER BY obtained_mark DESC specifies that the ranking should be based on the obtained marks in descending order. Students with higher marks will have a lower rank number.
  • AS rank: This assigns an alias name "rank" to the calculated rank value for each row.

Output:

No alt text provided for this image

Write a SQL query calculate the rank of each student based on their obtained marks within their respective branches, with the ranking determined in descending order of obtained marks.

SELECT *,
? ? ? ?RANK() OVER(PARTITION BY branch ORDER BY obtained_mark DESC) AS 'rank'
FROM marks;        

Explanation:

  • SELECT *: This part of the query selects all columns from the "marks" table.
  • RANK() OVER(PARTITION BY branch ORDER BY obtained_mark DESC) AS rank: This uses the RANK() window function to assign a rank to each student's row, but the ranking is partitioned by the "branch" column.
  • PARTITION BY branch means that the ranking will be done separately for each branch. In other words, students within the same branch will have separate rankings.
  • ORDER BY obtained_mark DESC specifies that the ranking should be based on the obtained marks in descending order. Students with higher marks will have a lower rank number within their respective branches.
  • AS rank: This assigns an alias name "rank" to the calculated rank value for each row.

No alt text provided for this image

  • RANK() vs DENSE_RANK()

SELECT *,
? ? ? ?RANK() OVER(PARTITION BY branch ORDER BY obtained_mark DESC) AS 'rank',
? ? ? ?DENSE_RANK() OVER(PARTITION BY branch ORDER BY obtained_mark DESC) AS 'dense_rank'
FROM marks;        

Explanation:

  • SELECT *: This part of the query selects all columns from the "marks" table.
  • RANK() OVER(PARTITION BY branch ORDER BY obtained_mark DESC) AS rank: This uses the RANK() window function to assign a rank to each student's row within their respective branches.
  • PARTITION BY branch means that the ranking will be done separately for each branch.
  • ORDER BY obtained_mark DESC specifies that the ranking should be based on the obtained marks in descending order. Students with higher marks will have a lower rank number within their respective branches.
  • DENSE_RANK() OVER(PARTITION BY branch ORDER BY obtained_mark DESC) AS dense_rank: This uses the DENSE_RANK() window function to assign a dense rank to each student's row within their respective branches.
  • PARTITION BY branch means that the ranking will be done separately for each branch.
  • ORDER BY obtained_mark DESC specifies that the ranking should be based on the obtained marks in descending order. Students with the same obtained marks will have the same rank, but there will be no gaps in ranking.
  • AS rank and AS dense_rank: These assign alias names "rank" and "dense_rank" to the calculated rank values for each row.

Output:

No alt text provided for this image

In this example, both rank and dense_rank are calculated based on the students' obtained marks within their respective branches. The ranking in the rank column leaves gaps in ranking for tied marks, whereas the dense_rank column assigns consecutive ranks even for tied marks.

Both RANK() and DENSE_RANK() are window functions in SQL used to assign rankings to rows within result sets. However, they differ in how they handle tied or duplicate values. Here's a brief explanation of the key differences:

  1. RANK() Function:

  • The RANK() function assigns a unique rank to each distinct value in the result set. If multiple rows share the same value, they receive the same rank, and the subsequent rank is skipped.
  • After assigning ranks to tied values, the next rank is determined by adding the count of tied values to the current rank. This creates gaps in ranking for tied values.
  • For example, if two rows have the same highest value and are ranked as 1st and 2nd, the next row will be ranked as 4th (skipping 3rd) if there are three rows with the same highest value.
  • Use RANK() when you want to assign consecutive ranks to distinct values but allow gaps in ranking for tied values.

  1. DENSE_RANK() Function:

  • The DENSE_RANK() function assigns a unique rank to each distinct value in the result set, just like RANK().
  • However, unlike RANK(), DENSE_RANK() does not skip any rank for tied values. It assigns consecutive ranks to tied values without gaps.
  • For example, if two rows have the same highest value and are ranked as 1st and 2nd, the next row will be ranked as 3rd (without skipping any rank) if there are three rows with the same highest value.
  • Use DENSE_RANK() when you want to assign consecutive ranks to distinct values and maintain consecutive ranks for tied values as well.

In summary:

  • RANK() assigns consecutive ranks but may skip ranks for tied values.
  • DENSE_RANK() assigns consecutive ranks without skipping any ranks for tied values.

The choice between these functions depends on your specific ranking requirements and how you want to handle tied values in your result set.

Example:

SELECT *,
? ? ? ?RANK() OVER(ORDER BY obtained_mark DESC) AS rank,
? ? ? ?DENSE_RANK() OVER(ORDER BY obtained_mark DESC) AS dense_rank
FROM marks;        


No alt text provided for this image

Window Frame

A?window frame?is a set of rows that are somehow related to the current row. The window frame is evaluated separately within each partition.

A frame in a window function is a subset of rows within the partition that

determines the scope of the window function calculation. The frame is defined

using a combination of two clauses in the window function: ROWS and BETWEEN.

The ROWS clause specifies how many rows should be included in the frame

relative to the current row. For example, ROWS 3 PRECEDING means that the

frame includes the current row and the three rows that precede it in the partition.

The BETWEEN clause specifies the boundaries of the frame.

No alt text provided for this image
Window Frame

The bounds can be any of the five options:

  • UNBOUNDED PRECEDING
  • n PRECEDING
  • CURRENT ROW
  • n FOLLOWING
  • UNBOUNDED FOLLOWING

No alt text provided for this image

The?lower_bound?must be BEFORE the?upper_bound.

No alt text provided for this image

Default Window Frame

  • If?ORDER BY ?is specified, then the frame is?RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
  • Without?ORDER BY , the frame specification is?ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.


Examples

ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW : means that the frame includes all rows from the beginning of the partition up to and including the current row.

ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING: The frame includes the

current row and the row immediately before and after it.

ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING: The frame includes all rows in the partition.

ROWS BETWEEN 3 PRECEDING AND 2 FOLLOWING: The frame includes the

current row and the three rows before it and the two rows after it.

Analytic Window Functions

  • first_value(expr)?- The value for the first row within the window frame
  • last_value(expr)?- The value for the last row within the window frame

Let's try to find the first_value in our marks table i.e. find the top-performing student's name based on the highest obtained marks in descending order.

SELECT *,
? ? ? ?FIRST_VALUE(name) OVER(ORDER BY obtained_mark DESC) AS 'Topper'
FROM marks;        
No alt text provided for this image

Explanation:

  • SELECT *: This part of the query selects all columns from the "marks" table.
  • FIRST_VALUE(name) OVER(ORDER BY obtained_mark DESC) AS 'Topper': This uses the FIRST_VALUE() window function to calculate the first value of the "name" column within the window ordered by obtained marks in descending order.
  • name is the column for which you want to find the top-performing student.
  • ORDER BY obtained_mark DESC specifies the ordering of rows based on obtained marks in descending order.
  • AS 'Topper': This assigns an alias name "Topper" to the calculated first value, which represents the name of the top-performing student.

Let's try to find the first_value in our marks table i.e. find the top obtained marks for each student based on the highest marks in descending order.

SELECT *,
? ? ? ?FIRST_VALUE(obtained_mark) OVER(ORDER BY obtained_mark DESC) AS 'Topper'
FROM marks
ORDER BY student_id;        


No alt text provided for this image

Find the lowest performing student's name based on the lowest obtained marks in descending order, considering all rows using the last_value window function.

SELECT *,
? ? ? ?LAST_VALUE(name) OVER(ORDER BY obtained_mark DESC?
? ? ? ? ? ? ? ? ? ? ? ? ? ? ?ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS 'low_performer'
FROM marks
ORDER BY student_id;        
No alt text provided for this image

Explanation:

  • SELECT *: This part of the query selects all columns from the "marks" table.
  • LAST_VALUE(name) OVER(ORDER BY obtained_mark DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS 'low_performer': This uses the LAST_VALUE() window function to calculate the last value of the "name" column within the window ordered by obtained marks in descending order. The ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING clause ensures that all rows are considered in the window.
  • name is the column for which you want to find the lowest performing student.
  • ORDER BY obtained_mark DESC specifies the ordering of rows based on obtained marks in descending order.
  • ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING specifies that the window includes all rows in the ordering, effectively considering the entire dataset.
  • AS 'low_performer': This assigns an alias name "low_performer" to the calculated last value, which represents the name of the lowest performing student.
  • ORDER BY student_id: This orders the final result set by student IDs for presentation.

Note:?You usually want to use?RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING?with?last_value(). With the default window frame for?ORDER BY,?RANGE UNBOUNDED PRECEDING,?last_value()?returns the value for the current row.

Find the lowest obtained marks for each student based on the lowest marks in descending order, considering all rows.

SELECT *,
? ? ? ?LAST_VALUE(obtained_mark) OVER(ORDER BY obtained_mark DESC?
? ? ? ? ? ? ? ? ? ? ? ? ? ? ?ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS 'low_performer'
FROM marks
ORDER BY student_id;
        
No alt text provided for this image

  • nth_value(expr, n)?- the value for the?n-th row within the window frame;?n?must be an integer

No alt text provided for this image

Let's try to find the second topper in our marks table.

Uses the nth_value() window function to find the name of the second-top performing student based on the highest obtained marks in descending order, considering all rows.


SELECT *,
? ? ? ?NTH_VALUE(name, 2) OVER(ORDER BY obtained_mark DESC?
? ? ? ? ? ? ? ? ? ? ? ? ? ? ?  ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS 'second_topper'
FROM marks;        
No alt text provided for this image

Explanation:

  • SELECT *: This part of the query selects all columns from the "marks" table.
  • NTH_VALUE(name, 2) OVER(ORDER BY obtained_mark DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS 'second_topper': This uses the NTH_VALUE() window function to calculate the name of the student in the second position within the window ordered by obtained marks in descending order. The ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING clause ensures that all rows are considered in the window.
  • name is the column for which you want to find the name of the second-top performing student.
  • 2 indicates that you want to find the value at the second position within the window.
  • ORDER BY obtained_mark DESC specifies the ordering of rows based on obtained marks in descending order.
  • ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING specifies that the window includes all rows in the ordering, effectively considering the entire dataset.
  • AS 'second_topper': This assigns an alias name "second_topper" to the calculated value, which represents the name of the student with the second-highest obtained marks.

Uses the nth_value() window function to find the name of the second-top performing student within each branch based on the highest obtained marks in descending order, considering all rows within each branch.


SELECT *,
? ? ? ?NTH_VALUE(name, 2) OVER(PARTITION BY branch ORDER BY obtained_mark DESC?
? ? ? ? ? ? ? ? ? ? ? ? ? ? ?ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS 'branch_second_topper'
FROM marks;        
No alt text provided for this image

Explanation:

  • SELECT *: This part of the query selects all columns from the "marks" table.
  • NTH_VALUE(name, 2) OVER(PARTITION BY branch ORDER BY obtained_mark DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS 'branch_second_topper': This uses the NTH_VALUE() window function to calculate the name of the student in the second position within the window partitioned by branch and ordered by obtained marks in descending order. The ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING clause ensures that all rows within each branch are considered in the window.
  • name is the column for which you want to find the name of the second-top performing student.
  • 2 indicates that you want to find the value at the second position within the window.
  • PARTITION BY branch divides the result set into partitions based on the "branch" column.
  • ORDER BY obtained_mark DESC specifies the ordering of rows based on obtained marks in descending order within each partition.
  • ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING specifies that the window within each partition includes all rows, effectively considering the entire partition.
  • AS 'branch_second_topper': This assigns an alias name "branch_second_topper" to the calculated value, which represents the name of the student with the second-highest obtained marks within each branch.

Uses the nth_value() window function to find the name of the second lowest-performing student within each branch based on the lowest obtained marks in ascending order, considering all rows within each branch.

SELECT *,
? ? ? ?NTH_VALUE(name, 2) OVER(PARTITION BY branch ORDER BY obtained_mark?
? ? ? ? ? ? ? ? ? ? ? ? ? ? ?ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS 'branch_second_low_performer'
FROM marks;        
No alt text provided for this image

Find the branch toppers (Name, marks, branch)

Write a SQL query is designed to find the branch toppers for each branch in the "marks" table. It uses the first_value() window function to identify the top-performing student and their obtained marks within each branch.

SELECT name, branch, obtained_mark
FROM (
? ? SELECT *,
? ? ? ? ? ?FIRST_VALUE(name) OVER(PARTITION BY branch ORDER BY obtained_mark DESC) AS 'branch_topper',
? ? ? ? ? ?FIRST_VALUE(obtained_mark) OVER(PARTITION BY branch ORDER BY obtained_mark DESC) AS 'branch_topper_marks'
? ? FROM marks
) AS t
WHERE t.branch_topper_marks = obtained_mark AND t.branch_topper = name;
        
No alt text provided for this image

Explanation:

  • The subquery:
  • It selects all columns from the "marks" table.
  • Uses the FIRST_VALUE() window function with the PARTITION BY branch clause to calculate the name of the top-performing student (branch_topper) within each branch.
  • Also uses the FIRST_VALUE() window function to calculate the obtained marks (branch_topper_marks) of the top-performing student within each branch.
  • This subquery creates a temporary result set with additional columns for the branch toppers' names and marks.
  • The outer query:
  • Selects the name, branch, and obtained_mark columns from the temporary result set created by the subquery.
  • The WHERE clause filters the results to only include rows where the branch_topper_marks are equal to the obtained_mark and the branch_topper is equal to the name. This ensures that only the top-performing students are selected for each branch.

You can get the same result by using rank(), dense_rank() and simply use of group by and max() function.


--USING GROUP BY
select t.name, t.branch, t.obtained_mark from marks t
join
(select branch, max(obtained_mark) as max_marks from marks
group by branch) m?
where t.obtained_mark = m.max_marks and t.branch = m.branch;

--USING RANK()
select * from
(select *,
rank() over(partition by branch order by obtained_mark desc) as topper_rank
from marks) t?
where topper_rank = 1;

--USING DENSE_RANK()

select name, branch, obtained_mark, avg_marks from
(select *,
dense_rank() over(partition by branch order by obtained_mark desc) as topper_rank,
avg(obtained_mark) over(partition by branch) as avg_marks
from marks) as t
where topper_rank = 1;

-- USING ROW_NUMNER
select * from
              (select *,
              row_number() OVER(partition by branch order by obtained_mark desc) as topper_rank
              from marks) as m 
where topper_rank=1        



SQL Window Functions Cheat Sheet
SQL Window Functions Cheat Sheet
No alt text provided for this image
SQL Window Functions Cheat Sheet


Pintu Kumar Kushwaha

?? Data Science & Business Analyst Intern @The Spark Foundation ?? Data Analyst | Excel | SQL | Tableau | Power BI | Python ?? | Snowflake?? |

1 年

In SQL, a window function is a type of function that performs a calculation across a set of rows that are related to the current row. It allows you to perform calculations on a "window" of rows, which can be defined by a specific range or partition. Window functions are typically used to calculate aggregated values or perform calculations that require access to multiple rows within a query result set. They are especially useful when you need to calculate running totals, rankings, or cumulative sums. The syntax for using a window function is as follows: ------------------------------------------------------------------------------- SELECT column1, column2, ..., window_function() OVER (PARTITION BY columnX ORDER BY columnY) FROM table_name; -------------------------------------------------------------------------------

Pintu Kumar Kushwaha

?? Data Science & Business Analyst Intern @The Spark Foundation ?? Data Analyst | Excel | SQL | Tableau | Power BI | Python ?? | Snowflake?? |

1 年

SQL Window Functions Basics Before we dive into?ranking functions in SQL, it’s worth noting that all analytical functions share some basic rules: The processing of the analytical function is done?after?the?GROUP BY?processing of the SQL statement and?before?the ordering of the returned rows. Analytic function execution organizes results into partitions and then computes functions over these partitions in a specified order. Execution of the analytical function occurs?after row filtering?(WHERE)?and grouping?(GROUP BY) so the calculations can be?performed on the?GROUP BY?results.

Pintu Kumar Kushwaha

?? Data Science & Business Analyst Intern @The Spark Foundation ?? Data Analyst | Excel | SQL | Tableau | Power BI | Python ?? | Snowflake?? |

1 年

OVER(ORDER BY) Unlike the ordinary?ORDER BY?used commonly in a?SELECT?statement, the?ORDER BY?clause in window functions does not order the rows in the output. Instead,?ORDER BY?here means the order of the data over which the window function is executed. OVER(PARTITION BY) The?PARTITION BY?clause in window functions is used to define how rows of data are divided into groups. It’s similar to the?GROUP BY?clause, but?PARTITION BY?does not collapse the individual rows when grouping the data. If you want to use a?PARTITION BY?clause, it has to be part of an?OVER()?clause. OVER(PARTITION BY ORDER BY) "OVER(PARTITION BY ORDER BY)" is a powerful SQL clause that allows us to perform calculations within specific partitions of your data and order those partitions. It's a great tool for advanced data analysis!

Pintu Kumar Kushwaha

?? Data Science & Business Analyst Intern @The Spark Foundation ?? Data Analyst | Excel | SQL | Tableau | Power BI | Python ?? | Snowflake?? |

1 年

Aggregate Functions avg(expr)?- average value for rows within the window frame count(expr)?- count of values for rows within the window frame max(expr)?- maximum value within the window frame min(expr)?- minimum value within the window frame sum(expr)?- sum of values within the window frame ORDER BY and Window Frame:?Aggregate functions do not require an?ORDER BY. They accept window frame definition (ROWS,?RANGE,?GROUPS).

Pintu Kumar Kushwaha

?? Data Science & Business Analyst Intern @The Spark Foundation ?? Data Analyst | Excel | SQL | Tableau | Power BI | Python ?? | Snowflake?? |

1 年

ntile(n)

  • 该图片无替代文字

要查看或添加评论,请登录

社区洞察

其他会员也浏览了