Window Functions in SQL
A window function
Let's suppose we need to Find the Department wise highest salary. The basic way is to use Group By to fetch the required data.
select dept_name, max(salary) as "Max Salary in each Dept"
from employee group by dept_name.
As you can see in the below output, the number of rows has been reduced to a number of departments in the table.
What if we want all the information along with the highest salary in each department?? Here We can use Window Functions.
select *,
max(salary) over(partition by dept_name order by salary desc)
as "Max Salary in each Dept" from employee
A simple way to create a window frame is by using an OVER clause with a PARTITION BY sub-clause
Types of Window Functions:
select *,
row_number() over(partition by dept_name order by salary desc) as rn,
rank() over(partition by dept_name order by salary desc) as rnk,
dense_rank() over(partition by dept_name order by salary desc) as "dense_rnk"
from employee,
4. Lead( ) / Lag( ): Lead and lag are used to compare a row with another row based on a particular column. Lead is used to compare the row following the current row, and lag is used to compare the row preceding the current row.
Lead and Lag both take 3 arguments first being the column, for comparison second being the number of rows following/preceding the current row. and third, is the default value instead of [null].
select *,
lead(salary,1,0) over(partition by dept_name order by emp_id) as "Lead",
lag(salary,1,0) over(partition by dept_name order by emp_id) as "Lag"
from employee
Window Frame Bounds With ROWS or RANGE
A very interesting feature of the OVER clause is the ability to specify the upper and lower bounds of a window frame. These bounds can be specified by using one of the two subclauses in the OVER clause: ROWS or RANGE
FRAME: It is a subset of Partition. Whenever you are using a window function most important thing is, it creates a window/ partition and it applies the window function to each of these partitions. Frame Clause is useful while using aggregate functions like AVG(), MIN(), etc.
领英推荐
The default frame used by many of the query languages is range between unbounded preceding and current row. Now, what is unbounded preceding? It is the first row of the partition whereas the current row is the row currently pointing to.
5. FIRST_VALUE( ): To extract a value/column value from the very first record within the Partition.
6. LAST_VALUE( ): To Extract a value/column value from the last record within the partition. To use Last_Value make sure you are using Frame Clause as range between unbounded preceding and unbounded following.
7. NTH_VALUE( ): To Extract a value/column value from the Nth record within the partition.
NOTE: First_Value, Last_Value, and Nth_Value, all take an argument which is the value to be displayed in the column. Nth_Value takes one more additional argument which is the value of N and N must be a positive integer.
select *,
first_value(emp_name) over(partition by dept_name order by salary desc)
as "Employee with highest salary" ,last_value(emp_name) over(partition by dept_name order by salary desc range between unbounded preceding and unbounded following) as
"Employee with lowest salary"
from employee,
8. NTILE( ): It is used to group data together within the partition and then place them into certain buckets. SQL makes sure that each bucket has an equal number of records. NTILE takes a positive integer as an argument which tells the number of buckets needed to divide the table into.
Let's say we want to divide the table into 3 parts Top Performers, High Performers, and Average Performers based on the Salary in each Department.
select emp_name, dept_name, a.bucket case when a.bucket=1 then 'Top performer'when a.bucket=2 then 'High performer'when a.bucket = 3 then 'Average Performer'
?else 'Invalid bucket' end as "Ranking" from ( select *, ntile(3) over (partition by dept_name order by salary desc) as bucket
from employee ) a
9. CUME_DIST( ): It is a cumulative distribution function
For Example: To find Employees constituting the first 30% of all Employees based on Salary.
10. PERCENT_RANK( ): It is used to find the relative rank of each row
For Example: To find the Top 10% of employees based on salary.
select *, case when a.rank_percentage >10 then 'Yes else 'No' end as 'In Top 10',
case when a.cummulative_dist >30 then 'Yes' else 'No' end as 'Constituting >30 %' from ( select *,
round(cume_dist() over(partition by dept_name order by salary) *100,2)
as cummulative_dist,
round(percent_rank() over(partition by dept_name order by salary) * 100,2)
as rank_percentage from employee ) a
SUMMARY
That's all for Windows Function. Follow for more such articles on SQL Topics. ???
Backend Developer
1 年Thanks. It helps me a lot
AVP - Analytics | Synchrony
2 年Useful share ????
Sr. Data Engineer @ NTT DATA| Azure | Databricks | Python | Pyspark | Big Data
2 年Superb content ??
Power Platform Developer | SQL | Power BI | Excel | Macros | Power Query | Certified in Cybersecurity (CC) from ISC2
2 年Great!!!
SSE @ Paytm | Ex-SDE II (R&D) @ Zuora | Ex-Optum(UHG) | Java | Spring Boot | Micro Services
2 年Shreyam Kesarwani