"Topic- SQL Functions"
Divya Manupati
Data Analytics | Business Intelligence | Microsoft Excel | SQL | Python| Hadoop | Microsoft Power BI | R programming | Tableau | Statistics | SPSS software | Redshift Db | Mobaxterm | Web 3 technology | Block chain |
Scalar Functions:
Scalar functions operate on a single input value and return a single output value. They can be used in SQL expressions, SELECT statements, WHERE clauses, and more. Scaler Functions includes String Function, Numeric Function, Date Function,
1) String functions:
Here are some String Functions in SQL:
These are some of the most commonly used string functions in SQL. They allow us to manipulate and extract information from string values in our queries.
2) Numeric Functions:
Here are some commonly used numeric functions in SQL:
#ABS:
#MOD:
#EXP:
These are some of the most commonly used numeric functions in SQL. They allow us to perform calculations, round numbers, calculate powers, and more within our queries.
3) Date Functions:
Here are some date functions in SQL:
#DAY:
These are some of the most commonly used date functions in SQL. They allow us to extract specific parts from dates, perform date calculations, and format dates according to our requirements.
4) Conversion Functions:
Here are some commonly used conversion functions:
#TO_CHAR (Oracle) / #DATE_FORMAT (MySQL):
#TO_DATE (Oracle) / #STR_TO_DATE (MySQL):
#TO_NUMBER (Oracle):
These are some of the most commonly used conversion functions in SQL. They allow us to convert data types, format dates, handle NULL values, and perform other data conversions as needed.
Aggregate Functions:
Aggregate functions operate on a set of values and return a single value as the result. They are typically used with the GROUP BY clause to perform calculations on groups of rows. Examples include COUNT, SUM, AVG, MIN, MAX.
#SUM:
#AVG:
#MIN:
#MAX:
These are some of the most commonly used aggregate functions in SQL. They allow us to perform calculations on groups of rows, count rows, calculate sums, averages, and find minimum and maximum values. The GROUP BY and HAVING clauses are used in combination with these functions to perform aggregations and filtering on grouped data.
Window Functions:
"Window functions, also known as analytic functions or windowing functions, are a powerful feature in SQL that perform calculations across a set of rows that are related to the current row, called a "window". A window function calculates a value for each row in the result set based on a subset of the rows that are defined by a window specification.
Window functions are used to perform advanced calculations and analysis without the need for subqueries or self-joins.
Window functions applies aggregate and ranking functions over a particular window (set of rows). OVER clause is used with window functions to define that window. OVER clause does three things:?
Note:?If partitions aren’t done, then ORDER BY orders "all" rows of table.
Concept of Frames:
A Frame in a window function is a subset of rows within the partition that determines the scope of the window function calculation. The frame is defined using a combination of two clauses in the window function: ROWS and BETWEEN.
The ROWS clause specifies how many rows should be included in the frame relative to the current row. For example, ROWS 3 PRECEDING means that the frame includes the current row and the three rows that precede it in the partition.
The BETWEEN clause specifies the boundaries of the frame.
Examples:
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: means that the frame includes all rows from the beginning of the partition up to and including the current row.
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING: the frame includes the current row and the row, immediately before and after it.
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING: the frame includes all rows in the partition.
ROWS BETWEEN 3 PRECEDING AND 2 FOLLOWING: the frame includes the current row and the three rows before it and the two rows after it.
Benefits: Window functions offer several benefits, such as improved query readability, reduced need for subqueries or self-joins, and the ability to perform complex calculations in a single query.
Database support: Window functions are supported by many popular database management systems, including PostgreSQL, Oracle, SQL Server, and MySQL (starting from version 8.0).
For example, we can use a window function to calculate running totals, find top-n records within groups, or analyze trends over time.
"Here are some most commonly used window functions include ROW_NUMBER, RANK, DENSE_RANK, SUM, AVG, LEAD, LAG, and FIRST_VALUE/LAST_VALUE/NTH VALUE".
Aggregate function with OVER():
Aggregate functions with the OVER() clause are known as window or analytic aggregate functions. These functions allow us to perform aggregations on a subset of rows within a result set, defined by the window specification. The window specification is provided within the OVER() clause and includes partitioning, ordering, and framing options.
Here's the general syntax of using aggregate functions with the OVER() clause:
<aggregate_function>(<expression>) OVER (
[PARTITION BY <column(s)>]
[ORDER BY <column(s)>]
[ROWS/RANGE <frame_specification>]
)
Let's break down the different components of this syntax:
Here are a few examples of aggregate functions with the OVER() clause:
EXAMPLE-1: Calculate the total sales for each product within a specific category:
Assuming we have a table named "sales_table" with columns "product_category", "product_name", and "sales", the query:
SELECT product_category, product_name, sales,
SUM(sales) OVER (PARTITION BY product_category) AS total_sales
FROM sales_table;
The output will be: with an additional column "total_sales" that represents the sum of sales for each product category.
---------------------------------------------------------------------
product_category | product_name | sales | total_sales
----------------------------------------------------------------------
Electronics | TV | 100 | 450
Electronics | Laptop | 200 | 450
Electronics | Phone | 150 | 450
Clothing | T-Shirt | 50 | 150
Clothing | Jeans | 80 | 150
Clothing | Dress | 20 | 150
---------------------------------------------------------------------
In this example, the "total_sales" column represents the sum of sales for each product category. For the "Electronics" category, the total sales is 450, which is the sum of 100 + 200 + 150. Similarly, for the "Clothing" category, the total sales is 150, which is the sum of 50 + 80 + 20.
The PARTITION BY clause is used to divide the result set into partitions based on the values in the "product_category" column. The SUM() function is then applied over each partition to calculate the total sales for each product category.
EXAMPLE-2: Find the average salary for each department and rank employees based on their salary:
Assuming we have a table named "employees" with columns "department", "employee_name", and "salary", the query:
SELECT department, employee_name, salary,
AVG(salary) OVER (PARTITION BY department) AS avg_salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank
FROM employees;
The output will be: with two additional columns: "avg_salary" representing the average salary for each department, and "salary_rank" representing the rank of employees within each department based on their salary.
---------------------------------------------------------------------
department | employee_name | salary | avg_salary | salary_rank
---------------------------------------------------------------------
HR | John Smith | 50000 | 55000 | 2
HR | Jane Doe | 60000 | 55000 | 1
IT | Mike Johnson | 70000 | 65000 | 2
IT | Sarah Williams| 60000 | 65000 | 3
IT | David Lee | 80000 | 65000 | 1
---------------------------------------------------------------------
In this example, the "avg_salary" column represents the average salary for each department. For the "HR" department, the average salary is 55000, which is the average of 50000 and 60000. Similarly, for the "IT" department, the average salary is 65000, which is the average of 70000, 60000, and 80000.
The RANK() function is applied over each partition (department) and orders the employees based on their salary in descending order. The highest salary within each department receives a rank of 1, the next highest receives a rank of 2, and so on.
EXAMPLE-3: Calculate the cumulative revenue for each month, ordered by date:
Assuming we have a table named "sales_data" with columns "date" and "revenue", the query:
SELECT date, revenue,
SUM(revenue) OVER (ORDER BY date) AS cumulative_revenue
FROM sales_data;
The output will be: with an additional column "cumulative_revenue" that represents the cumulative revenue for each month, ordered by date.
---------------------------------------------------------------------
date | revenue | cumulative_revenue
---------------------------------------------------------------------
2022-01-01 | 1000 | 1000
2022-02-01 | 1500 | 2500
2022-03-01 | 2000 | 4500
2022-04-01 | 1200 | 5700
2022-05-01 | 1800 | 7500
---------------------------------------------------------------------
In this example, the "cumulative_revenue" column represents the sum of revenues from the beginning of the dataset up to the current date. The values in the "cumulative_revenue" column are calculated by using the SUM() function with the OVER clause, ordering the rows by the "date" column.
For example, on the first row with a date of "2022-01-01", the cumulative revenue is 1000, which is the sum of the revenue for that month. On the second row with a date of "2022-02-01", the cumulative revenue is 2500, which is the sum of the revenue for the previous month plus the revenue for the current month, and so on.
--------------------------------------------------------
RANK():
The RANK() function is a window function in SQL that assigns a rank to each row within a result set based on a specified criterion. It is commonly used to determine the ranking or ordering of rows based on a particular column or set of columns.
Here's the syntax for using the RANK() function:
RANK() OVER (
[PARTITION BY <column(s)>]
ORDER BY <column(s)>
)
Let's understand each part of the syntax:
Here's an example that demonstrates the usage of the RANK() function:
Consider a table named "Sales" with columns "Product", "Revenue", and "Month". We can use the RANK() function to calculate the rank of each product based on their revenue within each month:
SELECT Product, Revenue, Month,
RANK() OVER (PARTITION BY Month ORDER BY Revenue DESC) AS Rank
FROM Sales;
The output will be: with an additional column "Rank" that represents the rank of each product based on their revenue within each month.
---------------------------------------------------------------------
Product | Revenue | Month | Rank
---------------------------------------------------------------------
Product A | 5000 | January | 1
Product B | 4000 | January | 2
Product C | 3000 | January | 3
Product D | 3500 | January | 4
Product X | 2000 | February | 1
Product Y | 1800 | February | 2
Product Z | 1500 | February | 3
---------------------------------------------------------------------
In this example, the "Rank" column represents the ranking of each product's revenue within each month. The products are ordered by their revenue in descending order within each month. The product with the highest revenue within each month receives a rank of 1, the next highest receives a rank of 2, and so on.
For example, in the month of January, "Product A" has the highest revenue and receives a rank of 1, while "Product C" has the lowest revenue and receives a rank of 3. In the month of February, "Product X" has the highest revenue and receives a rank of 1, while "Product Z" has the lowest revenue and receives a rank of 3.
Important Note: The RANK() function assigns a unique rank to each distinct value within the specified partition. If there are ties (i.e., multiple rows with the same value), the RANK() function will assign the same rank to all tied rows, and the subsequent rank will be skipped.
--------------------------------------------------------
DENSE_RANK()
The DENSE_RANK() function is a window function in SQL that assigns a dense rank to each row within a result set based on a specified criterion. It is similar to the RANK() function, but it handles ties differently. DENSE_RANK() ensures that there are no gaps in the ranks when multiple rows have the same ranking value.
Here's the syntax for using the DENSE_RANK() function:
DENSE_RANK() OVER (
[PARTITION BY <column(s)>]
ORDER BY <column(s)>
)
Let's break down the components of the syntax:
Let's consider an example to illustrate the usage of the DENSE_RANK() function:
Suppose we have a table named "Students" with columns "Name", "Score", and "Subject". We want to calculate the dense rank of each student based on their score within each subject:
SELECT Name, Score, Subject,
DENSE_RANK() OVER (PARTITION BY Subject ORDER BY Score DESC) AS Dense_Rank
FROM Students;
The output will be: with an additional column "Dense_Rank" that represents the dense rank of each student based on their score within each subject.
---------------------------------------------------------------------
领英推荐
Name | Score | Subject | Dense_Rank
---------------------------------------------------------------------
John | 90 | Math | 1
Alice | 85 | Math | 2
Michael | 80 | Math | 3
Emily | 92 | Math | 1
David | 88 | Math | 2
Sara | 75 | Math | 3
Mark | 87 | Science | 1
Emma | 91 | Science | 2
Daniel | 85 | Science | 3
Sophia | 89 | Science | 1
Olivia | 90 | Science | 2
Jacob | 83 | Science | 3
---------------------------------------------------------------------
In this example, the "Dense_Rank" column represents the dense rank of each student's score within each subject. The students are ordered by their score in descending order within each subject. The student with the highest score within each subject receives a rank of 1, and in case of ties, the subsequent students receive the same rank without any gaps.
For example, in the "Math" subject, "John" has the highest score and receives a dense rank of 1, while "Emily" also has the highest score and receives the same dense rank of 1. The next highest score in the "Math" subject is "Alice" and "David", who both receive a dense rank of 2. The dense rank continues without any gaps for the remaining students.
Important Note: The DENSE_RANK() function assigns a unique rank to each distinct value within the specified partition. If there are ties (i.e., multiple rows with the same value), the DENSE_RANK() function will assign the same rank to all tied rows without leaving any gaps in the ranking sequence.
--------------------------------------------------------
ROW_NUMBER()
The ROW_NUMBER() function is a window function in SQL that assigns a unique sequential number to each row within a result set. It is commonly used to provide a unique identifier for each row, regardless of any specific criterion or ordering.
Here's the syntax for using the ROW_NUMBER() function:
ROW_NUMBER() OVER (
[PARTITION BY <column(s)>]
ORDER BY <column(s)>
)
Let's understand each part of the syntax:
Let's consider an example to illustrate the usage of the ROW_NUMBER() function:
Suppose we have a table named "Employees" with columns "EmployeeID", "FirstName", "LastName", and "Department". We want to assign a unique number to each employee, regardless of any specific ordering or criteria:
SELECT EmployeeID, FirstName, LastName, Department,
ROW_NUMBER() OVER (ORDER BY EmployeeID) AS RowNumber
FROM Employees;
The output will be: with an additional column "RowNumber" that represents a unique number assigned to each employee.
---------------------------------------------------------------------
EmployeeID | FirstName | LastName | Department | RowNumber
---------------------------------------------------------------------
1 | John | Smith | HR | 1
2 | Jane | Doe | IT | 2
3 | Mike | Johnson | Sales | 3
4 | Sarah | Williams | HR | 4
5 | David | Lee | IT | 5
6 | Emily | Brown | Sales | 6
---------------------------------------------------------------------
In this example, the "RowNumber" column represents a unique number assigned to each employee. The employees are ordered by their "EmployeeID" in ascending order, but it could be ordered by any other column or criteria based on your requirement.
The ROW_NUMBER() function is used without any partitioning. It assigns a unique number to each row in the result set, based on the ordering specified in the ORDER BY clause. The resulting "RowNumber" column will contain a sequential number for each employee.
The ROW_NUMBER() function ensures that each row receives a distinct number, without any gaps or duplicates. It provides a simple way to generate unique identifiers for rows in a result set.
It's important to note that the sequential numbers assigned by the ROW_NUMBER() function are not persistent or meaningful beyond the current result set. They are recalculated each time the query is executed and can vary depending on the ordering specified.
--------------------------------------------------------
FIRST_VALUE()
The FIRST_VALUE() function is a window function in SQL that returns the value of a specified expression from the first row within a window frame. It allows us to retrieve the value of a particular column or expression for the first row in a defined window.
Here's the syntax for using the FIRST_VALUE() function:
FIRST_VALUE(<expression>) OVER (
[PARTITION BY <column(s)>]
ORDER BY <column(s)>
[ROWS <frame_specification>]
)
Let's break down the components of the syntax:
FIRST_VALUE(<expression>): This is the expression or column whose value we want to retrieve from the first row within the window frame.
OVER (): This clause is used to define the window over which the FIRST_VALUE() function operates.
PARTITION BY <column(s)>: This optional clause divides the result set into partitions or groups based on one or more columns. The FIRST_VALUE() function retrieves the value independently for each partition. If omitted, the entire result set is treated as a single partition.
ORDER BY <column(s)>: This optional clause specifies the column(s) based on which the rows are ordered. It determines the sequence in which the FIRST_VALUE() function retrieves the value.
ROWS <frame_specification>: This optional clause defines the subset of rows within each partition to be included in the window. It specifies the start and end points of the frame relative to the current row.
Let's consider an example to illustrate the usage of the FIRST_VALUE() function:
Suppose we have a table named "Products" with columns "ProductID", "ProductName", "Category", and "Price". We want to retrieve the first product name within each category based on the ascending order of price, the query:
SELECT ProductID, ProductName, Category, Price,
FIRST_VALUE(ProductName) OVER (PARTITION BY Category ORDER BY Price) AS FirstProduct
FROM Products;
The output will be: with an additional column "FirstProduct" that represents the first product name within each category based on the price. Here's an example of the expected output:
---------------------------------------------------------------------
ProductID | ProductName | Category | Price | FirstProduct
---------------------------------------------------------------------
1 | Laptop | Electronics | 1000 | Laptop
2 | Smartphone | Electronics | 800 | Laptop
3 | Headphones | Electronics | 200 | Laptop
4 | T-Shirt | Clothing | 20 | T-Shirt
5 | Jeans | Clothing | 50 | T-Shirt
6 | Sneakers | Clothing | 80 | T-Shirt
7 | Book | Books | 15 | Book
8 | Notebook | Books | 5 | Book
9 | Pen | Books | 2 | Book
---------------------------------------------------------------------
In this example, the "FirstProduct" column represents the first product name within each category based on the price. The products are partitioned by the "Category" column and ordered by the "Price" column in ascending order.
For example, within the "Electronics" category, "Laptop" has the lowest price and is assigned as the first product. Similarly, within the "Clothing" category, "T-Shirt" has the lowest price and is assigned as the first product. And within the "Books" category, "Book" has the lowest price and is assigned as the first product.
It's important to note that the FIRST_VALUE() function only retrieves the value from the first row within the window frame and does not perform any aggregation or calculation.
----------------------------------------------------------
LAST_VALUE()
The LAST_VALUE() function is a window function in SQL that returns the value of a specified expression from the last row within a window frame. It allows us to retrieve the value of a particular column or expression for the last row in a defined window.
Syntax: Same as FIRST_VALUE, the only difference is we use LAST_VALUE in the place FIRST_VALUE.
Let's consider an example to illustrate the usage of the LAST_VALUE() function:
Suppose we have a table named "Orders" with columns "OrderID", "CustomerID", "OrderDate", and "TotalAmount". We want to retrieve the last order date for each customer based on the descending order of total amount:
SELECT OrderID, CustomerID, OrderDate, TotalAmount,
LAST_VALUE(OrderDate) OVER (PARTITION BY CustomerID ORDER BY TotalAmount DESC) AS LastOrderDate
FROM Orders;
The output will be: with an additional column "LastOrderDate" that represents the last order date for each customer based on the total amount.
---------------------------------------------------------------------
OrderID | CustomerID | OrderDate | TotalAmount | LastOrderDate
---------------------------------------------------------------------
1 | C001 | 2022-01-01 | 1000 | 2022-03-15
2 | C001 | 2022-02-15 | 800 | 2022-03-15
3 | C002 | 2022-03-10 | 1200 | 2022-03-10
4 | C002 | 2022-02-28 | 1500 | 2022-03-10
5 | C002 | 2022-03-15 | 1000 | 2022-03-10
6 | C003 | 2022-04-05 | 500 | 2022-04-05
---------------------------------------------------------------------
In this example, the "LastOrderDate" column represents the last order date for each customer based on the total amount. The orders are partitioned by the "CustomerID" column and ordered by the "TotalAmount" column in descending order.
For example, for the customer with ID "C001", the last order date is "2022-03-15" because it corresponds to the order with the highest total amount within that customer's orders. Similarly, for the customer with ID "C002", the last order date is "2022-03-10" because it corresponds to the order with the highest total amount within that customer's orders. For the customer with ID "C003", the last order date is "2022-04-05" because there is only one order for that customer.
----------------------------------------------------------
NTH_VALUE()
The NTH_VALUE() function is a window function in SQL that returns the value of a specified expression from the nth row within a window frame. It allows us to retrieve the value of a particular column or expression from a specific position in the window frame.
Syntax: similar with FIRST_VALUE and LAST_VALUE.
NTH_VALUE(<expression>, n) OVER (
[PARTITION BY <column(s)>]
ORDER BY <column(s)>
[ROWS <frame_specification>]
)
Let's break down the components of the syntax:
Let's consider an example to illustrate the usage of the NTH_VALUE() function:
Consider a table named "Sales" with columns "Product", "Revenue", and "Month". We want to retrieve the nth highest revenue for each product within each month. Let's assume we want to retrieve the 3rd highest revenue. The query,
SELECT Product, Revenue, Month,
NTH_VALUE(Revenue, 3) OVER (PARTITION BY Month ORDER BY Revenue DESC) AS ThirdHighestRevenue
FROM Sales;
The output will show the product, revenue, month, and the 3rd highest revenue for each product within each month:
---------------------------------------------------------------------
Product | Revenue | Month | ThirdHighestRevenue
---------------------------------------------------------------------
A | 100 | Jan | 80
B | 150 | Jan | 80
C | 80 | Jan | 80
D | 200 | Jan | 80
A | 120 | Feb | 100
B | 80 | Feb | 100
C | 100 | Feb | 100
D | 100 | Feb | 100
---------------------------------------------------------------------
In this example, the "ThirdHighestRevenue" column represents the 3rd highest revenue for each product within each month. The revenue values are partitioned by the "Month" column and ordered by the "Revenue" column in descending order. The NTH_VALUE() function is used to retrieve the value of the revenue from the 3rd row within that ordering.
----------------------------------------------------------
LAG()
The LAG() function is a window function in SQL that allows us to access the value of a column from a previous row within a window frame. It is used to retrieve the value of a specified expression from a row that precedes the current row based on a specified ordering.
Here's the syntax for using the LAG() function:
LAG(<expression>, offset, default_value) OVER (
[PARTITION BY <column(s)>]
ORDER BY <column(s)>
)
Let's break down the components of the syntax:
Let's consider an example to illustrate the usage of the LAG() function:
Suppose we have a table named "Sales" with columns "OrderDate" and "Revenue". We want to retrieve the revenue from the previous order date for each row, ordered by the order date, the query;
SELECT OrderDate, Revenue,
LAG(Revenue, 1, 0) OVER (ORDER BY OrderDate) AS PreviousRevenue
FROM Sales;
The output :
The LAG() function is used to retrieve the previous revenue for each order based on the order date. The result set will include the OrderDate, Revenue, and an additional column "PreviousRevenue" that represents the revenue from the previous order.
---------------------------------------------------------------------
OrderDate | Revenue | PreviousRevenue
---------------------------------------------------------------------
2021-01-01 | 1000 | 0
2021-01-02 | 1500 | 1000
2021-01-03 | 1200 | 1500
2021-01-04 | 1800 | 1200
---------------------------------------------------------------------
In this example, the "PreviousRevenue" column represents the revenue from the previous order based on the order date. The revenue values are ordered by the "OrderDate" column in ascending order.
For example, on January 1, 2021, the revenue is 1000, and since there is no previous order, the "PreviousRevenue" is 0. On January 2, 2021, the revenue is 1500, and the previous order's revenue was 1000. Similarly, on January 3, 2021, the revenue is 1200, and the previous order's revenue was 1500. This pattern continues for each row in the result set.
---------------------------------------------------------
LEAD()
The LEAD() function is a window function in SQL that allows us to access the value of a column from a following row within a window frame. It is used to retrieve the value of a specified expression from a row that comes after the current row based on a specified ordering.
Here's the syntax for using the LEAD() function:
LEAD(<expression>, offset, default_value) OVER (
[PARTITION BY <column(s)>]
ORDER BY <column(s)>
)
Let's break down the components of the syntax:
Let's consider an example to illustrate the usage of the LEAD() function:
Suppose we have a table named "Employees" with columns "EmployeeID", "FirstName", "LastName", and "Salary". We want to retrieve the salary from the following row for each employee, ordered by the employee ID, the query;
SELECT EmployeeID, FirstName, LastName, Salary,
LEAD(Salary, 1, 0) OVER (ORDER BY EmployeeID) AS NextSalary
FROM Employees;
The output:
The LEAD() function used to retrieve the next salary for each employee based on their EmployeeID. The result set will include the EmployeeID, FirstName, LastName, Salary, and an additional column "NextSalary" that represents the salary of the next employee.
---------------------------------------------------------------------
EmployeeID | FirstName | LastName | Salary | NextSalary
---------------------------------------------------------------------
1 | John | Smith | 5000 | 6000
2 | Jane | Doe | 6000 | 7000
3 | Michael | Johnson | 7000 | 0
---------------------------------------------------------------------
In this example, the "NextSalary" column represents the salary of the next employee based on the EmployeeID. The salary values are ordered by the "EmployeeID" column in ascending order.
For example, for the employee with EmployeeID 1, the salary is 5000, and the next employee (EmployeeID 2) has a salary of 6000. Similarly, for the employee with EmployeeID 2, the salary is 6000, and the next employee (EmployeeID 3) has a salary of 7000. Since there is no next employee after EmployeeID 3, the "NextSalary" is 0.
------------------------------------------------------
NTILE():
The NTILE() function is a window function in SQL that divides the result set into a specified number of equal-sized groups. It assigns a group number to each row based on the specified number of groups. The NTILE() function is particularly useful when you want to distribute rows evenly into a given number of buckets.
Here's the syntax for using the NTILE() function:
NTILE(number_of_groups) OVER (
[PARTITION BY <column(s)>]
ORDER BY <column(s)>
)
Let's break down the components of the syntax:
Let's consider an example to illustrate the usage of the NTILE() function:
Suppose we have a table named "Students" with columns "StudentID", "Name", and "Score". We want to divide the students into three groups based on their scores, ordered by the score in descending order the query;
SELECT StudentID, Name, Score,
NTILE(3) OVER (ORDER BY Score DESC) AS GroupNumber
FROM Students;
This will assign a group number to each student based on their score. The students will be divided into three equal-sized groups, ordered by the score in descending order.
---------------------------------------------------------------------
StudentID | Name | Score | GroupNumber
---------------------------------------------------------------------
1 | John | 95 | 1
2 | Sarah | 90 | 1
3 | Emma | 85 | 1
4 | Michael | 80 | 2
5 | Jessica | 75 | 2
6 | David | 70 | 2
7 | Emily | 65 | 3
8 | James | 60 | 3
9 | Olivia | 55 | 3
---------------------------------------------------------------------
In this example, the "GroupNumber" column represents the group to which each student belongs. The students are divided into three groups based on their score, with Group 1 having the highest scores, Group 2 having the middle scores, and Group 3 having the lowest scores. The students within each group are ordered by their score in descending order.
The NTILE() function evenly distributes the students into the specified number of groups based on the order of the scores. If the total number of students is not evenly divisible by the specified number of groups, some groups may have one more student than others.
Differences between the window functions we discussed:
RANK() vs DENSE_RANK():
RANK() vs ROW_NUMBER():
RANK() vs NTILE():
LAG() vs LEAD():
FIRST_VALUE() vs. LAST_VALUE():
NTH_VALUE() vs. FIRST_VALUE() / LAST_VALUE():
Each of these window functions serves different purposes and offers unique capabilities for analyzing and manipulating data within a window frame. The choice of which function to use depends on the specific requirements and desired results of your analysis or query.
CONCLUSION:
These are some of the most commonly used window functions in SQL. They are used to perform calculations across a set of rows that are related to the current row, such as assigning row numbers, ranks, accessing values from previous or following rows, and dividing result sets into groups. The PARTITION BY clause is used to specify the partitioning criteria, and the ORDER BY clause determines the order within each partition.