Ranking Functions in SQL Server(ANO27)
Ranking functions are in the window functions group ((functions that operate on a set of rows and return a single aggregated value for each row) that return a ranking value for each row in a partition.
Depending on the function that is used, some rows might receive the same value as other rows. Ranking functions are nondeterministic.
In the SQL server, we have four ranking functions:
ROW_NUMBER:
This function numbers the output of a result set. More specifically, returns the sequential number of a row within a partition of a result set, starting at 1 for the first row in each partition.
Syntax:
ROW_NUMBER ( )
OVER ( [ PARTITION BY value_expression , ... [ n ] ] order_by_clause )
Example: Imagine we want to rank the salespersons in a company according to their total sales:
use AdventureWorks2019
GO
SELECT P.FirstName AS SalePersonName
,p.LastName AS SalePersonLastName
,[TotalDue]
,ROW_NUMBER() OVER( ORDER BY ([TotalDue]) DESC) AS "Row Number"
FROM [AdventureWorks2019].[Sales].[SalesOrderHeader] SOH
INNER JOIN Sales.SalesPerson AS s
ON SOH.SalesPersonID=S.BusinessEntityID
INNER JOIN Person.Person AS p
ON s.BusinessEntityID = p.BusinessEntityID
WHERE [SalesPersonID] IS NOT NULL
As we didn't determine PARTITION BY clause, the whole results are considered as a partition and the function ordered results according to the order by clause("TotalDue"), then set a number for each row in the result set.
If we want to set a row number for each salesperson separately we can use partition by clause.
SELECT [SalesOrderID],[SalesPersonID],P.FirstName AS SalePersonName
,p.LastName AS SalePersonLastName
,[TotalDue]
,ROW_NUMBER() OVER( partition by [SalesPersonID] ORDER BY ([TotalDue]) DESC) AS "Row Number"
FROM [AdventureWorks2019].[Sales].[SalesOrderHeader] SOH
INNER JOIN Sales.SalesPerson AS s
ON SOH.SalesPersonID=S.BusinessEntityID
INNER JOIN Person.Person AS p
ON s.BusinessEntityID = p.BusinessEntityID
WHERE [SalesPersonID] IS NOT NULL
Rank
This function returns the rank of each row within the partition of a result set. The rank of a row is one plus the number of ranks that come before the row in question.
ROW_NUMBER and RANK are similar. ROW_NUMBER numbers all rows sequentially (for example 1, 2, 3, 4, 5). RANK provides the same numeric value for ties (for example 1, 2, 2, 4, 5).
Syntax:
RANK ( ) OVER ( [ partition_by_clause ] order_by_clause )
OVER ( [ partition_by_clause ] order_by_clause)partition_by_clause divides the result set produced by the FROM clause into partitions to which the function is applied. If not specified, the function treats all rows of the query result set as a single group. order_by_clause determines the order of the data before the function is applied. The order_by_clause is required. The <rows or range clause/> of the OVER clause cannot be specified for the RANK function.
Remarks
If two or more rows tie for a rank, each tied row receives the same rank. For example, if the two top salespeople have the same SalesYTD value, they are both ranked one. The salesperson with the next highest SalesYTD is ranked number three, because there are two rows that are ranked higher. Therefore, the RANK function does not always return consecutive integers.
The sort order that is used for the whole query determines the order in which the rows appear in a result set.
Example:
The following example ranks the products in inventory the specified inventory locations according to their quantities. The result set is partitioned by LocationID and logically ordered by Quantity.
SELECT i.ProductID, p.Name, i.LocationID, i.Quantity
,RANK() OVER
(PARTITION BY i.LocationID ORDER BY i.Quantity DESC) AS Rank
FROM Production.ProductInventory AS i
INNER JOIN Production.Product AS p
ON i.ProductID = p.ProductID
WHERE i.LocationID BETWEEN 3 AND 4
ORDER BY i.LocationID;
GO
领英推荐
Notice that in this function we would have a gap(s) between numbers(rank) if two records have the same value. For example, products 494 and 495 have the same quantity. Because they are tied, they are both ranked one But product 493 is ranked 3 and we don't have rank 2.
DENSE_RANK
This function returns the rank of each row within a result set partition, with no gaps in the ranking values. The rank of a specific row is one plus the number of distinct rank values that come before that specific row.
Syntax:
DENSE_RANK ( ) OVER ( [ <partition_by_clause> ] < order_by_clause > )
<partition_by_clause> First divides the result set produced by the FROM clause into partitions, and then the DENSE_RANK function is applied to each partition.
<order_by_clause> Determines the order in which the DENSE_RANK function applies to the rows in a partition.
Example:
This example ranks the products in inventory, by the specified inventory locations, according to their quantities. DENSE_RANK partitions the result set by LocationID and logically orders the result set by Quantity.
GO
SELECT i.ProductID, p.Name, i.LocationID, i.Quantity
,ROW_NUMBER() OVER
(PARTITION BY i.LocationID ORDER BY i.Quantity DESC) AS "ROW_NUMBER"
,RANK() OVER
(PARTITION BY i.LocationID ORDER BY i.Quantity DESC) AS "Rank"
,DENSE_RANK() OVER
(PARTITION BY i.LocationID ORDER BY i.Quantity DESC) AS "DENSE_RANK"
FROM Production.ProductInventory AS i
INNER JOIN Production.Product AS p
ON i.ProductID = p.ProductID
WHERE i.LocationID BETWEEN 3 AND 4
GO
In this example, I added ROW_NUMBER, RANK, and DENSE_RANK functions to better understand the difference between them.
Notice that for products 494 and 495, here we don't have a gap.
NTILE
Distributes the rows in an ordered partition into a specified number of groups. The groups are numbered, starting at one. For each row, NTILE returns the number of the group to which the row belongs.
Syntax:
NTILE (integer_expression) OVER ( [ <partition_by_clause> ] < order_by_clause > )
integer_expression Is a positive integer expression that specifies the number of groups into which each partition must be divided. integer_expression can be of type int, or bigint.
<partition_by_clause> Divides the result set produced by the FROM clause into partitions to which the function is applied.
<order_by_clause> Determines the order in which the NTILE values are assigned to the rows in a partition. An integer cannot represent a column when the <order_by_clause> is used in a ranking function.
Remarks
If the number of rows in a partition is not divisible by integer_expression, this will cause groups of two sizes that differ by one member. Larger groups come before smaller groups in the order specified by the OVER clause. For example, if the total number of rows is 53 and the number of groups is five, the first three groups will have 11 rows and the two remaining groups will have 10 rows each. If on the other hand the total number of rows is divisible by the number of groups, the rows will be evenly distributed among the groups. For example, if the total number of rows is 50, and there are five groups, each bucket will contain 10 rows.
Examples:
A:
SELECT p.FirstName, p.LastName
,NTILE(4) OVER(ORDER BY SalesYTD DESC) AS Quartile
,CONVERT(NVARCHAR(20),s.SalesYTD,1) AS SalesYTD
, a.PostalCode
FROM Sales.SalesPerson AS s
INNER JOIN Person.Person AS p
ON s.BusinessEntityID = p.BusinessEntityID
INNER JOIN Person.Address AS a
ON a.AddressID = p.BusinessEntityID
WHERE TerritoryID IS NOT NULL
AND SalesYTD <> 0;
GO
In this example, because the total number of rows is 14 and the number of groups is Four, the first two groups have 4 rows and the two remaining groups have three rows each.
B. Dividing the result set by using PARTITION BY
The following example adds the PARTITION BY argument to the code in example A. The rows are first partitioned by PostalCode and then divided into four groups within each PostalCode. The example also declares a variable @NTILE_Var and uses that variable to specify the value for the integer_expression parameter.
GO
DECLARE @NTILE_Var INT = 4;
SELECT p.FirstName, p.LastName
,NTILE(@NTILE_Var) OVER(PARTITION BY PostalCode ORDER BY SalesYTD DESC) AS Quartile
,CONVERT(NVARCHAR(20),s.SalesYTD,1) AS SalesYTD
,a.PostalCode
FROM Sales.SalesPerson AS s
INNER JOIN Person.Person AS p
ON s.BusinessEntityID = p.BusinessEntityID
INNER JOIN Person.Address AS a
ON a.AddressID = p.BusinessEntityID
WHERE TerritoryID IS NOT NULL
AND SalesYTD <> 0;
GO