Ranking Functions in SQL Server(ANO27)

Ranking Functions in SQL Server(ANO27)

#sqlserver, #performancetuning, #sqlserverdeveloper


Ranking functions are in the window functions group ((functions that operate on a set of rows and return a single aggregated value for each row) that return a ranking value for each row in a partition.

Depending on the function that is used, some rows might receive the same value as other rows. Ranking functions are nondeterministic.

In the SQL server, we have four ranking functions:


ROW_NUMBER:

This function numbers the output of a result set. More specifically, returns the sequential number of a row within a partition of a result set, starting at 1 for the first row in each partition.

Syntax:

ROW_NUMBER ( )   

    OVER ( [ PARTITION BY value_expression , ... [ n ] ] order_by_clause )        

  1. Over clause determines the partitioning and ordering of a row set before the associated window function is applied. That is, the OVER clause defines a window or user-specified set of rows within a query result set. A window function then computes a value for each row in the window. You can use the OVER clause with functions to compute aggregated values such as moving averages, cumulative aggregates, running totals, or a top N per group results.
  2. PARTITION BY value_expressionDivides the result set produced by the FROM clause into partitions to which the ROW_NUMBER function is applied. value_expression specifies the column by which the result set is partitioned. If PARTITION BY is not specified, the function treats all rows of the query result set as a single group.
  3. The ORDER BY clause determines the sequence in which the rows are assigned their unique ROW_NUMBER within a specified partition. It is required.

Example: Imagine we want to rank the salespersons in a company according to their total sales:

use AdventureWorks2019 
GO

 SELECT    P.FirstName AS SalePersonName
	,p.LastName AS SalePersonLastName
        ,[TotalDue]
	,ROW_NUMBER() OVER( ORDER BY ([TotalDue]) DESC) AS "Row Number"
 
FROM [AdventureWorks2019].[Sales].[SalesOrderHeader] SOH
   INNER JOIN Sales.SalesPerson AS s 
	ON SOH.SalesPersonID=S.BusinessEntityID
      INNER JOIN Person.Person AS p   
        ON s.BusinessEntityID = p.BusinessEntityID  
  WHERE [SalesPersonID] IS NOT NULL        

As we didn't determine PARTITION BY clause, the whole results are considered as a partition and the function ordered results according to the order by clause("TotalDue"), then set a number for each row in the result set.


Note: The ROW_NUMBER() function assigns a different rank to rows with similar values.

If we want to set a row number for each salesperson separately we can use partition by clause.

 SELECT   [SalesOrderID],[SalesPersonID],P.FirstName AS SalePersonName
		  ,p.LastName AS SalePersonLastName
		  ,[TotalDue]
		  ,ROW_NUMBER() OVER( partition by [SalesPersonID] ORDER BY ([TotalDue]) DESC) AS "Row Number"
 FROM [AdventureWorks2019].[Sales].[SalesOrderHeader] SOH
   INNER JOIN Sales.SalesPerson AS s 
	ON SOH.SalesPersonID=S.BusinessEntityID
      INNER JOIN Person.Person AS p   
        ON s.BusinessEntityID = p.BusinessEntityID  
  WHERE [SalesPersonID] IS NOT NULL        

Rank

This function returns the rank of each row within the partition of a result set. The rank of a row is one plus the number of ranks that come before the row in question.

ROW_NUMBER and RANK are similar. ROW_NUMBER numbers all rows sequentially (for example 1, 2, 3, 4, 5). RANK provides the same numeric value for ties (for example 1, 2, 2, 4, 5).

Syntax:

RANK ( ) OVER ( [ partition_by_clause ] order_by_clause )        

OVER ( [ partition_by_clause ] order_by_clause)partition_by_clause divides the result set produced by the FROM clause into partitions to which the function is applied. If not specified, the function treats all rows of the query result set as a single group. order_by_clause determines the order of the data before the function is applied. The order_by_clause is required. The <rows or range clause/> of the OVER clause cannot be specified for the RANK function.

Remarks

If two or more rows tie for a rank, each tied row receives the same rank. For example, if the two top salespeople have the same SalesYTD value, they are both ranked one. The salesperson with the next highest SalesYTD is ranked number three, because there are two rows that are ranked higher. Therefore, the RANK function does not always return consecutive integers.

The sort order that is used for the whole query determines the order in which the rows appear in a result set.

Example:

The following example ranks the products in inventory the specified inventory locations according to their quantities. The result set is partitioned by LocationID and logically ordered by Quantity.

SELECT i.ProductID, p.Name, i.LocationID, i.Quantity  
    ,RANK() OVER   
    (PARTITION BY i.LocationID ORDER BY i.Quantity DESC) AS Rank  
FROM Production.ProductInventory AS i   
INNER JOIN Production.Product AS p   
    ON i.ProductID = p.ProductID  
WHERE i.LocationID BETWEEN 3 AND 4  
ORDER BY i.LocationID;  
GO        

Notice that in this function we would have a gap(s) between numbers(rank) if two records have the same value. For example, products 494 and 495 have the same quantity. Because they are tied, they are both ranked one But product 493 is ranked 3 and we don't have rank 2.


DENSE_RANK

This function returns the rank of each row within a result set partition, with no gaps in the ranking values. The rank of a specific row is one plus the number of distinct rank values that come before that specific row.

Syntax:

DENSE_RANK ( ) OVER ( [ <partition_by_clause> ] < order_by_clause > )        

<partition_by_clause> First divides the result set produced by the FROM clause into partitions, and then the DENSE_RANK function is applied to each partition.

<order_by_clause> Determines the order in which the DENSE_RANK function applies to the rows in a partition.

Example:

This example ranks the products in inventory, by the specified inventory locations, according to their quantities. DENSE_RANK partitions the result set by LocationID and logically orders the result set by Quantity.

GO  
SELECT i.ProductID, p.Name, i.LocationID, i.Quantity  
	,ROW_NUMBER() OVER   
    (PARTITION BY i.LocationID ORDER BY i.Quantity DESC) AS "ROW_NUMBER" 
	,RANK() OVER   
    (PARTITION BY i.LocationID ORDER BY i.Quantity DESC) AS "Rank" 
    ,DENSE_RANK() OVER   
    (PARTITION BY i.LocationID ORDER BY i.Quantity DESC) AS "DENSE_RANK"  
FROM Production.ProductInventory AS i   
INNER JOIN Production.Product AS p   
    ON i.ProductID = p.ProductID  
WHERE i.LocationID BETWEEN 3 AND 4  
GO        

In this example, I added ROW_NUMBER, RANK, and DENSE_RANK functions to better understand the difference between them.

Notice that for products 494 and 495, here we don't have a gap.


NTILE

Distributes the rows in an ordered partition into a specified number of groups. The groups are numbered, starting at one. For each row, NTILE returns the number of the group to which the row belongs.

Syntax:

NTILE (integer_expression) OVER ( [ <partition_by_clause> ] < order_by_clause > )        

integer_expression Is a positive integer expression that specifies the number of groups into which each partition must be divided. integer_expression can be of type int, or bigint.

<partition_by_clause> Divides the result set produced by the FROM clause into partitions to which the function is applied.

<order_by_clause> Determines the order in which the NTILE values are assigned to the rows in a partition. An integer cannot represent a column when the <order_by_clause> is used in a ranking function.

Remarks

If the number of rows in a partition is not divisible by integer_expression, this will cause groups of two sizes that differ by one member. Larger groups come before smaller groups in the order specified by the OVER clause. For example, if the total number of rows is 53 and the number of groups is five, the first three groups will have 11 rows and the two remaining groups will have 10 rows each. If on the other hand the total number of rows is divisible by the number of groups, the rows will be evenly distributed among the groups. For example, if the total number of rows is 50, and there are five groups, each bucket will contain 10 rows.

Examples:

A:

SELECT p.FirstName, p.LastName  
    ,NTILE(4) OVER(ORDER BY SalesYTD DESC) AS Quartile  
    ,CONVERT(NVARCHAR(20),s.SalesYTD,1) AS SalesYTD  
    , a.PostalCode  
FROM Sales.SalesPerson AS s   
INNER JOIN Person.Person AS p   
    ON s.BusinessEntityID = p.BusinessEntityID  
INNER JOIN Person.Address AS a   
    ON a.AddressID = p.BusinessEntityID  
WHERE TerritoryID IS NOT NULL   
    AND SalesYTD <> 0;  
GO
        

In this example, because the total number of rows is 14 and the number of groups is Four, the first two groups have 4 rows and the two remaining groups have three rows each.

B. Dividing the result set by using PARTITION BY

The following example adds the PARTITION BY argument to the code in example A. The rows are first partitioned by PostalCode and then divided into four groups within each PostalCode. The example also declares a variable @NTILE_Var and uses that variable to specify the value for the integer_expression parameter.

GO  
DECLARE @NTILE_Var INT = 4;  
  
SELECT p.FirstName, p.LastName  
    ,NTILE(@NTILE_Var) OVER(PARTITION BY PostalCode ORDER BY SalesYTD DESC) AS Quartile  
    ,CONVERT(NVARCHAR(20),s.SalesYTD,1) AS SalesYTD  
    ,a.PostalCode  
FROM Sales.SalesPerson AS s   
INNER JOIN Person.Person AS p   
    ON s.BusinessEntityID = p.BusinessEntityID  
INNER JOIN Person.Address AS a   
    ON a.AddressID = p.BusinessEntityID  
WHERE TerritoryID IS NOT NULL   
    AND SalesYTD <> 0;  
GO
        



要查看或添加评论,请登录

社区洞察

其他会员也浏览了