Quick Tip - High Performance Running Total
Douglas Day, Data Architect - Datalab

Quick Tip - High Performance Running Total

In this post we will illustrate a high performance method for calculating a running total by traversing a table index.

I’ve seen many examples of people using sub queries and CROSS JOIN’s in order to accomplish the same thing, the only problem with these is the overhead of joining so many rows together.?In this example I’ll illustrate how to traverse an index and create a running total at the same time.

We are going to use the AdventureWorksDW2008.?I am going to create a table that summaries the sales from the FactInternestSales table and then apply the running total.

First create the table that we will use to apply the running total, note that I've created a clustered index on this table. The clustered index is very important as we’ll us the index to ensure that the running total is created in the correct order.

USE AdventureWorksDW2008
GO?

IF OBJECT_ID('RunningTotal', N'U') IS NOT NULL?
? ? DROP TABLE RunningTotal;

CREATE TABLE RunningTotal?
(
? CustomerKey INT NOT NULL
? ,OrderDateKey INT NOT NULL
? ,SalesAmount money NOT NULL
? ,RunningTotal money NULL
)?

CREATE CLUSTERED INDEX idx1 ON RunningTotal (CustomerKey, OrderDateKey);
        

Next populate the table with summarised data from the FactInternestSales table.?Because a customer can have multiple orders on a day with multiple product we need to aggregate the data up to a single record for the day.


INSERT INTO RunningTotal
(? ?
? ?CustomerKey,
? ?OrderDateKey,
? ?SalesAmount
)
SELECT
	CustomerKey
	,OrderDateKey
	,SUM(SalesAmount) AS SalesAmount
FROM
	FactInternetSales WITH(NOLOCK)
GROUP BY
	CustomerKey
	,OrderDateKey
        

Now for the interesting bit.?T-SQL supports a method by which you can assign a value to a field and update a variable at the same time.

The calculation of the equation is calculated from right to left, so first we identify in the CASE statement if the CustomerKey on this record is the same as the CustomerKey on the previous record.?If they are the same then we take the current @RunningTotal value and add the SalesAmount to this.?This is then updated into the RunningTotal field on the table and then this intern updated the @RunningTotal variable with the new total, ready to be used in the next iteration of the update.?If the CustomerKeys are not the same then we know this is a new customers RunningTotal and so the first entry will be equal to the SalesAmount on that day.

Notice that we also update @PrevCustomerKey which will be used for comparison in the next iteration of the update.

The real trick here is to force the query to use the index, this way you ensure that the records are read and updated in the correct sequence.


UPDATE RunningTotal
SET @RunningTotal = RunningTotal =?
	CASE?
		WHEN (CustomerKey = @PrevCustomerKey)?
		THEN @RunningTotal + SalesAmount?
		ELSE SalesAmount?
	END
	,@PrevCustomerKey = CustomerKey?
FROM?
	RunningTotal WITH (INDEX(idx1), TABLOCKX);


SELECT?
	*?
FROM?
	RunningTotal WITH(NOLOCK)
ORDER BY
	CustomerKey
	,OrderDateKey
        

要查看或添加评论,请登录

datalab的更多文章

社区洞察

其他会员也浏览了