SQL Server Key Lookup & Scan, Plan vs. Actual Costs
Previously in Plan Cost Cross-Over, we looked at the SQL Server execution plan Key Lookup to Scan cross-over, which occurs at around 41,089 rows for a 1GB table. Here we look at Key Lookup and Scan costs, both plan and actual, across a broad range of rows. All tests are at degree of parallelism 1.
Test Tables and Queries
The previous Plan Cost Cross-Over article had the table definition and population script. In all, there were four tables, each having a bigint clustered primary key and 15 int columns, each int column with a different data distribution between columns and a uniform distribution within a column. Across the tables there are 60 different distributions, ranging from 64 rows per value at the low end to 873,819 at the high end (or plus 1 when the total rows in the table is not an exact integer multiple of the distribution count).
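A minimal sketch of the table shape (table and column names are my assumptions, following the G7ID naming that appears later in this article; see the previous article for the actual definition and population script):

    CREATE TABLE dbo.T1 (
        ID bigint NOT NULL CONSTRAINT PK_T1 PRIMARY KEY CLUSTERED,
        -- 15 int columns, each with a different rows-per-value distribution,
        -- uniform within a column
        G1ID int NOT NULL, G2ID int NOT NULL, G3ID int NOT NULL,
        G4ID int NOT NULL, G5ID int NOT NULL, G6ID int NOT NULL,
        G7ID int NOT NULL, G8ID int NOT NULL, G9ID int NOT NULL,
        G10ID int NOT NULL, G11ID int NOT NULL, G12ID int NOT NULL,
        G13ID int NOT NULL, G14ID int NOT NULL, G15ID int NOT NULL
    );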
Most queries tested were in the form of the first query below. The FORCESEEK hint was applied to force use of the appropriate nonclustered index, which in turn requires key lookups. The INDEX(0) hint forces a table scan. The MAXDOP 1 hint forces degree of parallelism 1.
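A sketch of the two query forms (names as above are my assumptions; @val stands for any valid value in the chosen distribution):

    DECLARE @val int = 1;

    -- Index Seek plus Key Lookup: FORCESEEK directs use of the nonclustered
    -- index on G7ID; referencing G1ID (not in that index) forces the lookups
    SELECT COUNT(G1ID)
    FROM dbo.T1 WITH (FORCESEEK)
    WHERE G7ID = @val
    OPTION (MAXDOP 1);

    -- Scan: INDEX(0) forces a scan of the base table (clustered index)
    SELECT COUNT(G1ID)
    FROM dbo.T1 WITH (INDEX(0))
    WHERE G7ID = @val
    OPTION (MAXDOP 1);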
Multiple executions on a single thread were run to generate between 3 and 10 CPU-sec of worker time. This ranged from 40,960 iterations for the low distribution values down to 10 for the high values. The average CPU is obtained from the DMV sys.dm_exec_query_stats column total_worker_time. A single-thread test is not as insightful as a full system saturation test using multiple threads, but the latter requires more effort.
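A sketch of how the average CPU can be pulled from the DMV (my query, not from the article; total_worker_time is reported in microseconds):

    SELECT qs.execution_count,
           qs.total_worker_time / qs.execution_count AS avg_worker_time_us,
           st.text
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
    WHERE st.text LIKE '%G7ID%';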
Certain tests were run with either the ROWLOCK or NOLOCK table hints applied, even though these are more directives than hints (like a hint from your spouse).
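For reference, the hinted variants of the seek query look like the following (again a sketch):

    SELECT COUNT(G1ID) FROM dbo.T1 WITH (FORCESEEK, ROWLOCK)
    WHERE G7ID = @val OPTION (MAXDOP 1);

    SELECT COUNT(G1ID) FROM dbo.T1 WITH (FORCESEEK, NOLOCK)
    WHERE G7ID = @val OPTION (MAXDOP 1);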
Plan Cost
The chart below shows the SQL Server cost for the Index Seek with Key Lookup and Scan plans from 64 to 873,819 rows. As before, each table has 129,775 leaf level data pages, just under 1GB binary (which would be 131,072 pages), with all pages having 101 rows, for 13,107,275 rows total.
The scan cost is nearly flat, with the major contribution from the Scan I/O cost of 96.132, corresponding to 0.003125 + (129,775 − 1) ÷ 1,350. The Scan CPU cost is 0.0001581 + (13,107,275 − 1) × 0.0000011 = 14.41816.
The Stream Aggregate cost structure seems to be 0.0000006 + 0.0000005 per filtered row + 0.00000048 per row in the Scan operator. That the Stream Aggregate cost depends on the rows in the Scan operator is somewhat perplexing, because the Scan has a predicate that filters the rows going to the next operation. Perhaps Microsoft would be so kind as to share the source code for the execution of a scan feeding into a stream aggregate? The total plan cost for the scan ranges from 116.8417 for 65 rows filtered to 117.0707 for 873,819 rows filtered.
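As a consistency check (my arithmetic, applying the cost components above to the 65-row case):

    Scan I/O:          0.003125 + (129,775 − 1) ÷ 1,350          =  96.13201
    Scan CPU:          0.0001581 + (13,107,275 − 1) × 0.0000011  =  14.41816
    Stream Aggregate:  0.0000006 + 65 × 0.0000005
                         + 13,107,275 × 0.00000048               =   6.29153
    Total plan cost:                                                116.84170

which matches the 116.8417 reported above.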
Below is the same plan cost versus rows on a log-log scale. Notice the vertical scale is log base 10 while the horizontal scale is log base 2. The log scales allow a better view across the range, such that the lower values are not compressed into a very narrow band.
A curiosity in the Index Seek plus Key Lookup plan cost is the "non-smooth" anomaly between 67,216 rows and 104,858 rows. The exact range may be broader on both sides, as these are the data points that I have; it must lie between 65,536 and 109,227 rows, however. The location and breadth of this anomaly may depend on the setting for max server memory (MB). Most plan costs shown here are at max server memory set to 8GB. The plan cost may be the same for 16GB.
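For reference, the setting in question is changed with standard sp_configure usage (8192 corresponds to the 8GB used for most of the plan costs here):

    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'max server memory (MB)', 8192;
    RECONFIGURE;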
What appears to be happening in this range is that the Key Lookup I/O cost is fixed at 163.84, corresponding to 52,428.8 × 0.003125. Note that the plan detail for the Key Lookup operation, as shown below, only gives the single-execution values of CPU Cost (0.0001581) and I/O Cost (0.003125), along with the Number of Executions (67,217) and the complete Operator Cost (174.572).
The total I/O contribution to Operator Cost is not shown. I believe the calculation of Operator Cost is:

    Operator Cost = CPU Cost × Number of Executions + I/O Cost × Number of I/Os
The Number of I/Os is the Number of Executions × the probability that a given Key Lookup page was not previously accessed in this query (or was accessed but evicted). Hence the Number of I/Os is less than or equal to the Number of Executions.
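Checking this against the plan detail above (my arithmetic):

    CPU contribution:  0.0001581 × 67,217   ≈  10.627
    I/O contribution:  174.572 − 10.627     ≈ 163.945
    Number of I/Os:    163.945 ÷ 0.003125   ≈  52,462

which is close to the 52,428.8 cap; the small residual may be rounding in the displayed Operator Cost, or a term I have not accounted for.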
At higher key lookup counts, the reduction in I/O may come from the probability that, after a particular lookup, the next one or more lookups access the same leaf level page in the main table.
In the range of the anomaly, the Key Lookup I/O contribution is flat, though the Lookup Operator Cost increases from the CPU contribution and the plan cost also increases from the Index Seek operator.
I call this an anomaly because there is no physical reason for it to exist. If I had to guess, I would say the Key Lookup cost is composed of separate formulas for different regions that meet at the boundary, and one of the formulas (for the lower half) has a cap.
Aside from the anomaly, the reasonable interpretation is: at lower Key Lookup execution counts, the number of I/Os should be nearly linear with the number of executions, as there is low probability that a page in the main table was previously accessed. At higher numbers of executions, this probability increases.
In this example, the nonclustered index used in the index seek portion of the plan is keyed on a single column. The actual key of a nonclustered index implicitly includes the columns of the clustered index key. For example, the full key of a nonclustered index on column G7ID is G7ID followed by ID, the clustered index key.
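A sketch of the index in question (the index name is my assumption):

    -- Keyed on G7ID; the clustered key ID is implicitly appended,
    -- so the effective key is (G7ID, ID)
    CREATE NONCLUSTERED INDEX IX_T1_G7ID ON dbo.T1 (G7ID);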
The query in this case specifies that column G7ID equals some valid value. This means all rows with that value of G7ID are stored in order of ID. However, SQL Server does not know the gap between successive ID values. Our table was populated in a manner that puts maximum separation between successive rows. The table has 129,775 pages, so when the number of matching rows is less than or equal to 129,775, successive rows are guaranteed to be in separate leaf level pages.
Presumably, SQL Server employs a formula to determine the probability that successive rows are in the same page, and then calculates the number of I/Os with this effect factored in.
Below is the derived Key Lookup I/O component versus rows, expressed in multiples of 0.003125. We can interpret this as the computed number of Lookups that require I/O.
Below is the percent of Key Lookups that require I/O. For example, if there are 60,000 Lookups, and the I/O cost is 48,000 × 0.003125, then 80% of Lookups require I/O.
It is clear that the percentage of Lookups requiring I/O begins to fall mildly well before the row count reaches the number of pages in the main table (129,775), and then falls more sharply above 65K rows.
Actual Cost - Data in Buffer Pool
Now that we have covered plan cost, the question is: what is the actual cost, and how much or how little does it resemble the plan cost?
Below is the actual cost for Key Lookup and Scan versus rows for our test table of 129,775 pages (just under 1GB) and 13,107,275 rows. The vertical scale is in CPU-ms. Elapsed time should be very close to CPU (worker) time as all data is in memory.
Below is the same chart in log-log, base 10 for the vertical scale and base 2 for the horizontal scale.
It is evident that nothing unusual happens in the 67-105K row (number of executions) range. Hence the anomaly in the plan cost is just that: an anomaly, possibly a deficiency in matching both the value and slope of two formulas at their boundary.
Below is the actual cost in CPU-μs of Index Seek with Key Lookup query on a per row basis.
A discontinuity in slope occurs between 4K and 16K rows. While the individual data points are not marked (for clarity), there are data points at 4,096, 8,192 and 16,384 rows. The change in behavior in the 4K to 16K range could be the lock escalation point at 5,000 rows. See The Lock Escalation Threshold - Part I by Paul White; there are also Part II and Part III in the series.
Below are the actual query-with-Lookup costs (CPU-μs) per row for three cases: no table hint, ROWLOCK, and NOLOCK. The shape of the curves on the left, at the lower row counts, just means there is a fixed cost plus an incremental cost per row.
In curve fitting the (cost − base) per row, a good fit comes from a base cost of 40μs. If we describe the query as having an initial cost plus a cost per key lookup execution, the base cost can be interpreted as the cost of the first key lookup minus the per-row incremental cost.
A simple eyeball fit shows that after subtracting 40μs as the fixed cost, the cost per row is as shown in the chart below. Note: the original set of four tables had data points at 4,096, 8,192 and 16,384 rows. As it was clear something happens between 4K and 16K rows, a new table was created to provide additional data points in this range.
It is possible that the behavior of the default and ROWLOCK costs between 6K and 16K rows could be explained as: the cost is about 2.35μs per row up to 6K rows, then the cost of incremental rows beyond 6K is lower, similar to the NOLOCK Lookup cost. However, the quality of the measurements is not sufficiently consistent to make this assertion based on visual matching. (I no longer do sophisticated curve fitting.)
For default (no table hint) and ROWLOCK, the Lookup cost is about 2.35μs per row until 4K rows, after which the cost per row falls off to 1.65μs. At this point the behaviors of default and ROWLOCK diverge, drifting lower at different rates.
The NOLOCK Lookup cost is about 1.6μs per row at lower row counts, falling to 1.4μs at 4K and to 1.3μs at 40K rows. The ROWLOCK behavior becomes very close to NOLOCK above 128K rows. This may mean that SQL Server is taking a table lock. Why the default Lookup cost does not fall in a matching manner is unclear.
Discussion
Below is the combined Plan (dotted line) and Actual (solid line) cost for Lookup and Scan versus rows. Plan cost is read on the left vertical scale, and Actual cost on the right vertical scale in CPU-ms.
Sidenote
In the earlier SQL Server documentation under Cost Threshold for Parallelism, it was stated that the plan cost was an estimated elapsed time in seconds to run on a specific hardware configuration. The earlier document may have been removed, but there are direct quotes from it, for example: Cost of threshold of parallelism for estimated plan. I have heard that the specific system was an early 1990s-era Pentium. There were multi-processor systems in the 80486 and Pentium periods, but these became more common with the mid-to-late-90s Pentium Pro.
The current documentation for Configure the cost threshold for parallelism Server Configuration Option states: "The cost refers to an estimated cost required to run the serial plan on a specific hardware configuration, and is not a unit of time." Emphasis is mine.
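For completeness, the setting itself is changed the same way as any other advanced option (the value 50 here is purely illustrative):

    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'cost threshold for parallelism', 50;
    RECONFIGURE;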
Coming back to the topic of plan versus actual cost: the plan cost between Key Lookup and Scan crosses over at about 41,089 rows, or 31.66 Key Lookups per 100 pages scanned. The actual Key Lookup to Scan cross-over is at somewhat over 524,000 rows. For data in the buffer pool, the plan model is off by a factor of more than 10. The plan cost may not be far off for data not in memory, but the more common use of SQL Server is with data in memory.
A very long time ago, buffer cache hit ratio was an important metric that people watched. As system memory grew larger, this value came to be over 99%, if not 99.9%. At that point it was conceded that we do not really need to watch buffer cache hit ratio any more. Page Life Expectancy was substituted for those who could not let go of the need to watch some performance counter that was not too complicated.
The important point to note here is that the SQL Server query optimizer will switch from index seek with key lookup to scan far sooner than it should. The consequences can be bad, if not very bad. This is also why the Missing Indexes DMV keeps recommending indexes with INCLUDE columns: the INCLUDE list can be used to make the Key Lookup operation go away, as sketched below. Disregarding the additional space (both on storage and in memory), the extra included columns can incur additional overhead for update statements.
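A sketch of the covering-index approach (names are illustrative): with the referenced column carried in the leaf level via INCLUDE, the query can be satisfied by the nonclustered index alone and the Key Lookup disappears.

    -- G1ID is stored in the index leaf level, so
    -- SELECT COUNT(G1ID) ... WHERE G7ID = @val needs no Key Lookup
    CREATE NONCLUSTERED INDEX IX_T1_G7ID_covering
    ON dbo.T1 (G7ID) INCLUDE (G1ID);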
Summary
It is clear that the SQL Server execution plan cost formula does not model the cost of operations for data in the buffer pool. The model may be appropriate for data not in the buffer pool, requiring substantial I/O. Even so, the very large majority of SQL Server use is against data in memory, which is why people no longer watch Buffer Cache Hit Ratio, and even Page Life Expectancy is not talked about much anymore.
There must be different cost models for data in memory and data not in memory. But there is no easy way to determine which to use, or when a plan based on one should be switched to the other.
For the moment, my proposal is that the use of Adaptive Joins be expanded from just a row threshold to whether I/O is encountered. My preference is that the default plan be based on a data-in-memory cost model. If disk I/O beyond some threshold is encountered early in execution, then switch plans to one better suited to accessing data not in memory.