Insert, Update and Delete Tricks

The previous articles Insert, Update and Delete Plan Cost, Asynchronous IO and Storage Arrays, and IUD Performance Anomalies in SQL Server together described a set of circumstances in which anomalies can occur in write (IUD) queries. In brief, the anomaly can occur when there are many rows to be written, but the estimated number of rows is below one of the set points for asynchronous I/O, and many pages in nonclustered indexes that need to be modified are not in buffer cache at the beginning of the query.

We might argue that recent generation systems offer massive memory capacity and stupendous I/O capability, all at reasonable cost. But the chasm between what is possible and what is actually deployed is equally great. Many deployments are in the cloud, where people in the infrastructure silo seem to think that 4-8GB of memory per vCore is adequate. The greater issue is the infatuation with network storage, resulting in deeply throttled (crippled) I/O.

It is true that much lower I/O latency is possible with SSD storage (sub-millisecond) compared to hard disks (5-15 milliseconds depending on class). But there is little assurance that low latency is maintained under high queue depth operation. In the application silo, too many (in management) are enamored of new technology over old discipline, resulting in bloated data structures of questionable normalization.

If we have well-configured infrastructure with correctly architected data structures, then any Insert, Update and Delete performance anomalies may be mild, and we do not need to learn any more on this topic. If not, then it might be worthwhile to consider tricks to work around IUD issues. As stated above, the two main factors are 1) the estimated number of rows to be modified being below the threshold for asynchronous I/O, and 2) data and index pages to be modified not being in buffer cache.

This means we have two avenues of attack. One, influence the estimated number of rows to be modified so that it is above the threshold if this is what is desired. Two, get the pages to be modified into buffer cache before the write query.

Test Tables, Methods and Set Points

Methods from previous articles are retained. The tables have a clustered index that is unique on the single column ID. There are two nonclustered indexes, one on column C and the other on D. Each has an Included column. There may also be a foreign key column F. Table T has 24 rows per unique value of C and 25 rows per value of D. Table T1 has 249 rows per C value and 250 rows per D value.

Initial conditions for testing are as follows. A CHECKPOINT is issued to write dirty pages to disk. DBCC DROPCLEANBUFFERS then removes clean pages from the buffer cache. A series of queries against other tables is run to fill the buffer cache to, or close to, the target memory. The reason for this is that SQL Server has a different strategy for read I/O from a cold cache, to bring the system more quickly to a warm cache state.
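A minimal sketch of this reset sequence is below; the filler query and table name dbo.Filler are hypothetical stand-ins for whatever brings the buffer cache to the target level.

CHECKPOINT -- write dirty pages to disk

DBCC DROPCLEANBUFFERS -- remove clean pages from the buffer cache

DECLARE @fill bigint

SELECT @fill = COUNT_BIG(*) FROM dbo.Filler -- warm the cache with unrelated data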

The SQL Update statement from previous articles is below.

UPDATE dbo.T SET CreateD = GETDATE() WHERE C = 301

The update statement modifies a column that happens to be in the Include list of the nonclustered indexes. Changes to the underlying table and each nonclustered index are required.

No pages from the table are in memory on the first execution, so I/O is required for IAM, index root, intermediate, and leaf level pages. When the query is executed again with a different parameter value, I/O is required mostly for the leaf level and possibly the first intermediate level above it. In other words, on repeat execution it is easier to correlate the physical I/O to the number of rows modified.

In IUD Performance Anomalies, I found examples showing asynchronous I/O access to the main table (clustered index) occurs at Estimated Number of Rows 25 and higher (see the comment further on in this article). This is the same set point as for Key Lookup in Select queries. Asynchronous I/O for access to nonclustered indexes seems to occur at Estimated Number of Rows 250 or higher. In both cases, this is evident from SET STATISTICS IO reporting read-ahead reads, even if this is technically random prefetching.
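The read-ahead activity can be observed by enabling I/O statistics before the test query; the message format below is approximate.

SET STATISTICS IO ON

UPDATE dbo.T SET CreateD = GETDATE() WHERE C = 301

-- Messages: Table 'T'. Scan count 1, logical reads ..., physical reads ..., read-ahead reads ...

The physical reads figure reflects synchronous I/O; the read-ahead reads figure reflects asynchronous prefetch.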

Estimated Number of Rows Approach

The first method of addressing IUD performance anomalies revolves around the estimated number of rows at the write operation. The number of rows in steps prior to the write operation is not the topic of discussion in this article.

We should make clear first that the purpose of data distribution statistics is to support row estimation. Under certain circumstances, for certain key values, it is possible to have an exact distribution value. But whether a given key value is exact or an estimate is not the point.

In fact, we are not entirely interested in whether the estimate is accurate. Our interest is in whether the row estimate is above or below the asynchronous I/O set points, and whether we think I/O, if required, should be synchronous or asynchronous. Furthermore, if we want or expect plan reuse, then there is the matter of plan compile parameter values versus different actual execution parameter values.

A query can have search criteria that are simply amenable to reasonable estimation. Estimate variation might also accumulate as rows propagate through multiple operations.

Below is the parameterized version of our reference Update query.

DECLARE @P1 bigint

UPDATE dbo.T SET CreateD = GETDATE() WHERE C = @P1

Suppose this is part of a stored procedure, and @P1 is a parameter of the procedure. On first execution, the value of @P1 is used for row estimation, and that estimate is used in the plan for every execution thereafter, until the plan is forced out of cache or a recompile is triggered.
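A minimal sketch of the compile-once, reuse-thereafter behavior; the procedure name and second parameter value are hypothetical.

CREATE PROCEDURE dbo.UpdateT_C @P1 bigint AS

UPDATE dbo.T SET CreateD = GETDATE() WHERE C = @P1

GO

EXEC dbo.UpdateT_C @P1 = 301 -- plan compiled using the estimate for C = 301

EXEC dbo.UpdateT_C @P1 = 5000 -- reuses that plan regardless of the new value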

We can use the OPTIMIZE FOR query hint to override the compile parameter value, as in the example below.

DECLARE @P1 bigint

UPDATE dbo.T SET CreateD = GETDATE() WHERE C = @P1

OPTION (OPTIMIZE FOR (@P1 = value))

We would choose the OPTIMIZE FOR key value to have fewer than 25 (or 26) rows if we do not want read-ahead at all, between 25 and 249 rows if we want read-ahead on the clustered index but not the nonclustered indexes, or 250 rows and above if we want read-ahead on all.

The difficulty in this approach is that we do not control the statistics range high key values, and these may change. To an extent, there is a degree of repeatability due to the statistics computation method.

It is necessary to find key values that have the desired distribution and to confirm that the distribution does not change when statistics are updated. This can make it an impractical approach for production environments with inflexible source control policies. It is also necessary to verify that OPTIMIZE FOR has the expected outcome, including a check of which statistics were employed by the compiled plan.
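One way to hunt for candidate key values is to inspect the histogram directly; the nonclustered index on C is assumed here to be named IXC.

DBCC SHOW_STATISTICS ('dbo.T', IXC) WITH HISTOGRAM

RANGE_HI_KEY values whose EQ_ROWS fall in the desired band are the candidates for OPTIMIZE FOR, and the check should be repeated after statistics updates.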

Another option is to graft statistics from another index having the same keys (including data type, explicit or implicit) with the desired distribution values. This can be done with DBCC SHOW_STATISTICS WITH STATS_STREAM on the source and UPDATE STATISTICS WITH STATS_STREAM = on the destination.

There is a third way to control statistics, involving the use of a temporary table. With this, we are close to the second method of controlling I/O, and it will be discussed in the next section.

Preloading Pages into Buffer Cache

The second method of avoiding write performance anomalies is to avoid read I/O in the Insert, Update or Delete statement itself. This means running Select queries to touch pages that need to be written to in the IUD query. Executing the search (filter) portion of the query is one part. The second part is to touch the main table and any nonclustered indexes that need to be modified but were not involved in the search.

We might ask why introducing otherwise unnecessary queries that do nothing aside from reading pages into the buffer cache would help. The reason is that the write query holds exclusive locks on rows being modified while the query is in progress (note the new Optimized locking feature of Azure SQL, not in on-prem versions yet).

If the write query is active for a protracted period, then the exclusive locks can block other queries, resulting in downstream problems. In transferring the I/O to a Select query, shared locks are employed, and only while the row is being accessed. Furthermore, it is our goal to ensure that I/O generated by the read statement is asynchronous if appropriate.

Here it helps that Select queries have only the 25 (or 26) row set point for asynchronous I/O, versus the write operator with a 25 row set point for the clustered index and a 250 row set point for nonclustered indexes.

It is possible to preload necessary clustered and nonclustered index pages without a temp table. The choice of whether to do so depends on the complexity and efficiency of the filter criteria. The example given here employs a temp table for brevity.

The temporary table below is designed to contain the necessary information to support the preloading queries. The main table clustered index key column(s) is present, along with the necessary nonclustered index key and foreign key columns.

CREATE TABLE #x (ID bigint NOT NULL, C bigint, D bigint, F bigint)

For Insert and Delete, all nonclustered indexes need to be accessed, except filtered indexes under certain circumstances. For Update, only the nonclustered indexes in which key or included columns are modified need to be handled.

Foreign key columns should be populated for Insert statements, but only if modified for Update statements. Delete does not need to consider foreign key columns, but any table that foreign keys to this table should have an index.

Two methods of populating the temporary table are described here. One is a multiple step approach. The cluster key column is inserted in the first step. The other columns are populated with a subsequent update. The second is a single step approach in which all columns are inserted using the query with the filter criteria.

Temp Table Populated in Multiple Steps

The two-step approach begins with populating only the clustered index key using the original write statement filter criteria. Example below.

INSERT #x (ID) SELECT ID FROM dbo.T WHERE C = @P1

The execution plan for the above is a nonclustered index seek for a range of rows. This plan is expected to generate read-ahead reads for any additional pages beyond the first leaf level page.

The second step is to update the temporary table to populate the other columns.

UPDATE x SET C = T.C, D = T.D, F = T.F

FROM #x x JOIN dbo.T ON T.ID = x.ID

The execution plan for the Update should be a Scan of #x followed by a (Nested Loops) Join to T with a clustered index seek, unless the row count is excessive or there is a peculiar nonclustered index having the same key as the clustered index.

If there is such a peculiar nonclustered index, then consider an index hint for the clustered index.

The filter criterion in this example is simple and the two above steps can easily be consolidated. One reason for dividing the process into two steps is for situations in which the filter criteria is sufficiently complicated that the query optimizer cannot make an accurate row estimate.

This can occur when there are two or more inequality conditions or there is no index matching all the equality conditions. It is not uncommon for complex logic in the filter criteria to result in poor accuracy for the estimated number of rows.

By only outputting the clustered index key column, implicitly part of every nonclustered index, the first Insert/Select plan should only have a nonclustered index access and not a Key Lookup operation.

If six or more rows are inserted to the temp table, the next query to use #x will trigger statistics creation and, in turn, a recompile. The statistics and recompile ensure the second statement, updating the other columns of #x, is based on actual rows and not an estimate.

If we do not trust the SQL Server automatic statistics recompute set points, then we can issue an explicit Create Statistics or Create Index statement on the temp table after the Insert and before the Update, as sketched below.
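A minimal sketch, with a hypothetical statistics name sx:

CREATE STATISTICS sx ON #x(ID) WITH FULLSCAN

The subsequent Update then compiles against the actual row count rather than a default temp table estimate.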

Transplanted Statistics

If for some reason we are unhappy with actual statistics on the temp table #x, we can transplant statistics. Outside of the procedure with the write statement, we create the temp table #x and populate it with the desired number of rows.

Create a clustered index on the ID column:

CREATE UNIQUE CLUSTERED INDEX UCX ON #x(ID)

Run DBCC SHOW_STATISTICS as below.

DBCC SHOW_STATISTICS ('tempdb..#x', UCX) WITH STATS_STREAM

The output for our example is a long binary Stats_Stream value, which is used below.

Within the procedure having our write statement, create the temp table, populate the cluster key columns (just ID in this example), and create the same unique clustered index as above.
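Putting the pieces together, the sequence inside the procedure might look as follows.

CREATE TABLE #x (ID bigint NOT NULL, C bigint, D bigint, F bigint)

INSERT #x (ID) SELECT ID FROM dbo.T WHERE C = @P1

CREATE UNIQUE CLUSTERED INDEX UCX ON #x(ID)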

Update the statistics on the index as follows using the output of the DBCC SHOW_STATISTICS on the reference table.

UPDATE STATISTICS #x (UCX) WITH STATS_STREAM = 0x01000000010000000000000000000000...

, ROWCOUNT = 50, PAGECOUNT = 1

Comment: In?Asynchronous IO and Storage Arrays, I showed that read-ahead (random prefetch) activates for Key Lookup at Estimated Number of Executes 25. It would seem that this occurs at Estimated Number of Executes 26 for a (nested loops) join, even though both operations employ Nested Loops. The main point is to be aware that there may be more unknowns in the actual code path than demonstrated in the small number of examples given in this series of articles.

With 24 rows in the table and actual statistics, the Scan details reflect the true row count. With the Update Statistics transplant setting false values, the Scan details instead show the transplanted row count of 50.

After applying the transplanted statistics, we can update #x, setting the unpopulated columns with a join to the main table on the clustered index key. Depending on the row count in the transplanted statistics, the Update query should generate asynchronous or synchronous I/O as desired.

Given that we have just modified rows of the temp table, the next query using #x may trigger a statistics recompute. If we desire to continue using the transplanted data distribution statistics, I believe the procedure should be as follows.

First, update the statistics explicitly the normal way. This should reset the row modification counter back to zero, indicating no update is necessary.

UPDATE STATISTICS #x (UCX)

Then apply the update with the transplanted statistics as earlier. I have not run this recently, and some investigation is warranted.
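A sketch of that intended order, with the STATS_STREAM value truncated as before; as noted, this sequence is untested.

UPDATE STATISTICS #x (UCX) -- normal update, resets the modification counter

UPDATE STATISTICS #x (UCX) WITH STATS_STREAM = 0x01000000010000000000000000000000...

, ROWCOUNT = 50, PAGECOUNT = 1 -- reapply the transplant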

Temp Table Populated in a Single Step

If we are not concerned by row estimate uncertainties in the filter criteria, then we can populate the temp table #x in a single step.

Insert rows to be modified into #x.

INSERT #x (ID, C, D, F)

SELECT ID, C, D, F FROM dbo.T WHERE C = @P1

The Index Seek (NonClustered) implements the filter criteria. The Key Lookup to the Clustered index gets the columns needed that are not in the nonclustered index.

If the nonclustered index has all the columns that we need to obtain, then there is no key lookup. This may be great for Select query efficiency (at the expense of higher Insert and Update overhead). But here our goal is to load the main table (clustered index) pages.

In this case, the two-step approach is appropriate. A nonclustered index is used to evaluate the filter criteria. The second step should access the clustered index as the join criteria is the clustered index key.

Loading Index Pages to Buffer Cache

With the write row set populated in the temp table #x, we can now access the nonclustered index pages that will be written to in the write statement. The access has the effect of loading the pages into buffer cache.

The example below has an index hint to force the index.

DECLARE @p1 bigint

SELECT @p1 = COUNT(*)

FROM #x x JOIN dbo.T WITH (INDEX(IXD)) ON T.D = x.D AND T.ID = x.ID

Notice the join condition specifies both the nonclustered index key and the clustered index key. This ensures a one-to-one join if the clustered index key is unique, avoiding a many-to-many join. The index hint ensures the use of the intended nonclustered index instead of the clustered index.

Each index that requires modification in the write statement should be touched.
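For example, the nonclustered index on C would be touched the same way; the index name IXC is assumed, as only IXD was named above.

SELECT @p1 = COUNT(*)

FROM #x x JOIN dbo.T WITH (INDEX(IXC)) ON T.C = x.C AND T.ID = x.ID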

Loading Foreign Key Parent Pages to Buffer Cache

Touching pages in the foreign key parent table is done in a similar manner, with some changes. In our example, the foreign key column is F, and the parent table is also named F, with key column F.

SELECT @P1 = COUNT(*)

FROM (SELECT DISTINCT F FROM #x) x

JOIN dbo.F f ON f.F = x.F

The Write Statement

The final task is to execute the original write statement, now with most if not all pages to be written in buffer cache. We can simply execute the original statement. If there are complexities in the original statement, it can be more efficient to use our temp table to identify the rows, as below.

UPDATE T SET ... FROM #x x JOIN dbo.T ON T.ID = x.ID

WHERE C = @P1

If we are paranoid about whether the previously identified rows still meet the filter criteria, we can include the original criteria in the write statement as well, as in the WHERE clause above.

Other Use Case

There is another use case for the multi-step method with the temp table, though it is outside the scope of the main topic. It is mentioned because the methods are largely the same.

Consider an Update statement in which the filter criteria is somewhat complicated. Many rows must be examined, but few rows actually meet the complete criteria. Suppose one of the criteria is an equality condition that matches a nonclustered index key. The execution plan is an index seek on the nonclustered index but still must evaluate many rows.

Instead of executing the Update statement directly, Select the clustered index key into a temp table. The same number of rows are evaluated, but only a shared lock is taken on each row. Use the temp table to join back to the main table for the Update. Exclusive locks are only taken on the rows actually updated.
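A sketch of this pattern, with a hypothetical residual predicate standing in for the complicated portion of the criteria:

INSERT #x (ID)

SELECT ID FROM dbo.T

WHERE C = @P1 -- equality condition matching the nonclustered index key

AND D + F > 1000 -- hypothetical expensive residual predicate

UPDATE T SET CreateD = GETDATE()

FROM #x x JOIN dbo.T ON T.ID = x.ID

Shared locks are taken while the many rows are examined in the Insert/Select; exclusive locks are taken only on the qualifying rows in the Update.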

Summary

The rather complicated set of steps here must be crafted to the specific clustered and nonclustered indexes, and foreign keys, of the target table. It should only be necessary for writes of many rows under specific circumstances: the estimated number of rows being below the asynchronous I/O threshold, the actual number of rows being high enough for this to be a problem, and the pages to be written not being in buffer cache.

This problem used to be more severe in the hard disk storage era, when the best storage systems could deliver 5 ms access latency at low queue depth per HDD. An overloaded system could be operating at over 20 ms latency. In the SSD era, 100 μs latency is possible in enterprise devices operating within an appropriate queue depth. However, 1 ms and higher latency can still happen, and is more likely when poor network bandwidth is a factor.

The situation to watch for is estimated rows being low but actual rows very high. This is most likely to happen when the filter criteria has complexity that prevents the query optimizer from making a good estimate.

References

Insert, Update and Delete Plan Cost

Asynchronous IO and Storage Arrays, Plan Costs

Microsoft Learn: Reading Pages, Writing Pages, Statistics

Posts by Craig Freedman: Sequential Read Ahead, Random Prefetching
