SQL Server Statistics
Mohammed Arshad
UAE ???? Govt Emp - 9 yrs |HealthcareTech,FinTech,Hospitality | Dy Manager- Database Administration Top Voice | Applications’ Architect | Lead DBA -MS SQL Server/NoSQL(MongoDB)/PostgreSQL/Oracle/Sybase | Data Analytics
SQL Server statistics , or statistics in general seems to be an often misunderstood or confusing topic for many database professionals, especially at junior to mid-level.??Hence friends, I wanted my first?topic to be on SQL server statistics; what they are and how they influence the query performance. Let's start with the very basics.
??????????????????????????????Statistics in SQL server are the metadata (data about the data) .It’s that data, looking at which provides an idea about what kind of data (data skewed)?is available in a table , the density / distribution of the data in conjunction with the distinct values in a column , and the row?count of the table?etc. SQL Server Query Optimizer when preparing a query plan , needs to do an estimate of the cost that is estimated to be incurred to execute that query . To do this, the Optimizer has to refer to the statistics of that particular table(s) in the query which help it to do that estimate and hence helping to go with the least costly plan.
Objects that provide Statistics information are Stats Header , Density Vector and Histograms.?Though I'll explain about them in upcoming posts, I'll use a very generic depiction below for the sake of understanding.
As an analogy, consider the example of a GPS system. If the data in the GPS device is not updated and not in sync with that at the transport department, ?a traveller will find it difficult to trace a destination point and most probably , take a longer route (requiring unnecessary additional resources) while the same destination could’ve been reached at lesser cost because there was a new bridge constructed recently by the transport department , the information of which is not available in the current version of the GPS data(which is stale!).
To help understand in more technical terms consider a user table?in a?Telecom company database which has column Plan_Desc column that stores the description of an internet mobile data package plan customers can subscribe to.
Some of the examples of the sample of rows can be as below:
1.Monthly_Contracted_With-10-GB-All-Applications (100 K rows)
2.Term-Span-Monthly (50 K rows)
3.1-GB-Per-Day-non-renewable-Non-convertible-Non-Refundable (100 k rows)
领英推荐
4.1-GB-Per-Day-National_Pack(500 k rows)
5.1-GB-Per-Day-non-renewable-Non-convertible-Non-Refundable (150 k rows)
6.Staff-Pack_All (25 k rows)
The statistics for this column store the distribution /skewing of the data , meaning, the variations in the contents (Around 50 characters for a set of 100 k rows ,for the item number 3 above or?14 characters for a set of?25 k rows for item 6 etc.). This distribution / density of the data and the current (as per the statistics data which might be very different from the current actual) total row count of the table has a direct bearing on the selection of an optimized query plan by the Query Optimizer.
Importance of Statistics:?As discussed above , Statistics are a very crucial component in having a?direct impact on the performance of any piece of code?, a stored procedure?, function or an Ad-hoc T-SQL query batch.?The query optimizer , when working for preparing a best execution plan for a query relies heavily on two things?-?Indexes and SQL Server Statistics . The statistics data provides the optimizer the information required to 'estimate' what might be the best way to execute the query at minimal cost (which is called Cost Based Optimization).?Hence its apparent that if the statistics are not updated or they become stale , the optimizer will get mislead by the stale stats and chooses a longer path (hence costlier)?to process the query which results in poor query performance . If this query is a long running and holds -up resources for long, it?can have a cascading effect on other queries running on this database, and can affect the overall performance of the system. Hence care has to be taken to make sure that the statistics are maintained properly by the DBA.
I hope I've been able to make the basic understanding of the SQL Statistics easier for you. I'll continue with the?understanding of Histograms and other details about statistics and how to manage them, in my forthcoming blogs shortly. Please do visit again.
Ref :- sqlempire.wordpress.com
Any comments from you?would be helpful and encourage me in making it better.?Happy Reading ! Good Bye for now !