Row-based vs. Column-based Indexes

Row-based vs. Column-based Indexes

Indexing is a huge topic and there are many aspects to it, internals, data access methods, features, rules, restrictions, and of course, performance tuning for your queries. To do a good job tuning SQL Server I feel that you need to understand all of these facets of indexing because there's both a science to indexing as well as an art. One side of indexing for performance is query tuning. This is always the first step for improving the performance of a query. You analyze what the query is trying to do and you come up with the best index for that query but what's best for the query might not be best for the server, so after query tuning comes server tuning and even server tuning has multiple facets to it.

In this article we're going to cover row-based and column-based indexes, and where and when you should consider using these different indexes.

Table Structures

In SQL Server when you create a table, that table's data needs to be stored somewhere. That structure we tend to call your base-table structure. And in SQL Server the different table structures that are possible are a heap, so you can have a table that has no clustered index and you have a collection of unordered pages in which that data is stored. Now you could instead create a clustered index on that table. But depending on version, you could go with a row-based clustered index, and we've had that ability in all versions, so that's nothing new. But as of SQL 2014, you can create a column-based clustered index. They did introduce column-based indexes in 2012, but in 2012 there were only non-clustered index structures. In 2014 they could be a clustered structure.

Indexes structures

So in terms of the structures themselves, row-based indexes in SQL Server have two primary components to an index structure: - Leaf (bottom) level: contains something for every table row in indexed order. - Non-leaf level(s) or B-tree: contains something, specifically representing the FIRST value, from every page of the level below.

What's actually happening with a columnstore index or column-based indexes is: -  All rows for a single column are stored together –potentially HIGHLY compressible. - - Data is segmented into groups of 1 million rows for better batch processing. - SQL Server can do “segment elimination” (similar to partition elimination) and further reduce the number of segments to process. - Parallelization through batch mode.

Choice Depends on Workload

So which index you choose is going to be heavily dependent on the workload that you have.

  • OLTP:

So if you have a predominantly OLTP workload, your priority is really towards data modifications, inserts, updates, deletes, and often very highly selective point queries. What did this customer buy with this order? What are the orders that this customer purchased? You're looking for specific and generally small sets of data. The best type of indexes is usually those that can help you find a small set of rows fairly quickly. And usually, that means that you have a clustered index that is row-based for your table structure, and then you use secondary row-based nonclustered indexes to help you search and find those individual rows. Now you might consider a nonclustered columnstore in an OLTP environment but you're going to have to really want to choose these wisely, and so in some cases where you might have large datasets that are partitioned, even in an OLTP environment, then you might use the columnstore indexes on some of the older data where you're doing analysis as opposed to the new data where you really want to focus on performance for OLTP.

  • DSS/RDW:

Now if your workload is decision support system or relational data warehousing, then you're probably doing things like sum of sales for all of your customers and all of your sales, or sum of your sales by customer, sum of your sales by product. But you're really looking at a large amount of your data. And in this case, a clustered columnstore index could be a much better choice but you're going to find out that that's limited to only SQL Server 2014 and higher. And that's why I have a section where I'm going to talk specifically about the different versions and what you would consider in each version. Now secondary indexes are still important. And based on which version you're using, you might actually find that you can't even have secondary indexes. So this is why it's going to be important to tie this in with version. But usually in a data warehouse, you might want to get sums and calculations, but then you do want to dig deeper in some cases, so you'll possibly want both types of indexes. So point queries, if you're going to do them, would benefit from row-based nonclustereds.

RDW/Columnstore Indexes by SQL Server Version

  • SQL server 2008:

o  Lowest version to consider for large table/data analysis with performance and scalability features.

o  Added data compression (row and page compression).

o  Added filtered indexes/filtered statistics.

o  Fixed fast-switching for partition-aligned, indexed views.

  • SQL Server 2012

o  Added read-only, nonclustered columnstore indexes.

o  Some frustrating limitations but still amazing performance when possible and/or workarounds used (details at SQL Server Columnstore Performance Tuning)

  • SQL Server 2014

o  Added updateable, clustered columnstore indexes.

o  The frustrating limitation that no other indexes allowed.

o  Many other frustrating limitations with columnstore fixed e.g., UNION ALL supports batch mode, so usable with partitioned views

o  Added “incremental statistics” to help limit rebuilds as well as time to rebuild.

  • SQL Server 2016

o  Added updateable nonclustered columnstore indexes.

o  Added row-based nonclustered indexes with clustered columnstore indexes.

o  Any combination of base-table structure with nonclustered indexes possible.

  • SQL Server 2017

o  Supports non-persisted computed columns in clustered columnstore indexes.

Indexes problems

  • Row-based Problems

More tuning work for analysis: must create appropriate indexes per query and then consolidate

Stores multiple columns of data together (not as easily compressed)

  • Column-based Problems

Minimum set for reads is a row group (no seeks possible)

Limitations with some features (fewer and fewer by SQL Server version)

Finally Row-based vs. Column-based Indexes

  • Row-based 

Support data compression

Can support point queries (seeks)

Wide variety of supported scans

Full and/or partial table scans

Nonclustered covering scans

Nonclustered covering seeks with partial scans

  • Column-based

Columnar data stored together often allows much higher level of compression

Supports large-scale aggregations

Support partial scans with segment elimination, and combine with partitioning for further elimination

Summary:

We covered is row-based index structures, column-based index structures, how to know what's appropriate based on the workload that you're running. We compared and contrast what the options were, what the benefits were, as well as what the problems were of row-based and column-based indexes. We looked at columnstore index options by version with really our primary emphasis on 2016 and higher if you want to get the best benefit of both columnstore and row-based indexes. I hope this article has been informative for you.

Useful Links:

1-   Columnstore indexes – overview

2-   Columnstore indexes - what's new?

3-   Columnstore indexes - architecture


要查看或添加评论,请登录

Ayman Elnory的更多文章

  • Mastering Technical Leadership: Insights from Leading a Major IT Migration

    Mastering Technical Leadership: Insights from Leading a Major IT Migration

    Ever faced a project that seemed impossible? That’s exactly how it felt when we began migrating one of the most…

    4 条评论
  • Database Corruption - Real world Issue

    Database Corruption - Real world Issue

    Corruption does happen, many times per day, all around the world Many people don’t realize they have corruption until…

  • Misconceptions

    Misconceptions

    1- Transaction Log Corruption and Backup in SQL Server 2- Tip: Know How and When to Use Emergency Mode Repair

  • Why Good Employees Leave?

    Why Good Employees Leave?

    If you have a turnover problem,look first to your managers and supervisors.

  • Peter Drucker Quotes

    Peter Drucker Quotes

    Management is doing things right; leadership is doing the right things. The most important thing in communication is to…

    1 条评论

社区洞察

其他会员也浏览了