SQL Server Indexing: 5 Steps to Better Performance
Customers don’t want to wait more than a few seconds for an application to respond and move to the next step, and batch windows keep shrinking while daily processing grows. We cannot just throw more hardware at the problem; scaling up may be easy in the cloud, but it comes at a higher cost.

Indexing is a key factor in performance, alongside database design and good code. Index tuning is part art and part science. Here we will walk through the steps to keep your indexes in shape.

Get your candidate list

SQL Server keeps a list of indexes it thinks would help your queries. This is a good starting point, but the suggestions are sometimes overkill or redundant.

Below is the query I use. I set some variables to help filter the list, and I build a more standardized index name. In large databases you may get hundreds of recommendations; increase the @MinDBImpact and @MinUserImpact variables until the list is manageable. You want to implement in small batches so you can monitor performance.

DECLARE @MinDBImpact int = 100,     -- > 10k high impact, look at > 1k if needed  
        @DBName nvarchar(50) = null,  -- Filter by DB   
        @MinRowCount int = 100,     -- remove small tables from analysis  
        @MinUserImpact int = 20       -- remove low impact indexes
SELECT Substring(statement, Charindex('.', statement) + 1, 99)        AS        TableName,   
       rc.[rowcount]                                                  AS        Row_Count,   
       CONVERT (DECIMAL (28, 1), migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans )) AS estimated_improvement,   
       'CREATE INDEX IX_'   
       + Replace(Replace(Replace(Substring(statement, Charindex('.', statement, Charindex('.', statement)+1)+1, 99), '[', ''), ']', ''), ' ', '_')   
       + '_'   
       + Replace(Replace(Replace(Replace(Isnull(mid.equality_columns, inequality_columns), '[', ''), ']', ''), ' ', '_'), ',', '')   
       + ' ON '   
       + Substring(statement, Charindex('.', statement)+1, 99) + ' (' + Isnull (mid.equality_columns, '')   
       + CASE WHEN mid.equality_columns IS NOT NULL AND mid.inequality_columns IS NOT NULL   
              THEN ',' ELSE '' END   
       + Isnull (mid.inequality_columns, '') + ')'   
       + Isnull (' INCLUDE (' + mid.included_columns + ')', '')       AS        create_index_statement,   
       migs.last_user_seek,  
       migs.avg_total_user_cost,  
       migs.avg_user_impact  
FROM   sys.dm_db_missing_index_groups mig   
       INNER JOIN sys.dm_db_missing_index_group_stats migs   
               ON migs.group_handle = mig.index_group_handle   
       INNER JOIN sys.dm_db_missing_index_details mid   
               ON mig.index_handle = mid.index_handle   
       JOIN (SELECT Quotename(Schema_name(sOBJ.schema_id))   
                    + '.' + Quotename(sOBJ.NAME) AS [TableName],   
                    Sum(sdmvPTNS.row_count)      AS [RowCount]   
             FROM   sys.objects AS sOBJ   
                    INNER JOIN sys.dm_db_partition_stats AS sdmvPTNS   
                            ON sOBJ.object_id = sdmvPTNS.object_id   
             WHERE  sOBJ.type = 'U'   
                    AND sOBJ.is_ms_shipped = 0x0   
                    AND sdmvPTNS.index_id < 2   
             GROUP  BY sOBJ.schema_id,   
                       sOBJ.NAME) rc   
         ON rc.tablename = Substring(statement, Charindex('.', statement) + 1, 99)   
WHERE  CONVERT (DECIMAL (28, 1), migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans )) > @MinDBImpact  
       AND statement LIKE concat('%',@DBName, '%') -- filter by DB   
       AND avg_user_impact > @MinUserImpact -- filter by User Impact    
       AND rc.[rowcount] > @MinRowCount   
ORDER  BY Substring(statement, Charindex('.', statement) + 1, 99),   
          migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans ) DESC

For this demo, I installed the AdventureWorks 2016 Extended sample database, which is bigger than the other samples. I deleted the nonclustered indexes, created a large table, and ran stored procedures and views to generate the missing-index information we will look at. Since this is still small compared with a production database, I lowered the impact threshold to surface more examples; I have seen production databases with hundreds of missing-index recommendations.

The right-most column shows the estimated impact percentage. Use it together with the estimated_improvement column to tackle recommendations in order of benefit.

Validate the recommendation

Example 1

Line 6 of the results is against Sales.SalesOrderHeader with over 92% impact. Let’s analyze it.

CREATE INDEX IX_SalesOrderHeader_SalesPersonID ON [Sales].[SalesOrderHeader] ([SalesPersonID]) INCLUDE ([OrderDate], [SubTotal])

An index on SalesPersonID sounds reasonable. What about the INCLUDE columns? This is where the art of tuning comes in: where and how is this table used?

Search for where it is used:

SELECT [Schema] = schema_name(o.schema_id), o.Name, o.type   
FROM sys.sql_modules m  
INNER JOIN sys.objects o ON o.object_id = m.object_id  
WHERE m.definition LIKE '%SalesOrderHeader%'

We get this list:

In the view vSalesPersonSalesByFiscalYears, we see joins on SalesPersonID and a subquery that uses SubTotal and OrderDate.

Seems like a good recommendation so far. Let’s check the query plan.

Here the plan gives us the same missing-index recommendation, and we can see a scan that accounts for 66% of the query.

It is used in a view, would eliminate a scan, and seems like information we would use regularly. I would accept this and create the index.

After creating the index, we look at the query plan again: the scan has turned into a seek and is now only 17% of the query.
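Plan-cost percentages are only estimates, so it helps to quantify the improvement with actual I/O numbers. A minimal check, assuming the view from the example above, is to compare logical reads before and after creating the index:

```sql
SET STATISTICS IO ON;
-- Run the query that uses the view; compare the "logical reads"
-- reported for SalesOrderHeader in the Messages tab before and
-- after the index exists.
SELECT * FROM Sales.vSalesPersonSalesByFiscalYears;
SET STATISTICS IO OFF;
```

A large drop in logical reads on the base table confirms the seek is doing its job.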

Example 2

The index on line 7 is also against Sales.SalesOrderHeader and has a long list of included columns.

CREATE INDEX IX_SalesOrderHeader_PurchaseOrderNumber ON [Sales].[SalesOrderHeader] ([PurchaseOrderNumber]) INCLUDE ([RevisionNumber], [OrderDate], [DueDate], [ShipDate], [Status], [OnlineOrderFlag], [SalesOrderNumber], [AccountNumber], [CustomerID], [SalesPersonID], [TerritoryID], [BillToAddressID], [ShipToAddressID], [ShipMethodID], [CreditCardID], [CreditCardApprovalCode], [CurrencyRateID], [SubTotal], [TaxAmt], [Freight], [TotalDue], [Comment], [rowguid], [ModifiedDate])

Any index recommendation with a long list of INCLUDE columns should be a red flag. It could come from an ad hoc query or simply be overkill. Scanning the usage of PurchaseOrderNumber, we find it is used only as a parameter and in inserts in a few stored procedures, and the estimated impact was only 34%. I would skip this index for now. If customer lookups started searching by PO number, we might need it.
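If you do create an index you are unsure about, you can check later whether it earns its keep. A sketch using sys.dm_db_index_usage_stats (note the counters reset when the instance restarts, so check uptime before judging):

```sql
SELECT i.name           AS IndexName,
       us.user_seeks,
       us.user_scans,
       us.user_lookups,
       us.user_updates  -- maintenance cost with no read benefit
FROM   sys.indexes i
       LEFT JOIN sys.dm_db_index_usage_stats us
              ON us.object_id = i.object_id
             AND us.index_id = i.index_id
             AND us.database_id = DB_ID()
WHERE  i.object_id = OBJECT_ID('Sales.SalesOrderHeader')
ORDER  BY us.user_seeks DESC;
```

An index with high user_updates and near-zero seeks and scans is a drop candidate.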

Example 3

Overlapping indexes incur extra maintenance overhead without any performance benefit. Knowing your data and how it is used helps here. As a database ages, similar or duplicate indexes accumulate, thanks to hot fixes, weak code reviews, autotuning, and lack of periodic review.

The more indexes you have, the more overhead on every update, and the more INCLUDE columns an index carries, the bigger it gets. Returns diminish quickly. I once saw a table with 95 indexes; it was in a data warehouse, but it was indexed as if it were a transactional system. Know your use cases.
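To spot overlap candidates, list each index’s key columns in order and compare leading edges; overlapping indexes will sort next to each other. A sketch for the current database (key columns only; STRING_AGG requires SQL Server 2017 or later):

```sql
SELECT OBJECT_NAME(i.object_id)  AS TableName,
       i.name                    AS IndexName,
       STRING_AGG(c.name, ', ') WITHIN GROUP (ORDER BY ic.key_ordinal) AS KeyColumns
FROM   sys.indexes i
       JOIN sys.index_columns ic
         ON ic.object_id = i.object_id AND ic.index_id = i.index_id
       JOIN sys.columns c
         ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE  i.type = 2                    -- nonclustered only
       AND ic.is_included_column = 0 -- key columns, not INCLUDEs
GROUP  BY i.object_id, i.name
ORDER  BY TableName, KeyColumns;
```

When one index’s key list is a prefix of another’s, the shorter one is usually redundant (INCLUDE columns deserve a second look before dropping).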

Combine overlapping indexes.

CREATE NONCLUSTERED INDEX IX_Address_1 ON Person.Address  
(AddressLine1);  
CREATE NONCLUSTERED INDEX IX_Address_2 ON Person.Address  
(AddressLine1, AddressLine2, City, StateProvinceID);  
CREATE NONCLUSTERED INDEX IX_Address_3 ON Person.Address  
(AddressLine1, AddressLine2, City, StateProvinceID, PostalCode);

Only the last one is necessary: the first two are leading-edge subsets of IX_Address_3, so any query that could use them can use it instead.
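Assuming nothing references the redundant indexes by name (index hints, for example), cleanup is straightforward:

```sql
-- IX_Address_3 covers both: any seek on AddressLine1, or on the
-- longer prefixes, can use its leading key columns instead.
DROP INDEX IX_Address_1 ON Person.Address;
DROP INDEX IX_Address_2 ON Person.Address;
```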

Example 4

INCLUDE columns should be kept to a minimum. Good use cases: a column in a frequent calculation (like the first example), a foreign key that lets you bypass reading the whole base table, and occasionally a search criterion (a column in a WHERE clause).

An example of the foreign-key include can be found in AdventureWorksDW. The vTargetMail view uses CustomerKey to join to DimCustomer. The INCLUDE column lets the join proceed without reading the row/page of the base table. The query plan recommends the following index:

CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]  
ON [dbo].[FactInternetSales] ([ProductKey])  
INCLUDE ([CustomerKey])

Bonus Tip

Filtered indexes can greatly improve performance in specific cases. A prime example is filtering on the active flag in data-warehouse dimension tables: when the active rows are a small percentage of the table, a filtered index dramatically cuts reads. See: Create filtered indexes.

Example

CREATE NONCLUSTERED INDEX IX_DimCustomer_IsActive    
    ON dbo.DimCustomer (CustomerID)    
    WHERE IsActive = 'Y';

You may have to force a statistics update after a batch load, since the optimizer can otherwise ignore the filtered index.
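Forcing the statistics update is a one-liner in the load job; FULLSCAN is a judgment call for very large tables, where a sampled update may be good enough:

```sql
-- Refresh statistics on the dimension table after the nightly load
UPDATE STATISTICS dbo.DimCustomer WITH FULLSCAN;
```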

Summary

  1. Get your candidate indexes

  2. Verify they are used regularly

  3. Combine duplicate or similar indexes

  4. Avoid too many include columns

  5. Iterate over your list in small batches and monitor performance

Let us know what tips or rules of thumb you follow in tuning with indexes.