Removing Duplicate Statistics

It’s a SQL Server best practice to have Auto Create Statistics enabled on your databases. When this is on, SQL Server will create statistics on single columns when it needs them. However, if someone comes along later and adds an index, that auto-created statistic might no longer be needed.
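If you want to check which of your databases have this option enabled, a quick query against sys.databases will show you:

SELECT  name ,
        is_auto_create_stats_on
FROM    sys.databases;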

The Potential Problem

When an index is created, SQL creates a statistic on the columns in that index. If the first column in that index was a column that SQL had previously auto-created a statistic on, you’ve now got duplicate statistics on that column. Is this bad? Maybe.

A duplicate statistic can impact storage space and performance. Statistics are relatively small objects in a database – they contain a maximum of 200 rows in the histogram and a very small amount of data in the density vector and header. Therefore, the impact on storage space is very minimal, but definitely not non-existent.

The biggest impact of duplicate statistics would be manifested as decreased performance. We all know too many indexes can slow down table inserts, deletes, and updates because SQL has to update all those indexes each time data is modified. The same situation exists with statistics. The more there are, the more SQL has to modify as data in the table changes. It’s not quite as bad as indexes because the number of data modifications has to reach a certain threshold before statistics are updated, whereas for indexes every single data modification of a column used by an index requires an index update. Nevertheless, the potential exists for SQL to have to do more work than it needs to when you’ve got duplicated statistics.
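As an aside, if you want to see how much modification activity a statistic has accumulated toward that threshold, the sys.dm_db_stats_properties DMF (available in SQL 2008 R2 SP2, SQL 2012 SP1, and later) exposes a modification counter. Here is a quick sketch using the test table from later in this post:

SELECT  s.name AS StatName ,
        sp.last_updated ,
        sp.rows ,
        sp.modification_counter
FROM    sys.stats s
        CROSS APPLY sys.dm_db_stats_properties(s.[object_id], s.stats_id) sp
WHERE   s.[object_id] = OBJECT_ID('dbo.IndexStatTest');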

But that’s not the only way performance can be affected. As Erin Stellato points out in the comments to her blog post, it’s possible that SQL Server will pick a column statistic over an index statistic or vice-versa when generating a query plan. If one of these stats is out of date and that is the one SQL picks, a less-than-optimal query plan might be generated.

So it may be worth checking your databases for duplicate statistics, especially if you have a highly active database that you have recently added indexes to.

“That’s Strange…”

Being the good DBA that I am, I decided to do this check on my SQL Servers. I found a handy script online. The code looked OK, so I ran it against one of my smaller databases and removed the statistics it said were duplicates. I re-ran the query and, sure enough, it returned no rows, meaning it found no more duplicates. Then I saw something shiny and got distracted.

When my focus returned to SQL, I ran the query again – accidentally against the same database I ran it against earlier. Strangely, this time it reported it found some duplicate statistics. Hmm.

That shouldn’t have happened. If SQL auto-created statistics again, it meant that it needed them – which meant that the stats on the indexes I thought were duplicated were not actually duplicates.

I puzzled on this for a bit and got sidetracked by the strange way SSMS displays statistics columns on the Property page. Then it got to be the end of the day and I went home. The next day, I had a comment on my previous post from Aaron Bertrand who mentioned there is a related bug with the stats_column_id column of the sys.stats_columns view. It does not contain what the MSDN documentation says it contains. The Connect item for this, along with a workaround, can be found here.

The script I was using did not reference that column, but it did get me thinking that perhaps the script was not correctly identifying the first column in an index.

What The Script Does

In looking at the script, I could see that it looks for statistics created for indexes. Through a join to the sys.stats table, the script selects any auto-created statistic on the same column as the first column in an index. The key line is in the WHERE clause of the CTE:

and ic.index_column_id = 1

You would think that would select the first column in the index. That’s even what you might expect from reading the MSDN page for the view. Except that’s not always the case. If you want to find the first column in the index, you have to select the record where key_ordinal = 1, not index_column_id = 1.
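In other words, the key line in a corrected version of the script should read:

and ic.key_ordinal = 1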

A Short Example To Illustrate This

Let’s go to a database we can mess around in and see if we can prove this. First, let’s create a table and a clustered index. (I chose this structure because it is the same as the table that caused me to see this problem in the first place.)


CREATE TABLE [dbo].[IndexStatTest]
    (
      [ID] [INT] IDENTITY(1, 1) NOT NULL ,
      [TypeLID] [INT] NOT NULL ,
      [Hidden] [TINYINT] NOT NULL ,
      [ObjectID] [INT] NULL ,
      [ObjectTypeLID] [INT] NULL ,
      [TMGUID] [CHAR](32) NOT NULL ,
      [LMG] [CHAR](32) NOT NULL ,
      [LMU] [NVARCHAR](100) NOT NULL ,
      [IsVisibleInCM] [TINYINT] NOT NULL
    )
ON [PRIMARY];


CREATE UNIQUE CLUSTERED INDEX [UK_IndexStatTest_02] ON [dbo].[IndexStatTest]
(
    [ObjectID] ASC,
    [ObjectTypeLID] ASC,
    [TypeLID] ASC
);

Clearly, we defined the index with ObjectID as the first column. Now, let’s run a query that joins the sys.index_columns view with sys.indexes and sys.objects (so we can see some names instead of just object IDs) and see what we get.


SELECT  ic.object_id ,
        ic.index_id ,
        ic.index_column_id ,
        ic.column_id ,
        ic.key_ordinal ,
        COL_NAME(o.object_id, ic.column_id) AS ColName ,
        i.name AS IndexName ,
        OBJECT_NAME(ic.object_id) AS TableName
FROM    sys.indexes i
        JOIN sys.objects o ON o.object_id = i.object_id
        JOIN sys.index_columns ic ON ic.index_id = i.index_id
                                     AND ic.object_id = i.object_id
WHERE   i.name = 'UK_IndexStatTest_02';

The results:

[Screenshot: query results showing that the index_column_id values do not match the index key order, while key_ordinal does]


Recall the column order we defined for our index: ObjectID, ObjectTypeLID, and TypeLID. We can see that index_column_id does not number the columns in that order. It is the numbering in the key_ordinal column that indicates the order the columns are defined in the index.

This explains the behavior I was seeing. Because the script I used incorrectly identified the initial column the index statistic was created on, it incorrectly identified an auto-created statistic as a duplicate. Luckily, because I have auto create statistics enabled, I didn’t cause much chaos when I deleted the incorrect statistic – SQL Server just made a new one when it needed it.

One More Issue

In testing, I also ran across one more issue – namely, auto-created statistics that have more than one column. I know this is supposedly not possible, but I discovered a weird case where it can happen. I wrote about it in some detail last week, so check that out if you are interested. The bottom line is that this situation might occur if you are working with a database that was created in SQL 2000 (or possibly earlier).

What this meant for my script is that, due to the join on the sys.stats_columns table, some statistics might be incorrectly flagged as being duplicated when they really aren’t. Again, this would only happen in the strange case where you were working with an old database.

In deciding how to handle this, I came to the conclusion that I would just let those stats be selected for deletion anyway. Because auto-create statistics is enabled in the database, if any deleted stat is needed, SQL will go ahead and re-create it. (If auto-create statistics is not enabled in the database, there would be no reason to run this script in the first place.)

Nevertheless, if you are concerned about this, perhaps because you know performance in your environment is highly dependent on statistics and the overhead of auto-creating them in production could cause issues, look over the script output before running it. Any suspect statistics will be named _WA_Sys_<column name>_<hex ID of table> instead of the newer format _WA_Sys_<hex ID of column>_<hex ID of table>. Don’t blindly run the drop commands for statistics named with the column name instead of the hex ID. Instead, run those commands during an outage window and then manually re-create the statistics, taking care to identify which column each stat is for and which columns were the included clustering keys. This will re-create those stats as newer-format single-column stats.
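For example, the manual re-creation might look something like this – note that the statistic name, table, and hex ID here are hypothetical; substitute the actual names from your script output:

-- Drop the old-style stat (hypothetical name using the column name format)
DROP STATISTICS [dbo].[IndexStatTest].[_WA_Sys_ObjectID_075714DC];

-- Re-create it as a modern single-column statistic on just that column
CREATE STATISTICS [Stat_ObjectID] ON [dbo].[IndexStatTest] ([ObjectID]);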

The Corrected And Improved Script

I took Matt’s script as a starting point and I made the correction to the CTE. Because I wanted to run this script against a bunch of databases, I also modified it to cycle through all the databases on the SQL Server instance. I also changed it to output the commands to drop the duplicate statistics.

The script below will cycle through all user databases on a server, looking for auto-created statistics on a column that is also the first column in an index on the table. It ignores disabled and filtered indexes and will output a list of commands that drop the duplicate statistics. You can then take this output and run it against your server to remove these statistics. The script itself makes no changes to any database. If you want to see the details of which statistic is duplicated by which index, simply uncomment the second-to-last SELECT statement in the script.

I suspect the script may not work with partitioned indexes – I did not test it against those.

Also note that this script only finds duplicate statistics between stats that were auto-created and those from an index. It will not find duplicates between two column statistics. For example, if SQL auto-created a stat on a column and someone later manually created one on the same column, this script would not find it.

This script has been tested against SQL Server 2008 R2, 2012, SQL 2014, and SQL 2016. It does NOT work on SQL 2005 and earlier.

/*
	Script to find auto-created statistics that are no longer needed due
	to an index existing with the first column the same as the column
	the statistic is on. Based on a script from Matt.

	The script will loop through all the user databases on a server and
	output a list of commands to drop the duplicate statistics. For more
	details on the duplicates, uncomment the second-to-last SELECT.
	Disabled indexes are ignored.
	Filtered indexes are ignored.

	This script makes no changes to any databases.

	8-8-16	Initial release
*/


DECLARE @LoopCounter INT;
DECLARE @WorkingDB sysname;
DECLARE @SQLCmd NVARCHAR(MAX);

IF OBJECT_ID('tempdb..##UnneededStats') IS NOT NULL
    DROP TABLE ##UnneededStats;

CREATE TABLE ##UnneededStats
    (
      [ID] INT ,
      [Schema] sysname NOT NULL ,
      [Table] sysname NOT NULL ,
      [Column] sysname NOT NULL ,
      [AutoCreatedStatistic] sysname NOT NULL ,
      [Index] sysname NOT NULL ,
      [DropStatCommand] VARCHAR(4000) NOT NULL
    );
DECLARE @DatabasesTable TABLE ( DBName sysname );

INSERT  INTO @DatabasesTable
        SELECT  name
        FROM    sys.databases
        WHERE   source_database_id IS NULL	/* no database snapshots */
                AND is_read_only = 0		/* no read-only dbs (log shipping) */
                AND state = 0				/* online databases */
                AND name NOT IN ( 'master', 'model', 'msdb', 'tempdb' )		/* non-system databases only */
        ORDER BY name;

SELECT  @LoopCounter = COUNT(*)
FROM    @DatabasesTable;

WHILE @LoopCounter > 0
BEGIN

        SELECT TOP 1
                @WorkingDB = DBName
        FROM    @DatabasesTable;

        SET @SQLCmd = '
USE [' + @WorkingDB
            + '];
WITH    stats_on_indexes ( [object_id], [table_column_id], [index_name] )
          AS ( SELECT   o.[object_id] AS [object_id] ,
                        ic.[column_id] AS [table_column_id] ,
                        i.[name] AS [index_name]
               FROM     sys.indexes i
                        JOIN sys.objects o ON i.[object_id] = o.[object_id]
                        JOIN sys.stats st ON i.[object_id] = st.[object_id]
                                             AND i.[name] = st.[name]
                        JOIN sys.index_columns ic ON i.index_id = ic.index_id
                                                     AND i.[object_id] = ic.[object_id]
               WHERE    o.is_ms_shipped = 0
                        AND i.has_filter = 0
                        AND ic.key_ordinal = 1
                        AND i.is_disabled = 0
             )
    INSERT  INTO ##UnneededStats
            ( [ID] ,
              [Schema] ,
              [Table] ,
              [Column] ,
              [AutoCreatedStatistic] ,
              [Index] ,
              [DropStatCommand]
            )
            SELECT  o.[object_id] AS [ID] ,
                    sch.[name] AS [Schema] ,
                    o.[name] AS [Table] ,
                    c.[name] AS [Column] ,
                    s.[name] AS [AutoCreatedStatistic] ,
                    stats_on_indexes.index_name AS [Index] ,
                    ''USE [' + @WorkingDB
            + ']; DROP STATISTICS ['' + sch.[name] + ''].[''
                    + o.[name] + ''].['' + s.[name] + '']'' AS DropStatCommand
            FROM    sys.stats s
                    JOIN sys.stats_columns sc ON s.stats_id = sc.stats_id
                                                 AND s.[object_id] = sc.[object_id]
                    JOIN sys.objects o ON sc.[object_id] = o.[object_id]
                    JOIN sys.schemas sch ON o.schema_id = sch.schema_id
                    JOIN sys.columns c ON sc.[object_id] = c.[object_id]
                                          AND sc.column_id = c.column_id
                    JOIN stats_on_indexes ON o.[object_id] = stats_on_indexes.[object_id]
                                             AND stats_on_indexes.table_column_id = c.column_id
            WHERE   s.auto_created = 1
                    AND s.has_filter = 0;';

        EXEC (@SQLCmd);


        DELETE  FROM @DatabasesTable
        WHERE   DBName = @WorkingDB;

        SET @LoopCounter -= 1;
END;


/* Uncomment this SELECT to see the details of which statistic is
   duplicated by which index:

SELECT  [ID] ,
        [Schema] ,
        [Table] ,
        [Column] ,
        [AutoCreatedStatistic] ,
        [Index] ,
        [DropStatCommand]
FROM    ##UnneededStats;
*/

SELECT DISTINCT
        [DropStatCommand]
FROM    ##UnneededStats;

/* DISTINCT is needed because sometimes more than one index will start
	with the same column, resulting in two DROP statements being generated
	for the same statistic, which would cause an error when executed. */

DROP TABLE ##UnneededStats;


The Case Of The Multi-Column Auto-Created Statistic

I’ve been spending some time lately working with statistics, specifically auto-created statistics. When auto-create statistics is enabled, SQL Server will create column statistics when it needs them. No surprises there. Conventional wisdom also states that SQL Server never creates multi-column statistics, only single column statistics. In fact, this MSDN page says this of the auto create statistics option:

“It applies strictly to single-column statistics for the full table.”

Seems pretty clear. SQL Server will not automatically create multi-column statistics. I’ve heard this over and over from many different sources.

So imagine my surprise when, while testing some code, I came upon entries in the sys.stats_columns system view that looked like this:

[Screenshot: sys.stats_columns rows for three auto-created statistics (stats_ids 3, 4, and 5), each containing two columns]


Wow. That sure looks like three auto-created, multi-column statistics! We have three stats: stats_ids 3, 4, and 5. The sys.stats_columns view contains one row for each column that is in a statistic, so multiple rows for a single statistic (i.e., a single stats_id value) indicate multiple columns in that stat. Indeed, the column_id values identify the table columns contained in the stat. So stats_id 3 contains columns VersionMajor and ApplicationID (column_ids 3 and 1), stats_id 4 contains VersionMinor and ApplicationID (column_ids 4 and 1), and stats_id 5 contains VersionRevision and ApplicationID (column_ids 5 and 1). And, clearly, the auto_created flag is true, so these three stats were auto-created. What’s going on?
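If you want to check your own databases for these, here is a sketch (not part of the original script) that lists auto-created statistics containing more than one column:

SELECT  OBJECT_NAME(s.[object_id]) AS TableName ,
        s.name AS StatName ,
        COUNT(*) AS ColumnCount
FROM    sys.stats s
        JOIN sys.stats_columns sc ON s.[object_id] = sc.[object_id]
                                     AND s.stats_id = sc.stats_id
WHERE   s.auto_created = 1
GROUP BY s.[object_id] ,
        s.name
HAVING  COUNT(*) > 1;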

I puzzled over this for quite some time and got nowhere. I finally reached out to the SQL Server community for help. Erin Stellato (b | t) said she had never seen that before, but she did mention the naming of the statistics seemed strange. I noticed this as well. As Paul Randal has blogged, auto-created statistics are named _WA_Sys_<hex value of column>_<hex value of table>. Typically, the names look something like _WA_Sys_0000000D_00000029.

But these stats don’t follow that format. Instead of the hex value for the column, they have the column name.
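A rough way to sniff these out by name is to look for auto-created stats that don't match the modern hex pattern. This sketch assumes column IDs under 4096, which covers any normal table:

SELECT  OBJECT_NAME(s.[object_id]) AS TableName ,
        s.name AS StatName
FROM    sys.stats s
WHERE   s.auto_created = 1
        AND s.name NOT LIKE '[_]WA[_]Sys[_]00000[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][_]%';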

A couple hours later, Paul White (b | t) responded to my tweet:

[Tweet from Paul White suggesting the database may have been created on SQL Server 2000, where auto-created statistics included the clustering key columns]


Aha! It seemed highly likely that this was the cause of what I was seeing. Although I was working with a database on a SQL 2012 server, I was pretty sure that database was originally created on an older version of SQL. I wanted to confirm this however.

To verify what SQL version the database was created with, I used the DBCC DBINFO command. The value of the dbi_createVersion field indicates what SQL version was used to create the database.
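DBCC DBINFO is undocumented, so the usual caveats apply, but the basic usage looks like this (the database name is a placeholder):

DBCC TRACEON (3604);            -- send DBCC output to the results pane
DBCC DBINFO ('YourDatabase');   -- look for the dbi_createVersion field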


Thanks to Paul Randal’s TechNet post, we know that a value of 539 means the database was created with SQL 2000. Looks like Paul White might be on to something.

The next piece I wanted to verify was that auto-created stats in SQL 2000 included the table's clustering key. Unfortunately, deep-dive details on auto-created stats for SQL 2000 are somewhat hard to come by these days, and I was not able to find anything that definitively stated the clustering key was included. However, in all the auto-created stats in all the old databases I found on my servers, this was the case. I also verified that if the clustering key was more than one column, all the clustering columns were included in the auto-created stats. I’m going to go ahead and assume this is the way it worked back in the day.

And finally, I wanted to find something confirming the naming convention. Again, details were hard to find, but I did find this ebook which states that the names of the auto-created statistics in SQL 2000 use the column name, not the hex value of the column.

This is enough evidence for me, so I’m going to call it: auto-created column statistics can be created on multiple columns. Granted, you have to be running SQL 2000, and the additional columns are always the clustering key columns, so it’s a very specific case and it no longer happens in modern versions of SQL Server.

Still, if you are ever asked if SQL Server will automatically create multi-column statistics, you can now give the classic SQL Server answer: It depends :-)

If SQL doesn’t behave this way anymore, why is this important? This issue caught my eye when I was writing some code to examine the statistics tables and when I joined to the sys.stats_columns table, I was getting back more rows than I expected. Because I was joining to other tables as well, this messed up my final result set. If you’ve got databases that have been migrated from older versions of SQL, you might run into this problem as well. Come back next week for the story on that…


P.S. Just for kicks, I deleted one of the old, two column stats and ran a query so that a new stat would be created. Would the new statistic follow the old or new SQL Server rule? Turns out, not surprisingly, the newly auto-created stat followed the new rules: it was single column, did not include the clustering key, and followed the new naming convention using the hex value for the column name. This was tested using SQL 2012.

I also wanted to see what would happen if the database was running in SQL 2000 compatibility mode. On a SQL 2005 server, I found a database that was created in SQL 2000 and had some of these strange auto-created stats. I changed the database to SQL 2000 compatibility mode, deleted one of the auto-created stats, and ran a query to force a new one to be created. The result? A single column stat that did not include the clustering key and that was named using the hex id of the column. In other words, the stat was made using the method we see in newer versions of SQL Server. So the database compatibility mode does not affect this. The only way multi-column auto created stats can be generated is if you are using the SQL Server 2000 (or earlier, most likely) database engine.


Viewing Statistics in SSMS

I’m in the process of creating a post about statistics and came across something in SQL Server Management Studio that was driving me crazy. I thought I’d post it here so that others might not waste an hour or two like I did puzzling over this.

When you create an index in SQL Server, SQL automatically creates a statistic for the columns in the index. In the screenshot below, you can see my table named SJSTest, to which I have added a clustered index called, somewhat unoriginally, ClusteredIndex. When you expand the Statistics folder, you can see SQL has automatically created a statistic with the same name as the index.

[Screenshot: Object Explorer showing the SJSTest table with a statistic named ClusteredIndex under the Statistics folder]
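If you want to follow along, here is a minimal sketch of my setup – the column data types don't matter for this demonstration, so these are just guesses:

CREATE TABLE dbo.SJSTest
    (
      Col1 INT NOT NULL ,
      Col2 INT NOT NULL ,
      Col3 INT NOT NULL
    );

CREATE CLUSTERED INDEX ClusteredIndex ON dbo.SJSTest (Col2, Col1, Col3);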


The clustered index is composed of Col2, Col1, and Col3:

[Screenshot: index properties showing the key columns in the order Col2, Col1, Col3]


We expect SQL Server to create a statistic with those columns and in that order. But if we look at the properties of the statistic we see this:

[Screenshot: statistic properties listing the columns as Col1, Col2, Col3 – table order, not index order]


What’s going on? Why are the columns in the statistic not in the same order as the columns in the index? Well, it turns out, they are. If we look on the Details page, we see the density vector is, in fact, created as Col2, Col1, Col3, which is the order of the columns in the index:

[Screenshot: statistic details showing the density vector built on Col2, Col1, Col3]
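You can also confirm the column order without relying on SSMS by querying the density vector directly:

DBCC SHOW_STATISTICS ('dbo.SJSTest', 'ClusteredIndex') WITH DENSITY_VECTOR;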


At the time, I happened to be flipping back and forth between the property pages of indexes and statistics a lot, and this really confused me. In the property page for an index, the columns are listed in the order they are in the index, but on the property page for statistics, the columns are displayed in the order they exist in the table.

I’m using SSMS version 13.0.15600.2. It would be nice if a future version made the statistics property page behave the same as the index property page.

I have created a Connect item for this. If you agree, please vote for it.


SQL 2016 May Need .Net Framework 3.5 After All

I’ve seen a lot of blog posts written about how SQL Server 2016 no longer requires .Net Framework 3.5 to be installed. This seems to be true to some extent; the SQL 2016 install program will no longer abort if it is not installed, like it did in previous versions.

And the SQL installer does not install the .Net Framework 3.5 on its own either. I installed SQL 2016 on three different machines and none of them had the .Net Framework installed before or after the SQL 2016 install completed.

However, I did run into one issue later on regarding database mail. I was able to configure it correctly and was even able to send a test email through SSMS without any errors. Unfortunately, the email was never delivered. Additionally, the database mail logs did not show database mail starting or attempting to send the message. I checked with my Exchange administrator and he said he never saw the SQL Server connecting to the mail server.

A check of the database mail table in msdb (sysmail_allitems) showed my messages sitting there with a status of Unsent.
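If you want to check this yourself, you can look at the queue directly:

SELECT  mailitem_id ,
        recipients ,
        subject ,
        sent_status ,
        send_request_date
FROM    msdb.dbo.sysmail_allitems
ORDER BY send_request_date DESC;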

After troubleshooting for a couple of hours and getting nowhere, we installed the .Net Framework 3.5 and suddenly database mail started working.

This can be dangerous if you are relying on database mail to send alerts. I received no warnings or error messages when sending an email; all indications were that the mail was sent successfully. I only knew there was an issue because my standard build script sends an email when building a new server, and the one I was expecting never arrived.

I’ve not had much time yet to dig deeper into this and, as I mentioned, I’ve only tried this on three SQL Servers, but it has happened consistently on all three, and the solution was the same each time: installing .Net Framework 3.5. The root cause may turn out to be something else, but at least for right now, it seems SQL Server 2016 still needs .Net Framework 3.5 if you want database mail to function.

(And to clarify, I ran into this on three brand new installations. I have not done an in-place upgrade, so this might not occur in that situation.)


Price Reductions On All My Online Courses!

I’m pleased to announce that prices have been permanently reduced on ALL my online SQL Server courses! Additionally, use the links below to save an additional 15% off the newly reduced prices!


Microsoft SQL Server 101 – This course is designed for new DBAs and involuntary DBAs, or anyone who needs to learn the basics of SQL Server. The course can be found here and includes demos and sample scripts. Use this link (or enter the coupon code 15BLOG) and get a 15% discount! This course is over 4 hours long and can be taken at your own pace.

Understanding Data Compression in SQL Server – This course delves into how SQL Server’s data compression feature works and how and when to implement it. The course can be found here. It includes demos and sample scripts. Use this link (or enter the coupon code 15BLOG) and get a 15% discount! This course is approximately 1.5 hours long and can be taken at your own pace.

SQL Server Maintenance Plans – This course will teach you all about SQL Server’s maintenance plans and what they can (and can’t) do. The course can be found here. It includes demos and sample scripts. Use this link (or enter the coupon code 15BLOG) and get a 15% discount! This course is almost 2 hours long and can be taken at your own pace.

Understanding Statistics in SQL Server – This course will teach you all about statistics in SQL Server – what they look like, how they are used, and how to maintain them. I also go over common problems caused by out-of-date statistics and how to find and fix them. The course can be found here. It includes demos and sample scripts. Use this link (or enter the coupon code 15BLOG) and get a 15% discount! This course is almost 2 hours long and can be taken at your own pace.

Sample videos from my courses can be found on my YouTube channel here.