Online Courses Now Available At Skillshare.com

I have just published my online courses at Skillshare.com. My courses were previously only available at Udemy.com.

At Udemy, courses are sold on a per-course basis. At Skillshare, you sign up for a reasonably priced monthly subscription and receive access to all the courses on the site – currently more than 17,000. So, depending on how you prefer to purchase training, you now have two options. My courses on the two sites are identical.

My courses are:

Microsoft SQL Server 101 – This course is designed for new DBAs, involuntary DBAs, or anyone who needs to learn the basics of SQL Server. The course can be found here at Udemy.com and includes demos and sample scripts. Use this link (or enter the coupon code 15OFFBLOG) and get a 15% discount! This course is over 4 hours long and can be taken at your own pace. If you prefer Skillshare, use this link.

Understanding Data Compression in SQL Server – This course delves into how SQL Server’s data compression feature works and how and when to implement it. The course can be found here at Udemy.com. It includes demos and sample scripts. Use this link (or enter the coupon code 15OFFBLOG) and get a 15% discount! This course is approximately 1.5 hours long and can be taken at your own pace. If you prefer Skillshare, use this link.

SQL Server Maintenance Plans – This course will teach you all about SQL Server’s maintenance plans and what they can (and can’t) do. The course can be found here at Udemy.com. It includes demos and sample scripts. Use this link (or enter the coupon code 15OFFBLOG) and get a 15% discount! This course is almost 2 hours long and can be taken at your own pace. If you prefer Skillshare, use this link.

Sample videos from my courses can be found on my YouTube channel here.

Are Your Availability Group Backups Really Following Your Configured Backup Preferences?

I ran into a problem with Availability Group backups the other day that had me scratching my head for a couple of hours before I finally discovered what was going on. When you set up an Availability Group, one of the settings you can configure is where you want the backups to be made. The configuration screen looks like this:

[Screenshot: the Backup Preferences page of the Availability Group properties dialog]

I think the options are fairly self-explanatory (perhaps with the exception of the Backup Priority value for the last option, but that’s not important to this story). The default, and the way my AG was configured, was Prefer Secondary. As the image shows, this means backups will be made on the secondary unless the secondary is unavailable, in which case they will be made on the primary.
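
By the way, you can check how an existing AG is configured without opening the GUI – the setting is exposed in a catalog view:

SELECT  name ,
        automated_backup_preference_desc
FROM    sys.availability_groups;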

There are a couple of things to note when you use this setting:

  1. Full backups made on the secondary are Copy Only backups (see the example after this list). This means they won’t reset the differential bitmap, and your differentials will continue to increase in size until a full backup is made on the primary.
  2. Differential backups cannot be made on the secondary.
  3. Transaction log backups can be made on the secondary and they do clear the log, so your log file will not continue to grow.
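
As an aside on that first point: if you run a full backup on a secondary replica, you must explicitly request COPY_ONLY or the backup fails. A minimal sketch (the database name and path are made up):

BACKUP DATABASE [MyAGDatabase]
TO DISK = N'X:\Backups\MyAGDatabase_CopyOnly.bak'
WITH COPY_ONLY, COMPRESSION;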

One morning we had a situation where our differential backups were failing and we were trying to track down the cause. We have a centralized job server that runs jobs on several of our SQL Servers. This is accomplished by supplying the server name to a PowerShell script, which then executes the command against the specified server. The PowerShell backup script used the AG listener name when executing the command, so the command would always be sent to the primary replica.

This is where we started to get confused. According to the way the AG backup preferences were configured, SQL should run the backup on the secondary. But the backup command called for a differential, which can’t be run on a secondary. Well, the job was failing, so that made sense to me. What didn’t make sense was the fact that the job had run successfully for months in the past. Also, our full backups clearly were resetting the differential bitmap, because every differential taken after a full backup was smaller than the one taken before it. How was that possible?

I went back and looked in the msdb backupset table for the last couple of months. This table has two columns – server_name and machine_name – that give the name of the machine the backup was made on. In my case, I found the name was always the name of the primary replica.
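
For reference, the check looked roughly like this (a sketch – adjust the date filter to suit):

SELECT  database_name ,
        type ,			/* D = full, I = differential, L = log */
        backup_start_date ,
        server_name ,
        machine_name
FROM    msdb.dbo.backupset
WHERE   backup_start_date >= DATEADD(MONTH, -2, GETDATE())
ORDER BY backup_start_date DESC;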

What’s going on? It looked like SQL was ignoring my AG backup preference setting. After much head-scratching and testing, I finally discovered something: those settings don’t mean crap! Well, I mean, they do, but it’s up to you to make use of them. In the Books Online documentation, I found this little gem:

Important
There is no enforcement of the automated backup preference setting. The interpretation of this preference depends on the logic, if any, that you script into backup jobs for the databases in a given availability group. The automated backup preference setting has no impact on ad-hoc backups. (emphasis mine)

Great. Way to make that clear in the UI.

Way down at the bottom of that BOL page, it tells you that YOU are responsible for determining where your backup should be run, and you do this by checking the sys.fn_hadr_backup_is_preferred_replica function. So basically, the configuration page where you select the Backup Preferences only tells SQL Server what value that function should return. It doesn’t actually make the backup happen on a certain replica.

Because our backup job did not check that function, backups were always being made on the primary.
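
For anyone wondering what honoring the preference looks like in practice, here is a minimal sketch of the check you would add to a backup job (the database name and path are made up):

IF sys.fn_hadr_backup_is_preferred_replica('MyAGDatabase') = 1
    BEGIN
        /* This replica is the preferred backup location, so take the backup. */
        BACKUP LOG [MyAGDatabase]
        TO DISK = N'X:\Backups\MyAGDatabase.trn'
        WITH COMPRESSION, CHECKSUM;
    END;
ELSE
    BEGIN
        /* Not the preferred replica - exit quietly so the job still succeeds. */
        PRINT 'Not the preferred backup replica. Skipping backup.';
    END;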

The more you know…

Management Studio Release 16.5.2 Has A Bug That Prevents Displaying The List Of Tables

The 16.5.2 release of SQL Server Management Studio (which reports as 13.0.16105.4 in the About screen) has a bug where you cannot display the list of tables in a database that contains sparse columns. You get a “Subquery returned more than 1 value.” error, as shown below, when you try to expand the Tables node.

[Screenshot: the “Subquery returned more than 1 value.” error dialog in SSMS]

This has been reported here and here.

Upvote the Microsoft Connect Item here.

The workaround is to roll back to a previous version.
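
If you want to know whether a particular database can trigger the bug, a quick check for sparse columns looks something like this sketch:

SELECT  OBJECT_SCHEMA_NAME([object_id]) AS SchemaName ,
        OBJECT_NAME([object_id]) AS TableName ,
        name AS ColumnName
FROM    sys.columns
WHERE   is_sparse = 1;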

Removing Duplicate Statistics

It’s a SQL Server best practice to have Auto Create Statistics enabled on your databases. When this is on, SQL Server will create statistics on single columns when it needs them. However, if someone comes along later and adds an index, that auto-created statistic might no longer be needed.
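
If you aren’t sure whether the option is enabled, it’s a quick check:

SELECT  name ,
        is_auto_create_stats_on
FROM    sys.databases;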

The Potential Problem

When an index is created, SQL creates a statistic on the columns in that index. If the first column in that index was a column that SQL had previously auto-created a statistic on, you’ve now got duplicate statistics on that column. Is this bad? Maybe.

A duplicate statistic can impact storage space and performance. Statistics are relatively small objects in a database – they contain a maximum of 200 steps in the histogram and a very small amount of data in the density vector and header. Therefore, the impact on storage space is minimal, but definitely not non-existent.

The bigger impact of duplicate statistics is decreased performance. We all know too many indexes can slow down table inserts, deletes, and updates because SQL has to update all those indexes each time data is modified. The same situation exists with statistics: the more there are, the more SQL has to maintain as data in the table changes. It’s not quite as bad as indexes, because statistics are only updated after the number of data modifications reaches a certain threshold, whereas every single modification of a column used by an index requires an index update. Nevertheless, the potential exists for SQL to do more work than it needs to when you’ve got duplicated statistics.
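
If you’re curious how much modification activity your statistics are accumulating, SQL Server exposes a modification counter via sys.dm_db_stats_properties (SQL 2008 R2 SP2 and SQL 2012 SP1 onward). A sketch, using a made-up table name:

SELECT  s.name AS StatName ,
        sp.last_updated ,
        sp.rows ,
        sp.modification_counter
FROM    sys.stats s
        CROSS APPLY sys.dm_db_stats_properties(s.[object_id], s.stats_id) sp
WHERE   s.[object_id] = OBJECT_ID('dbo.MyTable');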

But that’s not the only way performance can be affected. As Erin Stellato points out in the comments to her blog post, it’s possible that SQL Server will pick a column statistic over an index statistic, or vice versa, when generating a query plan. If the one SQL picks is out of date, a less-than-optimal query plan might be generated.

So it may be worth checking your databases for duplicate statistics, especially if you have a highly active database that you have recently added indexes to.

“That’s Strange…”

Being the good DBA that I am, I decided to do this check on my SQL Servers. I found a handy script at https://mattsql.wordpress.com/2013/08/28/duplicate-statistics/. The code looked ok, so I ran it against one of my smaller databases and removed the statistics it said were duplicates. I re-ran the query and, sure enough, it returned no rows, meaning it found no more duplicates. Then I saw something shiny and got distracted.

When my focus returned to SQL, I ran the query again – accidentally against the same database I ran it against earlier. Strangely, this time it reported it found some duplicate statistics. Hmm.

That shouldn’t have happened. If SQL auto-created statistics again, it meant that it needed them – which meant that the stats on the indexes I thought were duplicated were not actually duplicates.

I puzzled over this for a bit and got sidetracked by the strange way SSMS displays statistics columns on the Property page. Then it got to be the end of the day and I went home. The next day, I had a comment on my previous post from Aaron Bertrand, who mentioned there is a related bug with the stats_column_id column of the sys.stats_columns view: it does not contain what the MSDN documentation says it contains. The Connect item for this, along with a workaround, can be found here.

The script I was using did not reference that column, but it did get me thinking that perhaps the script was not correctly identifying the first column in an index.

What The Script Does

Looking at the script, I could see that it looks for statistics created for indexes. Through a join to the sys.stats table, the script selects any auto-created statistic on the same column that is the first column in an index. The key line is in the WHERE clause of the CTE:

and ic.index_column_id = 1

You would think that would select the first column in the index. That’s even what you might expect from reading the MSDN page for the view. Except that’s not always the case. If you want to find the first column in the index, you have to select the record where key_ordinal = 1, not index_column_id = 1.
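
The fix is a one-line change to that predicate (you will see it again in the corrected script later in this post):

and ic.key_ordinal = 1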

A Short Example To Illustrate This

Let’s go to a database we can mess around in and see if we can prove this. First, let’s create a table and a clustered index. (I chose this structure because it is the same as the table that caused me to see this problem in the first place.)

CREATE TABLE [dbo].[IndexStatTest]
    (
      [ID] [INT] IDENTITY(1, 1)
                 NOT NULL ,
      [TypeLID] [INT] NOT NULL ,
      [Hidden] [TINYINT] NOT NULL ,
      [ObjectID] [INT] NULL ,
      [ObjectTypeLID] [INT] NULL ,
      [TMGUID] [CHAR](32) NOT NULL ,
      [LMG] [CHAR](32) NOT NULL ,
      [LMD] [DATETIME] NOT NULL ,
      [LMU] [NVARCHAR](100) NOT NULL ,
      [IsVisibleInCM] [TINYINT] NOT NULL ,
      CONSTRAINT [PK_IndexStatTest_ID] PRIMARY KEY NONCLUSTERED ( [ID] ASC )
        ON [PRIMARY]
    )
ON  [PRIMARY];

GO


CREATE UNIQUE CLUSTERED INDEX [UK_IndexStatTest_02] ON [dbo].[IndexStatTest]
(
[ObjectID] ASC,
[ObjectTypeLID] ASC,
[TypeLID] ASC
) ON [PRIMARY];
GO

Clearly, we defined the index with ObjectID as the first column. Now, let’s run a query that joins the sys.index_columns view with sys.indexes and sys.objects (so we can see some names instead of just object IDs) and see what we get.

SELECT  ic.object_id ,
        ic.index_id ,
        ic.index_column_id ,
        ic.column_id ,
        ic.key_ordinal ,
        COL_NAME(o.object_id, ic.column_id) AS ColName ,
        i.name AS IndexName ,
        OBJECT_NAME(ic.object_id) AS TableName
FROM    sys.indexes i
        JOIN sys.objects o ON o.object_id = i.object_id
        JOIN sys.index_columns ic ON ic.index_id = i.index_id
                                     AND ic.object_id = i.object_id
WHERE   i.name = 'UK_IndexStatTest_02';

The results:

[Screenshot: the query results – the index_column_id values do not follow the order the columns were defined in, but the key_ordinal values do]

Recall the column order we defined for our index: ObjectID, ObjectTypeLID, and TypeLID. We can see that index_column_id does not number the columns in that order. It is the numbering in the key_ordinal column that indicates the order the columns are defined in the index.

This explains the behavior I was seeing. Because the script misidentified the first column the index statistic was created on, it flagged an auto-created statistic as a duplicate when it wasn’t. Luckily, because I have auto create statistics enabled, deleting that statistic didn’t cause much chaos – SQL Server just made a new one when it needed it.

One More Issue

In testing, I also ran across one more issue – namely, auto-created statistics that have more than one column. I know this is supposedly not possible, but I discovered a weird case where it can happen. I wrote about it in some detail last week, so check that out if you are interested. The bottom line is that this situation might occur if you are working with a database that was created in SQL 2000 (or possibly earlier).

What this meant for my script is that, due to the join on the sys.stats_columns table, some statistics might be incorrectly flagged as being duplicated when they really aren’t. Again, this would only happen in the strange case where you were working with an old database.

In deciding how to handle this, I came to the conclusion that I would just let those stats be selected for deletion anyway. Because auto-create statistics is enabled in the database, if any deleted stat is needed, SQL will go ahead and re-create it. (If auto-create statistics is not enabled in the database, there would be no reason to run this script in the first place.)

Nevertheless, if you are concerned about this – perhaps because you know performance in your environment is highly dependent on statistics and the overhead of auto-creating them in production could cause issues – look over the script output before running it. Any suspect statistics will be named _WA_Sys_<column name>_<hex ID of table> instead of the newer format _WA_Sys_<hex ID of column>_<hex ID of table>. Don’t run the commands for the statistics named with the column name right away. Instead, run those commands during an outage window and then manually re-create the statistics, taking care to identify which column each stat is for and which columns were the included clustering keys. This will re-create those stats as newer-format, single-column stats.
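
One rough way to spot the old-format statistics is to look for auto-created stats whose names do not have eight hex digits immediately after the _WA_Sys_ prefix. This is only a heuristic sketch – a column name made up entirely of hex characters would slip past it:

SELECT  OBJECT_NAME(s.[object_id]) AS TableName ,
        s.name AS StatName
FROM    sys.stats s
WHERE   s.auto_created = 1
        AND SUBSTRING(s.name, 9, 8) LIKE '%[^0-9A-F]%';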

The Corrected And Improved Script

I took Matt’s script as a starting point and I made the correction to the CTE. Because I wanted to run this script against a bunch of databases, I also modified it to cycle through all the databases on the SQL Server instance. I also changed it to output the commands to drop the duplicate statistics.

The script below will cycle through all user databases on a server, looking for auto-created statistics on a column that is also the first column in an index on the table. It ignores disabled and filtered indexes and will output a list of commands that drop the duplicate statistics. You can then take this output and run it against your server to remove those statistics. The script itself makes no changes to any database. If you want to see the details of which statistic is duplicated by which index, simply uncomment the second-to-last SELECT statement in the script.

I suspect the script may not work with partitioned indexes – I did not test it against those.

Also note that this script only finds duplicate statistics between stats that were auto-created and those belonging to an index. It will not find duplicates between two column statistics. For example, if SQL auto-created a stat on a column and someone later manually created a statistic on the same column, this script would not find that duplication.

This script has been tested against SQL Server 2008 R2, 2012, 2014, and 2016. It does NOT work on SQL 2005 and earlier.

/*
	Script to find auto-created statistics that are no longer needed due
	to an index existing with the first column the same as the column
	the statistic is on. Disabled indexes are ignored. Based on script from
	mattsql.wordpress.com/2013/08/28/duplicate-statistics/

	The script will loop through all the user databases on a server and
	output a list of commands to drop the duplicate	statistics. For more
	details on the duplicates, uncomment the second to the last SELECT
	statement.
	
	Disabled indexes are ignored.
	Filtered indexes are ignored.

	This script makes no changes to any databases.

	ShaunJStuart.com

	8-8-16	Initial release

*/

DECLARE @SQLCmd VARCHAR(MAX);
DECLARE @LoopCounter INT;
DECLARE @WorkingDB sysname;

IF OBJECT_ID('tempdb..##UnneededStats') IS NOT NULL
    DROP TABLE ##UnneededStats;

CREATE TABLE ##UnneededStats
    (
      [ID] INT ,
      [Schema] sysname NOT NULL ,
      [Table] sysname NOT NULL ,
      [Column] sysname NOT NULL ,
      [AutoCreatedStatistic] sysname NOT NULL ,
      [Index] sysname NOT NULL ,
      [DropStatCommand] VARCHAR(4000) NOT NULL
    );


DECLARE @DatabasesTable TABLE ( DBName sysname );

INSERT  INTO @DatabasesTable
        SELECT  name
        FROM    sys.databases
        WHERE   source_database_id IS NULL	/* no database snapshots */
                AND is_read_only = 0		/* no read-only dbs (log shipping) */
                AND state = 0				/* online databases */
                AND name NOT IN ( 'master', 'model', 'msdb', 'tempdb' )		/* non-system databases only */
        ORDER BY name;

SELECT  @LoopCounter = COUNT(*)
FROM    @DatabasesTable;

WHILE @LoopCounter > 0
    BEGIN

        SELECT TOP 1
                @WorkingDB = DBName
        FROM    @DatabasesTable;

        SET @SQLCmd = '
USE [' + @WorkingDB
            + '];
WITH    stats_on_indexes ( [object_id], [table_column_id], [index_name] )
          AS ( SELECT   o.[object_id] AS [object_id] ,
                        ic.[column_id] AS [table_column_id] ,
                        i.name
               FROM     sys.indexes i
                        JOIN sys.objects o ON i.[object_id] = o.[object_id]
                        JOIN sys.stats st ON i.[object_id] = st.[object_id]
                                             AND i.name = st.name
                        JOIN sys.index_columns ic ON i.index_id = ic.index_id
                                                     AND i.[object_id] = ic.[object_id]
               WHERE    o.is_ms_shipped = 0
                        AND i.has_filter = 0
                        AND ic.key_ordinal = 1
                        AND i.is_disabled = 0
             )
    INSERT  INTO ##UnneededStats
            ( [ID] ,
              [Schema] ,
              [Table] ,
              [Column] ,
              [AutoCreatedStatistic] ,
              [Index] ,
              [DropStatCommand]
            )
            SELECT  o.[object_id] AS [ID] ,
                    sch.name AS [Schema] ,
                    o.name AS [Table] ,
                    c.name AS [Column] ,
                    s.name AS [AutoCreatedStatistic] ,
                    stats_on_indexes.index_name AS [Index] ,
                    ''USE [' + @WorkingDB
            + ']; DROP STATISTICS ['' + sch.name
                    + ''].['' + o.name + ''].['' + s.name + '']'' AS DropStatCommand
            FROM    sys.stats s
                    JOIN sys.stats_columns sc ON s.stats_id = sc.stats_id
                                                 AND s.[object_id] = sc.[object_id]
                    JOIN sys.objects o ON sc.[object_id] = o.[object_id]
                    JOIN sys.schemas sch ON o.schema_id = sch.schema_id
                    JOIN sys.columns c ON sc.[object_id] = c.[object_id]
                                          AND sc.column_id = c.column_id
                    JOIN stats_on_indexes ON o.[object_id] = stats_on_indexes.[object_id]
                                             AND stats_on_indexes.table_column_id = c.column_id
            WHERE   s.auto_created = 1
                    AND s.has_filter = 0;';

        EXEC(@SQLCmd);

        DELETE  FROM @DatabasesTable
        WHERE   DBName = @WorkingDB;

        SET @LoopCounter -= 1;

    END;

/* 
SELECT  *
FROM    ##UnneededStats;
*/

SELECT DISTINCT
        DropStatCommand
FROM    ##UnneededStats;

/* DISTINCT is needed because sometimes more than one index will start
	with the same column, resulting in two DROP statements being generated
	for the same statistic, which would cause an error when executed. */

DROP TABLE ##UnneededStats;

The Case Of The Multi-Column Auto-Created Statistic

I’ve been spending some time lately working with statistics, specifically auto-created statistics. When auto-create statistics is enabled, SQL Server will create column statistics when it needs them. No surprises there. Conventional wisdom also states that SQL Server never creates multi-column statistics, only single column statistics. In fact, this MSDN page says this of the auto create statistics option:

“It applies strictly to single-column statistics for the full table.”

Seems pretty clear. SQL Server will not automatically create multi-column statistics. I’ve heard this over and over from many different sources.

So imagine my surprise when, while testing some code, I came upon entries in the sys.stats_columns system view that looked like this:

[Screenshot: rows from sys.stats_columns showing three auto-created statistics, each containing two columns]

Wow… That sure looks like three auto-created, multi-column statistics! We have three stats: stats_ids 3, 4, and 5. The sys.stats_columns view contains one row for each column that is in a statistic, so multiple rows for a single statistic (i.e., a single stats_id value) indicate multiple columns in that stat. Indeed, the column_id values indicate the table columns contained in the stat. So stats_id 3 contains columns VersionMajor and ApplicationID (column_ids 3 and 1), stats_id 4 contains columns VersionMinor and ApplicationID (column_ids 4 and 1), and stats_id 5 contains columns VersionRevision and ApplicationID (column_ids 5 and 1). And, clearly, the auto_created flag is true, so these three stats were auto-created. What’s going on?
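
If you want to check your own databases for this oddity, here is a sketch of a query that looks for auto-created statistics containing more than one column:

SELECT  s.[object_id] ,
        s.stats_id ,
        s.name ,
        COUNT(*) AS ColumnCount
FROM    sys.stats s
        JOIN sys.stats_columns sc ON sc.[object_id] = s.[object_id]
                                     AND sc.stats_id = s.stats_id
WHERE   s.auto_created = 1
GROUP BY s.[object_id] ,
        s.stats_id ,
        s.name
HAVING  COUNT(*) > 1;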

I puzzled over this for quite some time and got nowhere. I finally reached out to the SQL Server community for help. Erin Stellato (b | t) said she had never seen that before, but she did mention that the naming of the statistics seemed strange. I noticed this as well. As Paul Randal has blogged, auto-created statistics are named _WA_Sys_<hex value of column>_<hex value of table>. Typically, the names look something like _WA_Sys_0000000D_00000029.

But these stats don’t follow that format. Instead of the hex value for the column, they have the column name.

A couple hours later, Paul White (b | t) responded to my tweet:

[Screenshot: Paul White’s tweet suggesting the statistics were created by SQL Server 2000, which included the clustering key in auto-created statistics]

Aha! It seemed highly likely that this was the cause of what I was seeing. Although I was working with a database on a SQL 2012 server, I was pretty sure that database was originally created on an older version of SQL. I wanted to confirm this, however.

To verify what SQL version the database was created with, I used the DBCC DBINFO command. The value of the dbi_createVersion field indicates what SQL version was used to create the database.
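
DBCC DBINFO is undocumented, so the usual caveats apply. Adding WITH TABLERESULTS returns the output as a result set instead of text (the database name here is an example):

DBCC DBINFO ('MyOldDatabase') WITH TABLERESULTS;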

[Screenshot: DBCC DBINFO output showing dbi_createVersion = 539]

Thanks to Paul Randal’s TechNet post, we know that a value of 539 means the database was created with SQL 2000. Looks like Paul White might be on to something.

The next piece I wanted to verify was that auto-created stats in SQL 2000 included the table’s clustering key. Unfortunately, deep-dive details on auto-created stats in SQL 2000 are somewhat hard to come by these days, and I was not able to find anything that definitively stated the clustering key was included. However, in all the auto-created stats in all the old databases I found on my servers, this was the case. I also verified that if the clustering key was more than one column, all the clustering columns were included in the auto-created stats. I’m going to go ahead and assume this is the way it worked back in the day.
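
Incidentally, if you want to see exactly which columns one of these statistics covers, the density vector output of DBCC SHOW_STATISTICS shows the column combinations (the table and statistic names here are made up):

DBCC SHOW_STATISTICS ('dbo.SomeOldTable', '_WA_Sys_ObjectID_0425A276')
WITH DENSITY_VECTOR;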

And finally, I wanted to find something confirming the naming convention. Again, details were hard to find, but I did find this ebook, which states that the names of auto-created statistics in SQL 2000 use the column name, not the hex value of the column.

This is enough evidence for me, so I’m going to call it: auto-created column statistics can be created on multiple columns. Granted, you have to be running SQL 2000, and the additional columns are always the clustering key columns, so it’s a very specific case and it no longer happens in modern versions of SQL Server.

Still, if you are ever asked if SQL Server will automatically create multi-column statistics, you can now give the classic SQL Server answer: It depends :-)

If SQL doesn’t behave this way anymore, why is this important? This issue caught my eye when I was writing some code to examine the statistics tables: when I joined to sys.stats_columns, I was getting back more rows than I expected. Because I was joining to other tables as well, this messed up my final result set. If you’ve got databases that have been migrated from older versions of SQL, you might run into this problem as well. Come back next week for the story on that…

P.S. Just for kicks, I deleted one of the old, two column stats and ran a query so that a new stat would be created. Would the new statistic follow the old or new SQL Server rule? Turns out, not surprisingly, the newly auto-created stat followed the new rules: it was single column, did not include the clustering key, and followed the new naming convention using the hex value for the column name. This was tested using SQL 2012.
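
The experiment was essentially this (the table, column, and statistic names are made up):

DROP STATISTICS dbo.SomeOldTable._WA_Sys_ObjectID_0425A276;
GO
/* Filter on the column so the optimizer auto-creates a fresh statistic. */
SELECT COUNT(*) FROM dbo.SomeOldTable WHERE ObjectID = 42;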

I also wanted to see what would happen if the database was running in SQL 2000 compatibility mode. On a SQL 2005 server, I found a database that was created in SQL 2000 and had some of these strange auto-created stats. I changed the database to SQL 2000 compatibility mode, deleted one of the auto-created stats, and ran a query to force a new one to be created. The result? A single column stat that did not include the clustering key and that was named using the hex id of the column. In other words, the stat was made using the method we see in newer versions of SQL Server. So the database compatibility mode does not affect this. The only way multi-column auto created stats can be generated is if you are using the SQL Server 2000 (or earlier, most likely) database engine.