Have you ever had random inserts into a large table fail? Most of the time inserts work fine, but every so often one fails with a “primary key violation” error. If your primary key is an integer column with the identity property, you may be wondering how this is possible.
What is likely happening is your table has grown very large or has been in use for a long time and your identity column ran out of numbers. An integer column has a maximum value of 2,147,483,647. Now an integer can start at -2,147,483,648, but most people start at 0 or 1, so that leaves you with 2 billion numbers.
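You can check how much headroom an identity column has left with IDENT_CURRENT. A quick sketch (the table name is a placeholder):

```sql
-- How close is the identity column to the int ceiling?
-- 'YourTable' is a placeholder for your table name.
SELECT IDENT_CURRENT('YourTable') AS CurrentIdentityValue,
       2147483647 - CAST(IDENT_CURRENT('YourTable') AS BIGINT) AS ValuesRemaining;
```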
The first time this happens, someone (maybe you) will reseed the identity column to a new value (zero, in this case):
DBCC CHECKIDENT ('YourTable', RESEED, 0);
If your table has a lot of records that have been deleted, you might be able to continue on for quite some time before encountering another primary key violation. But this is not a permanent fix – you will encounter this error again eventually.
Another temporary fix – but one that will likely buy you much more time before the primary key violations return – is to reseed the column to the smallest negative number the datatype allows (assuming you have a positively incrementing identity value). For example, to reseed an integer column, you can run:
DBCC CHECKIDENT ('YourTable', RESEED, -2147483648);
Assuming you started at zero or 1, this gives you another 2 billion values to use. However, I’ve found most programmers, for some reason, have an aversion to negative numbers as primary keys. There’s no reason for it, but it’s there, so the DBA has to deal with it.
Of course, the permanent fix is to change the data type from integer to BIGINT. But if your table is big enough to have this problem, that will present its own set of challenges – this is a “size of data” operation, meaning SQL has to modify every single row in the table, and the time it takes is directly related to the size of the data in the table. The change will require the table to be locked, lots of transaction log will be generated, etc. See Kendra Little’s post about the issues you will face here.
The path of least resistance (and least (or no) downtime) is to reseed the identity column. This is virtually instantaneous. The problem becomes “to what value should I reseed it?”
To answer this question, you need to know where the gaps in your used identity values are. Ideally, you’d like to reseed to the start of the biggest gap.
I went searching for some code to do this and came across this solution from Microsoft:
-- Use a recursive CTE approach, which will provide
-- all values in all gap ranges
DECLARE @i INT;
SELECT @i = MAX(PK_ID) FROM YourTable;
WITH tmp (gapId)
AS (SELECT DISTINCT
        a.PK_ID + 1
    FROM YourTable a
    WHERE NOT EXISTS
        (SELECT * FROM YourTable b WHERE b.PK_ID = a.PK_ID + 1)
        AND a.PK_ID < @i
    UNION ALL
    SELECT a.gapId + 1
    FROM tmp a
    WHERE NOT EXISTS
        (SELECT * FROM YourTable b WHERE b.PK_ID = a.gapId + 1)
        AND a.gapId < @i)
SELECT gapId
FROM tmp
ORDER BY gapId;
However, when I ran this, I got the following error:
Looks like my table has too many gaps for this to be a valid solution. So back to Google I went and found this really nice solution from Henning Frettem. When I ran his code, I saw my table had 97,236 gaps! So which should I use? Ideally, the largest one! Henning’s code didn’t readily give that, so below is my modified version which gives you that information and sorts from the largest gap to the smallest, since the largest gap is what you are really looking for.
WITH cte
AS (SELECT PK_ID, /* replace with your identity column */
        RowNum = ROW_NUMBER() OVER (ORDER BY PK_ID)
    FROM dbo.YourTable), /* replace with your table name */
cte2
AS (SELECT cte.PK_ID,
        DENSE_RANK() OVER (ORDER BY PK_ID - RowNum) AS Series /* replace with your identity column */
    FROM cte),
cte3
AS (SELECT cte2.PK_ID,
        cte2.Series,
        COUNT(*) OVER (PARTITION BY Series) AS SCount
    FROM cte2),
cte4
AS (SELECT MinID = MIN(PK_ID), /* replace with your identity column */
        MaxID = MAX(PK_ID), /* replace with your identity column */
        Series
    FROM cte3
    GROUP BY Series)
SELECT GapStart = a.MaxID,
    GapEnd = b.MinID,
    SizeOfGap = b.MinID - a.MaxID
FROM cte4 a
INNER JOIN cte4 b
    ON a.Series + 1 = b.Series
--ORDER BY GapStart
ORDER BY ABS(b.MinID - a.MaxID) DESC;
In the output screenshot below, I had already reseeded my table to the smallest integer value, so that’s why you see that first big 2-billion-value gap. In my case, someone had already reseeded the table to negative numbers in the past, so my developers were OK with using negative numbers.
If you encounter this problem and are unable to change the datatype, you now have the code to find the largest gap to use when reseeding your identity value!
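Putting it together: once the gap query identifies the largest gap, you can reseed to its start. A minimal sketch (the table name and gap value are placeholders; substitute the GapStart value the query returned):

```sql
-- Reseed to the start of the largest gap found by the gap query.
-- @gapStart would come from the GapStart column of the largest gap;
-- the next identity value generated will be @gapStart + 1.
DECLARE @gapStart INT = 100000;  -- example value; use your actual GapStart
DBCC CHECKIDENT ('YourTable', RESEED, @gapStart);
```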
My company recently purchased SQL Sentry (oops, they changed the name to Sentry One over two years ago, although the website still says SQL Sentry all over the place) and I’ve been spending the last several weeks setting it up and configuring all the alerts. There are a ton of items it monitors and, while the default thresholds are good, they definitely need tweaking to fit your environment. Unless you enjoy getting thousands of email alerts, that is.
I’ve finally gotten those alerts down to a manageable number and I can start paying attention to them. One alert caught my eye because it was one I hadn’t seen before. I was getting these errors in the Windows Application log:
The query processor ran out of internal resources and could not produce a query plan. This is a rare event and only expected for extremely complex queries or queries that reference a very large number of tables or partitions. Please simplify the query. If you believe you have received this message in error, contact Customer Support Services for more information.
Rare, huh? I had dozens of these in my logs every day, going back months. But I also had no complaints from people about queries failing. Strange.
So I wanted to track down what was causing these errors. I’ve got Query Store enabled, so my first thought was to start looking there. But the error in my thinking quickly became apparent. The error says SQL Server “could not produce a query plan.” I’m guessing that means I wouldn’t find anything in the Query Store.
So I turned to Extended Events and set up a session looking for error 8623. After running for a short while, I caught one of these errors. That gave me the host name of the computer that was submitting these complex queries. I narrowed my monitoring down to that server and that let me find a couple of queries causing the problems.
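If you want to set up a similar session, a minimal sketch looks like this (the session name and file target name are my own choices):

```sql
-- Capture error 8623 along with the offending host, app, and SQL text.
CREATE EVENT SESSION [Capture8623] ON SERVER
ADD EVENT sqlserver.error_reported (
    ACTION (sqlserver.client_hostname, sqlserver.client_app_name, sqlserver.sql_text)
    WHERE ([error_number] = (8623)))
ADD TARGET package0.event_file (SET filename = N'Capture8623');
GO
ALTER EVENT SESSION [Capture8623] ON SERVER STATE = START;
```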
The machine in question was one that handles searches for our website. The server was submitting queries with enormous IN lists – exactly the scenario Microsoft’s documentation for this error describes:
Explicitly including an extremely large number of values (many thousands of values separated by commas) within the parentheses, in an IN clause can consume resources and return errors 8623 or 8632. To work around this problem, store the items in the IN list in a table, and use a SELECT subquery within an IN clause.
Looks like I found the problem. I actually counted the number of items in that IN statement. 1,077, but don’t hold me to that. I sneezed when counting and may have lost my place. There’s definitely enough to qualify as “an extremely large number,” I do believe.
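The workaround from the documentation – staging the values in a table and using a subquery – looks roughly like this (the table and column names here are hypothetical placeholders, not the actual application schema):

```sql
-- Instead of: ... WHERE ProductID IN (1, 2, 3, /* ...1,000+ more... */ 99999)
-- stage the values in a table and use a subquery:
CREATE TABLE #SearchValues (ProductID INT PRIMARY KEY);
INSERT INTO #SearchValues (ProductID)
VALUES (1), (2), (3);  -- the application would insert the full list here

SELECT p.ProductID, p.ProductName
FROM dbo.Products AS p
WHERE p.ProductID IN (SELECT ProductID FROM #SearchValues);
```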
So I contacted our devs and told them the problem. They said this code was generated by a third-party app that uses Entity Framework to update our search indexes. They’ve added a task to move this functionality to an in-house developed stored procedure, but it will take a while to implement that change because it requires some application architecture changes.
So I’m stuck with my error logs filling with these. Bummer.
I know computer-generated SQL is not the most well-written code, but this is the first time I have ever seen any that doesn’t just perform poorly, but actually causes errors. Three cheers for AI!
I have just published my online courses at Skillshare.com. My courses were previously only available at Udemy.com.
At Udemy, courses are sold on a per-course basis. At Skillshare, you sign up for a reasonably priced monthly subscription and you receive access to all the courses on their site – currently more than 17,000. So depending on how you prefer to purchase training, you now have two different options. My courses on the two sites are identical.
My courses are:
Microsoft SQL Server 101 – This course is designed for new DBAs and involuntary DBAs or anyone who needs to learn the basics of SQL Server. The course can be found here at Udemy.com and includes demos and sample scripts. Use this link (or enter the coupon code 15OFFBLOG) and get a 15% discount! This course is over 4 hours long and can be taken at your own pace. If you prefer Skillshare, use this link.
Understanding Data Compression in SQL Server – This course delves into how SQL Server’s data compression feature works and how and when to implement it. The course can be found here at Udemy.com. It includes demos and sample scripts. Use this link (or enter the coupon code 15OFFBLOG) and get a 15% discount! This course is approximately 1.5 hours long and can be taken at your own pace. If you prefer Skillshare, use this link.
SQL Server Maintenance Plans – This course will teach you all about SQL Server’s maintenance plans and what they can (and can’t) do. The course can be found here at Udemy.com. It includes demos and sample scripts. Use this link (or enter the coupon code 15OFFBLOG) and get a 15% discount! This course is almost 2 hours long and can be taken at your own pace. If you prefer Skillshare, use this link.
Sample videos from my courses can be found on my YouTube channel here.
I ran into a problem with Availability Group backups the other day that had me scratching my head for a couple of hours before I finally discovered what was going on. When you set up an Availability Group, one of the settings you can configure is where you want the backups to be made. The configuration screen looks like this:
I think the options are fairly self-explanatory (perhaps with the exception of the Backup Priority value for the last option, but that’s not important to this story). The default, and the way my AG was configured, was Prefer Secondary. As the image shows, this means backups will be made on the secondary, unless the secondary is unavailable, in which case, they will be made on the primary.
There are a couple of things to note when you use this setting:
Full backups made on the secondary are Copy Only backups. This means they won’t reset the differential bitmap and your differentials will continue to increase in size until a full backup is made on the primary.
Differential backups cannot be made on the secondary.
Transaction log backups can be made on the secondary and they do clear the log, so your log file will not continue to grow.
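For reference, a full backup taken on a secondary replica must be copy-only; SQL Server won’t allow a regular full there. A sketch (the database name and path are placeholders):

```sql
-- On a secondary replica, only copy-only full backups are allowed:
BACKUP DATABASE [YourAGDatabase]
TO DISK = N'\\BackupShare\YourAGDatabase_CopyOnly.bak'
WITH COPY_ONLY, COMPRESSION;
```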
One morning we had a situation where our differential backups were failing and we were trying to track down the cause. We have a centralized job server that runs jobs on several of our SQL Servers. This is accomplished by supplying the server name to a PowerShell script, which then executes the command against the specified server. The PowerShell backup script used the AG listener name when executing the command, so the command would always be sent to the primary replica.
This is where we started to get confused. According to the way the AG backup preferences were configured, SQL should run the backup on the secondary. But the backup command called for a differential, which can’t be run on a secondary. Since the job was failing, that much made sense to me. What didn’t make sense was that the job had run successfully for months. Also, our full backups clearly were resetting the differential bitmap, because each differential taken after a full backup was smaller than the one taken before it. How was that possible?
I went back and looked in the msdb backupset table for the last couple of months. This table has two columns – server_name and machine_name – that give the name of the machine the backup was made on. In my case, I found the name was always the name of the primary replica.
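If you want to run the same check, something like this shows where recent backups actually ran (the database name is a placeholder):

```sql
-- Where were the most recent backups actually taken?
SELECT TOP (50)
       database_name,
       type,                -- D = full, I = differential, L = log
       backup_finish_date,
       server_name,
       machine_name
FROM msdb.dbo.backupset
WHERE database_name = N'YourAGDatabase'
ORDER BY backup_finish_date DESC;
```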
What’s going on? It looked like SQL was ignoring my AG backup preference setting. After much head-scratching and testing, I finally discovered something: those settings don’t mean crap! Well, I mean, they do, but it’s up to you to make use of them. In the Books Online documentation, I found this little gem:
There is no enforcement of the automated backup preference setting. The interpretation of this preference depends on the logic, if any, that you script into backup jobs for the databases in a given availability group. The automated backup preference setting has no impact on ad-hoc backups. (emphasis mine)
Great. Way to make that clear in the UI.
Way down at the bottom of that BOL page, it tells you that YOU are responsible for determining where your backup should run, and you do this by checking the sys.fn_hadr_backup_is_preferred_replica function. So basically, the configuration page where you select the Backup Preferences only tells SQL Server what value that function should return on each replica. It doesn’t actually make the backup happen on a certain replica.
Because our backup job did not check that function, backups were always being made on the primary.
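The fix is to wrap the backup command in a check of that function in the job on every replica. A minimal sketch (the database name and backup path are placeholders):

```sql
-- Run this job on every replica; only the preferred replica does the work.
IF sys.fn_hadr_backup_is_preferred_replica(N'YourAGDatabase') = 1
BEGIN
    BACKUP LOG [YourAGDatabase]
    TO DISK = N'\\BackupShare\YourAGDatabase.trn';
END
ELSE
    PRINT 'This replica is not the preferred backup replica; skipping.';
```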
The 16.5.2 release of SQL Server Management Studio (which reports as 13.0.16105.4 in the About screen) has a bug where you cannot display a list of tables in databases that contain sparse columns. You get a “Subquery returned more than 1 value.” error as shown below when you try to expand the Tables node.