Finding All Gaps In An Identity Column

Have you ever had random inserts into a large table fail? Most of the time, inserts happen fine, but every so often you get a failure with a “primary key violation” error? If your primary key is an integer column with the identity property, you may be wondering how this is possible.

What is likely happening is your table has grown very large or has been in use for a long time and your identity column ran out of numbers. An integer column has a maximum value of 2,147,483,647. Now an integer can start at -2,147,483,648, but most people  start at 0 or 1, so that leaves you with 2 billion numbers.

The first time this happens, someone (maybe you) will reseed the identity column to a new value (zero, in this case):

DBCC CHECKIDENT (YourTable,RESEED,0)

If your table has a lot of records that have been deleted, you might be able to continue on for quite some time before encountering another primary key violation. But this is not a permanent fix – you will encounter this error again eventually.

Another temporary fix, but one that will likely allow you to go for a longer time without encountering the primary key violation again is to reseed the column to the smallest negative number the datatype allows (assuming you have a positively incrementing identity value). For example, to reseed an integer column, you can run:

DBCC CHECKIDENT (YourTable,RESEED,-2147483647)

Assuming you started at zero or 1, this gives you another 2 billion values to use. However, I’ve found most programmers, for some reason, have an aversion to negative numbers as primary keys. There no reason for it, but it’s there, so the DBA has to deal with it.

Of course, the permament fix is to change the data type from integer to BIGINT, but, if your table is big enough to have this problem, that will also present its own set of challenges – this is a “size of data” operation, meaning  SQL has to modify every single row in the table and the time it takes to do this is directly related to the size of the data in the table. The change will require the table to be locked, lots of transaction log will be generated, etc. See Kendra Little’s post about the issues you will face here.

The path of least resistance (and least (or no) downtime) is to reseed the identity column. This is virtually instantaneous. The problem becomes “to what value should I reseed it?”

To answer this question, you need to know where the gaps in your used identity values are. Ideally, you’ll like to reseed to the start of the biggest gap.

I went searching for some code to do this and came across this solution from Microsoft:

-- Use a recursive CTE approach, which will provide
-- all values in all gap ranges
DECLARE @i INT;
SELECT @i = MAX(PK_ID)
FROM YourTable;
WITH tmp (gapId)
AS (SELECT DISTINCT
           a.PK_ID + 1
    FROM YourTable a
    WHERE NOT EXISTS
    (
        SELECT * FROM YourTable b WHERE b.PK_ID = a.PK_ID + 1
    )
          AND a.PK_ID < @i
    UNION ALL
    SELECT a.gapId + 1
    FROM tmp a
    WHERE NOT EXISTS
    (
        SELECT * FROM YourTable b WHERE b.PK_ID = a.gapId + 1
    )
          AND a.gapId < @i)
SELECT gapId
FROM tmp
ORDER BY gapId;
GO

However, when I ran this, I got the following error:

Looks like my table has too many gaps for this to be a valid solution. So back to Google I went and found this really nice solution from Henning Frettem. When I ran his code, I saw my table had 97,236 gaps! So which should I use? Ideally, the largest one! Henning’s code didn’t readily give that, so below is my modified version which gives you that information and sorts from the largest gap to the smallest, since the largest gap is what you are really looking for.

;WITH cte
AS (SELECT PK_ID, /* replace with your identity column */
           RowNum = ROW_NUMBER() OVER (ORDER BY PK_ID)
    FROM dbo.YourTable), /* replace with your table name */
      cte2
AS (SELECT cte.PK_ID,
           cte.RowNum,
           DENSE_RANK() OVER (ORDER BY PK_ID - RowNum) AS Series /* replace with your identity column */
    FROM cte),
      cte3
AS (SELECT cte2.PK_ID,
           cte2.RowNum,
           cte2.Series,
           COUNT(*) OVER (PARTITION BY Series) AS SCount
    FROM cte2),
      cte4
AS (SELECT MinID = MIN(PK_ID), /* replace with your identity column */
           MaxID = MAX(PK_ID), /* replace with your identity column */
           Series
    FROM cte3
    GROUP BY Series)
SELECT GapStart = a.MaxID,
       GapEnd = b.MinID,
       SizeOfGap = b.MinID - a.MaxID
FROM cte4 a
    INNER JOIN cte4 b
        ON a.Series + 1 = b.Series
--ORDER BY GapStart
ORDER BY ABS(b.MinID - a.MaxID) DESC;

In the output screenshot below, I had already reseeded my table to the smallest integer value, so that’s why you see that first big 2 billion row gap. In my case, someone had already reseeded the table to negative numbers in the past, so my developers were OK with using negative numbers.

 

If you encounter this problem and are unable to change the datatype, you now have the code to find the largest gap to use when reseeding your identity value!

 

One thought on “Finding All Gaps In An Identity Column

Leave a Reply

Your email address will not be published. Required fields are marked *

I am a real person and can prove it by doing this math problem: (required) Time limit is exhausted. Please reload CAPTCHA.

This site uses Akismet to reduce spam. Learn how your comment data is processed.