Yet Another Reason To Dislike Heaps – They Waste Space

I think most DBAs by now know that heaps, tables without a clustered index, are bad and should be avoided, but the word doesn't seem to have trickled down to software developers yet. I still come across too many vendor databases that contain hundreds, if not thousands, of heaps. In fact, what prompted me to write this was one database I have to support that has 335,000 total tables, 295,000 of which are heaps. Yes, this is a 1.2 terabyte database that is 88% heaps. (Although, to be fair, most of those heaps have zero records in them.)

I’m in the process of moving the database from SQL 2005 to SQL 2008 R2 so we can implement data compression. I’ve had to get the vendor involved for some parts of the move and they mentioned the database size and that they wanted to run some scripts to purge some old data and reduce the size. (They also wanted to run a shrink, but I put a stop to that. We’ll do a shrink and defrag after the compression has been implemented.)

But given that the vendor took note of the size of the database, I thought this might be a good opportunity to educate them on the perils of heaps. It's easy to tell them about the performance issues heaps can cause, and those are fairly well documented. However, a lesser-known side effect of heaps is that they can unnecessarily increase the size of your database.

If you have a heap that has variable length columns, SQL will stuff as many rows on a page as it can. But then if you go back and modify that data such that the total row length increases, that row will no longer fit in the same space on the data page that it used to. If this happens, SQL creates a forwarding record that points to a new page, where the updated, longer row is written. The new row also has a return pointer pointing back to the original position (which is used if the row is updated again). Paul Randal has a great write-up about forwarding and back-pointers, with code to demonstrate exactly what is happening (the link is in the comments of the code below). The upshot of all this is that, due to the forwarding and back pointers, each forwarded record in a heap adds 21 bytes to the table: 12 for the forwarding record pointer and 9 for the return pointer.
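If you want to see this happen on a test system, here's a minimal sketch (the dbo.HeapDemo table and its columns are made up just for this example). It creates a heap with a variable length column, packs in a batch of short rows, then lengthens them so most of them no longer fit in their original slots:

/* Throwaway demo table - a heap with a variable length column */
CREATE TABLE dbo.HeapDemo
       (
        ID INT
       ,Filler VARCHAR(8000)
       )

/* Pack the pages with short rows */
INSERT  INTO dbo.HeapDemo
        (ID, Filler)
        SELECT TOP (100)
                ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
               ,REPLICATE('a', 100)
        FROM    sys.all_objects

/* Lengthen the rows so they no longer fit where they were written */
UPDATE  dbo.HeapDemo
SET     Filler = REPLICATE('a', 2000)

/* forwarded_record_count should now be well above zero */
SELECT  forwarded_record_count
FROM    sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.HeapDemo'), 0, 1, 'DETAILED')

DROP TABLE dbo.HeapDemo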

That may not seem like a lot, but when you are dealing with a 1.2 terabyte database with 295,000 heaps, I suspected this might add up to a significant chunk of space. So I wrote the code below to see exactly how much space forwarding pointers are using in a database with heaps.

Before getting to the code, let me point out that not all heaps will have forwarding pointers. If your heap is used in a read-only manner, such as in a data warehouse application, the records in the heap won't be updated and no forwarding records will need to be created. Or, if your heap contains only columns of fixed length datatypes, you also won't have forwarding records. But if your heap is a table that has frequent updates and contains variable length data that changes size, you can have lots of them.
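To illustrate the fixed length case, here's a similar sketch (again, dbo.FixedHeapDemo is just a throwaway example name). Because the rows can never change size, the update happens in place and no forwarding records are created:

/* Throwaway demo table - a heap with only fixed length columns */
CREATE TABLE dbo.FixedHeapDemo
       (
        ID INT
       ,Payload CHAR(100)
       )

INSERT  INTO dbo.FixedHeapDemo
        (ID, Payload)
VALUES  (1, 'short')

/* The row is the same length before and after, so it stays put */
UPDATE  dbo.FixedHeapDemo
SET     Payload = REPLICATE('z', 100)

/* forwarded_record_count stays at zero */
SELECT  forwarded_record_count
FROM    sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.FixedHeapDemo'), 0, 1, 'DETAILED')

DROP TABLE dbo.FixedHeapDemo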

The dynamic management function sys.dm_db_index_physical_stats will tell you how many forwarded records a heap has, provided you run it with the DETAILED option. The code below will scan your database for heaps and write them to a table. It will then run the sys.dm_db_index_physical_stats function against each one to determine the total number of rows and the total number of forwarded records. The last step gives a total, in megabytes, of the space used by the pointers. It is my hope that by providing this data to vendors, along with information about the other performance issues with heaps, we can start to impress upon developers the need for clustered indexes.

FYI, for my 1.2 TB database, I had 55 MB of pointer data. I don’t know if this is enough to make any developers look into eliminating heaps in their designs, but it’s worth a try.

Note: Enter your database ID where noted in the sys.dm_db_index_physical_stats call inside the cursor loop below. Also note, depending on the size of your database, this can take a long time to run. (It took 6 hours on my big database.) I suggest running it during off-peak hours or during a maintenance window.

CREATE TABLE ForwardPointers
       (
        ObjectName NVARCHAR(260) /* schema.table can run longer than 50 characters */
       ,ObjectID INT
       ,TotalRows INT
       ,ForwardedRecords INT
       )

/* Load list of heaps into table */

INSERT  INTO ForwardPointers
        (ObjectName)
        SELECT  SCH.name + '.' + TBL.name AS TableName
        FROM    sys.tables AS TBL
                INNER JOIN sys.schemas AS SCH ON TBL.schema_id = SCH.schema_id
                INNER JOIN sys.indexes AS IDX ON TBL.object_id = IDX.object_id
                                                 AND IDX.type = 0 /* Heaps */
/* Get the Object IDs for the heaps */

UPDATE  ForwardPointers
SET     ObjectID = OBJECT_ID(ObjectName)
FROM    ForwardPointers

/* Run each heap through sys.dm_db_index_physical_stats to get the required details */

DECLARE ObjectCursor CURSOR
FOR
        SELECT  ObjectID
        FROM    ForwardPointers

DECLARE @ObjectID INT

OPEN ObjectCursor
FETCH NEXT FROM ObjectCursor INTO @ObjectID
WHILE @@fetch_status = 0
      BEGIN
            UPDATE  ForwardPointers
            SET     TotalRows = record_count
                   ,ForwardedRecords = forwarded_record_count
            FROM    sys.dm_db_index_physical_stats(EnterYourDatabaseIDHere, @ObjectID, 0, 1, 'DETAILED')
            WHERE   ObjectId = @ObjectID
                    AND alloc_unit_type_desc = 'IN_ROW_DATA' /* Forwarded records only occur in the in-row data */

            FETCH NEXT FROM ObjectCursor INTO @ObjectID
      END

CLOSE ObjectCursor
DEALLOCATE ObjectCursor

/* Calculate the total size of forwarding pointers.
	See http://www.sqlskills.com/blogs/paul/forwarding-and-forwarded-records-and-the-back-pointer-size/
	for details */

SELECT SUM(ForwardedRecords*21) / 1024.0 / 1024.0 AS ForwardPointersSizeInMB
FROM ForwardPointers

--DROP TABLE ForwardPointers
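
If you're not sure what your database ID is, the DB_ID() function will give it to you. Run it in the database you want to scan, or pass the database name (replace 'YourDatabaseName' with the actual name):

/* Find the database ID to plug into the script above */
SELECT  DB_ID() AS CurrentDatabaseID
SELECT  DB_ID('YourDatabaseName') AS NamedDatabaseID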
