Enabling Data Compression – August 2013 Update

[UPDATE Oct 2013]: I have made some improvements to this script, including those mentioned in the comments below. See my post on 10/28/13 for the latest version.

 

I recently posted an update of my script to enable data compression in SQL Server, but I’ve since made some significant changes and decided it was time to put another version out there.

What prompted these changes? I was going through my quarterly check of all the databases I watch over and realized that I had several large databases that could benefit from data compression. More and more of my servers are becoming virtualized, and that means more of them are running the Enterprise edition of SQL Server, which supports data compression. I’ve also not encountered any problems with any of the databases I’ve compressed so far. Based on these factors, I decided to start looking into compressing more databases.

But the script I had to perform the compression wasn’t very elegant. For starters, the stored procedure and logging table had to be located in the database being compressed. This worked fine when I was just compressing one or two databases, but if I’m going to start compressing more, I don’t want to have to put these items in each database. Instead, I wanted to install this procedure in my DBA utility database, a copy of which I keep on each server, and be able to run it from there.

I also wanted to change the way the script behaved a little bit: the new version needed to be able to operate on multiple databases.

And finally, I cleaned up the code. The first version of the script would check all data, whether it was compressed or not, to see if a different compression method might produce better results. I had previously decided that would likely be a rare case and it created unacceptably long run times, so I changed the script to only look at uncompressed data. But I had left all the old code in place, just commented out. I ripped all that stuff out this time around. I also changed the names of the logging table and the stored procedure to be more descriptive of what they are actually for.

So, I present to you the latest iteration of my script for enabling data compression! Usage is:

EXEC [up_CompressDatabase]
        @minCompression = .25
       ,@MaxRunTimeInMinutes  = 60
       ,@ExecuteCompressCommand  = 1	/* 1 to execute command */
       ,@DBToScan = '<databasename>' /* database to compress */

@minCompression – the minimum compression ratio that must be reached in order for the data to be compressed. This is a percentage expressed as a decimal – in the example above, the minimum compression is 25%.
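For example, if a table’s uncompressed size is 1,000 pages and its estimated compressed size is 700 pages, the compression ratio is 1 - 700/1000 = 0.30, which clears the 25% threshold; an estimated compressed size of 800 pages (a ratio of 0.20) would not.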

@MaxRunTimeInMinutes – the maximum amount of time this procedure will run. This is not a hard and fast limit. It might run slightly longer because the time is only checked after the completion of each command.

@ExecuteCompressCommand – if this is a one, the data will be compressed. If this is a zero, only the compression estimate will be performed, and the actual compression commands will be printed out instead of being executed.

@DBToScan – this is the name of the database that you want to implement compression in. See Behavior Notes below.

The stored procedure will create a table in the current database named dbCompressionEstimates. This table will contain a list of all compressible objects in the database specified by the @DBToScan parameter, as well as some additional information.
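If you want to see what was logged, a quick query of the table (run from the database where you installed the procedure) does the trick. This is just an illustrative query, not part of the procedure:

SELECT  DatabaseName
       ,schema_name
       ,object_name
       ,ixName
       ,StatusText
FROM    dbo.dbCompressionEstimates
ORDER BY DatabaseName, None_Size;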

Behavior Notes

Compression can take a long time, so this procedure is written to be able to be run over multiple days, perhaps for a few hours during your maintenance windows. It will pick up where it left off the last time it ran.
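For example, if you have a two-hour maintenance window each night, you could schedule something like this as a nightly job step and let it chip away at the work over several nights (the parameter values here are just an illustration):

EXEC [up_CompressDatabase]
        @minCompression = .25
       ,@MaxRunTimeInMinutes  = 120
       ,@ExecuteCompressCommand  = 1
       ,@DBToScan = '<databasename>'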

The @DBToScan parameter tells the procedure which database to scan for compressible objects. It logs this information to the dbCompressionEstimates table. However, it does NOT limit the actual compression process to that database! The compression portion of the procedure will work to compress objects in the dbCompressionEstimates table, no matter what database they are in.

This was done to allow for the following use case: Suppose you have two databases you want to compress. You have a small window of time during which compression can be performed, so you want to maximize the time spent compressing data. In this situation, you would “preload” the dbCompressionEstimates table by executing the following two commands:

EXEC [up_CompressDatabase]
        @minCompression = .25
       ,@MaxRunTimeInMinutes  = 60
       ,@ExecuteCompressCommand  = 0
       ,@DBToScan = 'database1'
GO
EXEC [up_CompressDatabase]
        @minCompression = .25
       ,@MaxRunTimeInMinutes  = 60
       ,@ExecuteCompressCommand  = 0
       ,@DBToScan = 'database2'
GO

By setting @ExecuteCompressCommand to zero, the first command will load the dbCompressionEstimates table with objects from database1 and the second will load it with objects from database2. These can usually be run during regular business hours because the estimation routine takes an intent shared lock on the objects it is evaluating (the same type of lock your basic SELECT statement uses). (Having said that, you should test this first if you have a very busy system that performs lots of inserts, updates, or deletes; those may experience blocking.) Then, during your maintenance window, you can run either of those two commands again with @ExecuteCompressCommand set to 1 – it doesn’t matter which one, because the routine will see both databases have already been scanned, skip that part, and begin compressing data.
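For example, when the maintenance window opens, this kicks off the actual compression (passing 'database2' here would work just as well, since both databases have already been scanned):

EXEC [up_CompressDatabase]
        @minCompression = .25
       ,@MaxRunTimeInMinutes  = 60
       ,@ExecuteCompressCommand  = 1
       ,@DBToScan = 'database1'
GO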

The routine compresses the smallest tables first, so it is possible it will bounce around between databases when compressing. The size sort is done irrespective of the database name.

The last column in the dbCompressionEstimates table, StatusText, contains a status of the current row. The possible values are:

Initial load – this indicates the object has been scanned, but no compression estimate or actual compression has been performed yet.

Compression estimate 50% complete – the page compression estimate has been completed, but not the row compression estimate (page compression is estimated first).

Compression estimate 100% complete – the estimates for both row compression and page compression have been completed.

Best compression method less than minCompression threshold – neither page compression nor row compression would compress the data more than the specified minimum compression ratio passed in to the procedure.

No data in table to compress – the table contains no data and will not have compression performed on it.

Compressing data… – this object is currently being compressed.

Compression complete – the object has been compressed.

The table also contains a field named AlreadyProcessed. When this value is zero, the routine has not yet evaluated the object for compression. If it is a one, the procedure has processed that object, either compressing it or not compressing it, based on the reason given in the StatusText field.
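To check on overall progress, a quick rollup of the logging table works well. Again, this is just an illustrative query:

SELECT  DatabaseName
       ,StatusText
       ,COUNT(*) AS Objects
FROM    dbo.dbCompressionEstimates
GROUP BY DatabaseName, StatusText
ORDER BY DatabaseName, StatusText;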

If the routine has completely processed a database and you run it again, passing in the same database name as a parameter, it will delete all records pertaining to that database from the dbCompressionEstimates table and rescan the database for uncompressed objects. This is useful if your database has new tables or indexes frequently added to it.
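You can also force a rescan yourself before that point by clearing that database’s rows manually – this is the same delete the procedure performs internally:

DELETE FROM dbo.dbCompressionEstimates
WHERE   DatabaseName = 'database1';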

And as always, note that compression is a logged operation, so be sure to monitor your log file usage while this is running.
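A quick way to keep an eye on it is DBCC SQLPERF, which reports the log size and percent of log space used for every database on the server:

DBCC SQLPERF (LOGSPACE);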

Here is the code. As with all scripts found on the internet, examine it carefully before running it on your systems.

I think I’m pretty well pleased with this version. Let me know if you have suggestions for possible improvements.

CREATE PROC [dbo].[up_CompressDatabase]
       (
        @minCompression FLOAT = .25 /* e.g. .25 for minimum of 25% compression */
       ,@MaxRunTimeInMinutes INT = 60
       ,@ExecuteCompressCommand BIT = 0	/* 1 to execute command */
       ,@DBToScan SYSNAME /* database to compress */
       )
AS
       SET nocount ON;

/*
	Original script by Paul Nielsen www.SQLServerBible.com March 13, 2008
	Modified by Shaun J. Stuart www.shaunjstuart.com February 27, 2013

  Sets compression for all objects and indexes in the database needing adjustment
  If estimated gain is equal to or greater than min compression parameter
    then enables row or page compression, whichever gives greater space savings
  If row and page have the same gain, then enables row compression
  If estimated gain is less than min compression parameter, then compression is set to none

  - SJS 2/27/13
  - Added @MaxRunTimeInMinutes to limit run length (checked after each command, so
		may run longer) Note: This timer only applies to the actual compression process.
		It does not limit the time it takes to perform a scan of all the tables / indexes
		and estimate compression savings.
  - Sorted compression cursor to compress smallest tables first
  - Added flag to execute compression command or not
  - Added sort in tempdb option (always)
  - Re-wrote slightly to persist initial scan results. Will only re-scan after all tables
		have been processed

	- SJS 7/17/13
	- Modified to only look at data that is not already compressed
	- Modified so that any items with null none_size, page_size, or row_size are no longer set as
		AlreadyProcessed and now have their sizes calculated (i.e. if process was cancelled before
		initial scan was completed)
	- Modified so that time limit now applies to compression estimate process as well as actual
		compression process
	- Only look at partitions that have at least one row

	- SJS 8/6/13
	- Changed name of dbEstimate table to dbCompressionEstimates for clarity
	- Added StatusText column to dbCompressionEstimates and added status messages
	- Modified to take database name as input parameter to allow use in utility database
		instead of the DB being compressed (Table dbCompressionEstimates, which stores sizes and
		compression estimates, is still created in the current database.)
	- Note: Compression estimates are only made for the database supplied as an input
		parameter. However, that database name is stored in the dbCompressionEstimates table and
		will be read from there and used when actually performing the compression. This allows you to
		create estimates only for multiple databases (using @ExecuteCompressCommand=0), then perform
		the compression	across multiple databases later (with @ExecuteCompressCommand=1).
	- Removed all references to code that checked already compressed data since this routine now only
		looks at uncompressed data.

*/

       IF ISNULL(@DBToScan, '') NOT IN (SELECT    [name]
                                      FROM      sys.databases)
          BEGIN
                SELECT  'Database ' + ISNULL(@DBToScan, 'NULL')
                        + ' not found on server.'
                RETURN
          END

       DECLARE @CompressedCount INT;
       SET @CompressedCount = 0;

       DECLARE @SQL NVARCHAR(MAX);

       DECLARE @StartTime DATETIME2;
       SET @StartTime = CURRENT_TIMESTAMP;

       DECLARE @CurrentDatabase SYSNAME;
       SET @CurrentDatabase = DB_NAME()

       IF OBJECT_ID('tempdb..##ObjEst', 'U') IS NOT NULL
          DROP TABLE ##ObjEst

       CREATE TABLE ##ObjEst
              (
               PK INT IDENTITY
                      NOT NULL
                      PRIMARY KEY
              ,object_name VARCHAR(250)
              ,schema_name VARCHAR(250)
              ,index_id INT
              ,partition_number INT
              ,size_with_current_compression_setting BIGINT
              ,size_with_requested_compression_setting BIGINT
              ,sample_size_with_current_compression_setting BIGINT
              ,sample_size_with_requested_compression_setting BIGINT
              );

       IF NOT EXISTS ( SELECT   1
                       FROM     sys.objects
                       WHERE    object_id = OBJECT_ID(N'[dbo].[dbCompressionEstimates]')
                                AND type IN (N'U') )
          BEGIN
                CREATE TABLE dbo.dbCompressionEstimates
                       (
                        PK INT IDENTITY
                               NOT NULL
                               PRIMARY KEY
                       ,DatabaseName SYSNAME
                       ,schema_name VARCHAR(250)
                       ,object_name VARCHAR(250)
                       ,index_id INT
                       ,ixName VARCHAR(255)
                       ,ixType VARCHAR(50)
                       ,partition_number INT
                       ,data_compression_desc VARCHAR(50)
                       ,None_Size INT
                       ,Row_Size INT
                       ,Page_Size INT
                       ,AlreadyProcessed BIT
                       ,StatusText VARCHAR(75)
                       );
          END

  /*
 If all objects have been processed, rescan and start fresh. Useful for catching
  added objects since last scan. But beware - this is I/O intensive and can take a while.
*/

       IF NOT EXISTS ( SELECT   1
                       FROM     dbo.dbCompressionEstimates
                       WHERE    AlreadyProcessed = 0
                                AND DatabaseName = @DBToScan )
          BEGIN
                DELETE  FROM dbo.dbCompressionEstimates
                WHERE   DatabaseName = @DBToScan;

                SET @SQL = 'USE [' + @DBToScan + '];
						INSERT   INTO [' + @CurrentDatabase
                    + '].dbo.dbCompressionEstimates
								(DatabaseName
								,schema_name
								,object_name
								,index_id
								,ixName
								,ixType
								,partition_number
								,data_compression_desc
								,AlreadyProcessed
								,StatusText)
                        SELECT  ''' + @DBToScan + '''
								,S.name
                               ,o.name
                               ,I.index_id
                               ,I.name
                               ,I.type_desc
                               ,P.partition_number
                               ,P.data_compression_desc
                               ,0 AS AlreadyProcessed
							   ,''Initial load'' AS StatusText
                        FROM    [' + @DBToScan + '].sys.schemas AS S
                                JOIN [' + @DBToScan
                    + '].sys.objects AS O ON S.schema_id = O.schema_id
                                JOIN [' + @DBToScan
                    + '].sys.indexes AS I ON o.object_id = I.object_id
                                JOIN [' + @DBToScan
                    + '].sys.partitions AS P ON I.object_id = P.object_id
                                                            AND I.index_id = p.index_id
                        WHERE   O.TYPE = ''U''
                                AND P.data_compression_desc = ''NONE''
								AND P.rows > 0;'		-- only look at objects with data
                EXEC (@SQL)
          END
			-- Determine Compression Estimates
       DECLARE @PK INT
              ,@DatabaseName SYSNAME
              ,@Schema VARCHAR(150)
              ,@object VARCHAR(150)
              ,@DAD VARCHAR(25)
              ,@partNO VARCHAR(3)
              ,@indexID VARCHAR(3)
              ,@ixName VARCHAR(250)
              ,@ixType VARCHAR(50)
              ,@Recommended_Compression VARCHAR(10);

       DECLARE cCompress CURSOR FAST_FORWARD
       FOR
               SELECT   schema_name
                       ,object_name
                       ,index_id
                       ,partition_number
                       ,data_compression_desc
               FROM     dbo.dbCompressionEstimates
               WHERE    DatabaseName = @DBToScan	/* estimates are only made for the database being scanned */
                        AND (None_size IS NULL
                             OR Row_Size IS NULL
                             OR Page_size IS NULL);

       OPEN cCompress;

       FETCH cCompress INTO @Schema, @object, @indexID, @partNO, @DAD;

       WHILE @@Fetch_Status = 0
             BEGIN
							/* evaluate objects with no compression */
                   IF DATEDIFF(mi, @StartTime, CURRENT_TIMESTAMP) < @MaxRunTimeInMinutes
                      BEGIN
                            IF @DAD = 'none'
                               BEGIN
								/* estimate Page compression */
                                     SET @SQL = 'USE [' + @DBToScan
                                         + '];
                                    INSERT  ##ObjEst
                                            (object_name
                                            ,schema_name
                                            ,index_id
                                            ,partition_number
                                            ,size_with_current_compression_setting
                                            ,size_with_requested_compression_setting
                                            ,sample_size_with_current_compression_setting
                                            ,sample_size_with_requested_compression_setting)
                                            EXEC sp_estimate_data_compression_savings
                                                @Schema_name = ' + @Schema + '
                                               ,@object_name = ' + @object + '
                                               ,@index_id = ' + @indexID + '
                                               ,@partition_number = '
                                         + @partNO + '
                                               ,@data_compression = ''page'';'
                                     EXEC (@SQL)

                                     UPDATE dbo.dbCompressionEstimates
                                     SET    none_size = O.size_with_current_compression_setting
                                           ,page_size = O.size_with_requested_compression_setting
                                           ,StatusText = 'Compression estimate 50% complete'
                                     FROM   dbo.dbCompressionEstimates D
                                            JOIN ##ObjEst O ON D.Schema_name = O.Schema_Name
                                                              AND D.Object_name = O.object_name
                                                              AND D.index_id = O.index_id
                                                              AND D.partition_number = O.partition_number
                                     WHERE  D.DatabaseName = @DBToScan;	/* guard against same-named objects in other scanned databases */

                                     DELETE ##ObjEst;

								-- estimate Row compression
                                     SET @SQL = 'USE [' + @DBToScan
                                         + '];
                                    INSERT  ##ObjEst
                                            (object_name
                                            ,schema_name
                                            ,index_id
                                            ,partition_number
                                            ,size_with_current_compression_setting
                                            ,size_with_requested_compression_setting
                                            ,sample_size_with_current_compression_setting
                                            ,sample_size_with_requested_compression_setting)
                                            EXEC sp_estimate_data_compression_savings
                                                @Schema_name = ' + @Schema + '
                                               ,@object_name = ' + @object + '
                                               ,@index_id = ' + @indexID + '
                                               ,@partition_number = '
                                         + @partNO + '
                                               ,@data_compression = ''ROW'';'
                                     EXEC (@SQL)

                                     UPDATE dbo.dbCompressionEstimates
                                     SET    row_size = O.size_with_requested_compression_setting
                                           ,StatusText = 'Compression estimate 100% complete'
                                     FROM   dbo.dbCompressionEstimates D
                                            JOIN ##ObjEst O ON D.Schema_name = O.Schema_Name
                                                              AND D.Object_name = O.object_name
                                                              AND D.index_id = O.index_id
                                                              AND D.partition_number = O.partition_number
                                     WHERE  D.DatabaseName = @DBToScan;	/* guard against same-named objects in other scanned databases */

                                     DELETE ##ObjEst;
                               END /* end evaluating objects with no compression */

                            FETCH NEXT FROM cCompress INTO @Schema, @object,
                                  @indexID, @partNO, @DAD
                      END -- end time check block
                   ELSE
                      BEGIN
                            PRINT 'Max runtime reached. No compression performed. Compression estimate still processing. Exiting...';
                            CLOSE cCompress
                            DEALLOCATE cCompress
                            DROP TABLE ##ObjEst
                            RETURN
                      END
             END -- end while loop

       CLOSE cCompress
       DEALLOCATE cCompress

       PRINT 'Initial scan complete.'
  /* End evaluating compression savings. Now do the actual compressing. */

       PRINT 'Beginning compression.';

  /* Do not process objects that do not meet our compression criteria */

       UPDATE   dbo.dbCompressionEstimates
       SET      AlreadyProcessed = 1
               ,StatusText = 'Best compression method less than minCompression threshold'
       WHERE    (1 - (CAST(Row_Size AS FLOAT) / None_Size)) < @minCompression
                AND (Row_Size <= Page_Size)
                AND None_Size > 0;

       UPDATE   dbo.dbCompressionEstimates
       SET      AlreadyProcessed = 1
               ,StatusText = 'Best compression method less than minCompression threshold'
       WHERE    (1 - (CAST(Page_Size AS FLOAT) / None_Size)) < @minCompression
                AND (Page_Size <= Row_Size)
                AND None_Size > 0;

  /* Do not set compression on empty objects */

       UPDATE   dbo.dbCompressionEstimates
       SET      AlreadyProcessed = 1
               ,StatusText = 'No data in table to compress'
       WHERE    None_size = 0;

 -- set the compression
       DECLARE cCompress CURSOR FAST_FORWARD
       FOR
               SELECT   DatabaseName
                       ,schema_name
                       ,object_name
                       ,partition_number
                       ,ixName
                       ,ixType
                       ,CASE WHEN (1 - (CAST(Row_Size AS FLOAT) / none_Size)) >= @minCompression
                                  AND (Row_Size <= Page_Size) THEN 'Row'
                             WHEN (1 - (CAST(page_Size AS FLOAT) / none_Size)) >= @minCompression
                                  AND (Page_Size <= row_Size) THEN 'Page'
                             ELSE 'None'
                        END AS Recommended_Compression
                       ,PK
               FROM     dbo.dbCompressionEstimates
               WHERE    None_Size <> 0
                        AND (CASE WHEN (1 - (CAST(Row_Size AS FLOAT)
                                             / none_Size)) >= @minCompression
                                       AND (Row_Size <= Page_Size) THEN 'Row'
                                  WHEN (1 - (CAST(page_Size AS FLOAT)
                                             / none_Size)) >= @minCompression
                                       AND (Page_Size <= row_Size) THEN 'Page'
                                  ELSE 'None'
                             END <> data_compression_desc)
                        AND AlreadyProcessed = 0
               ORDER BY None_Size ASC;		/* start with smallest tables first */

       OPEN cCompress

       FETCH cCompress INTO @DatabaseName, @Schema, @object, @partNO, @ixName,
             @ixType, @Recommended_Compression, @PK  -- prime the cursor;

       WHILE @@Fetch_Status = 0
             BEGIN

                   IF @ixType = 'Clustered'
                      OR @ixType = 'heap'
                      SET @SQL = 'USE [' + @DatabaseName + '];
							ALTER TABLE ' + @Schema + '.' + @object
                          + ' Rebuild with (data_compression = '
                          + @Recommended_Compression + ', SORT_IN_TEMPDB=ON)';

                   ELSE
                      SET @SQL = 'USE [' + @DatabaseName + '];
							ALTER INDEX ' + @ixName + ' on ' + @Schema + '.'
                          + @object + ' Rebuild with (data_compression = '
                          + @Recommended_Compression + ',SORT_IN_TEMPDB=ON)';

                   IF DATEDIFF(mi, @StartTime, CURRENT_TIMESTAMP) < @MaxRunTimeInMinutes
                      BEGIN
                            IF @ExecuteCompressCommand = 1
                               BEGIN
                                     UPDATE dbo.dbCompressionEstimates
                                     SET    StatusText = 'Compressing data...'
                                     WHERE  PK = @PK;

                                     PRINT 'Compressing table/index: '
                                           + @Schema + '.' + @object
                                           + ' in database ' + @DatabaseName;
                                     EXEC sp_executesql
                                        @SQL;

                                     UPDATE dbo.dbCompressionEstimates
                                     SET    AlreadyProcessed = 1
                                           ,StatusText = 'Compression complete'
                                     WHERE  PK = @PK;

                                     SET @CompressedCount = @CompressedCount
                                         + 1;
                               END
                            ELSE
                               BEGIN
                                     PRINT 'Command execution not enabled. Command is:'
                                           + @SQL;
                               END
                      END
                   ELSE
                      BEGIN
                            PRINT 'Max runtime reached. Some compression performed. Exiting...';
                            CLOSE cCompress
                            DEALLOCATE cCompress
                            DROP TABLE ##ObjEst
                            BREAK
                      END

                   FETCH cCompress INTO @DatabaseName, @Schema, @object,
                         @partNO, @ixName, @ixType, @Recommended_Compression,
                         @PK;
             END

       CLOSE cCompress;
       DEALLOCATE cCompress;

       PRINT 'Objects compressed: ' + CONVERT(VARCHAR(10), @CompressedCount);
       DROP TABLE ##ObjEst
       RETURN

10 thoughts on “Enabling Data Compression – August 2013 Update”

  1. Thanks Shaun! Using your tool as we speak. Here’s a snippet of what I just sent management on this: “Once compressed, no disk space saving would occur until database files were shrunk. This introduces fragmentation, so index rebuilding would be in order (although I’m not sure about the fragmentation effect on a NetApp).

    We already discussed the fact that with the types of queries we run with multiple joins, it isn’t expected that compression would help query times.

    Lastly, unless Production is compressed, every restore to QA/DEV from prod would require doing this again.”

  2. Seems like a table named “file” caused an issue. Made a change here to see if the square brackets help:

    DECLARE cCompress CURSOR FAST_FORWARD
    FOR
    SELECT schema_name
    , '[' + object_name + ']' as object_name

    1. Yeah, that should do it. I also ran into a similar issue with an index name that had a dash in it. Had to do the same thing with the two ALTER statements and put brackets around the name there.

  3. Hi,

    You should check whether some tables are already compressed, right? Your script advised me to compress tables which are already compressed!

    BTW a huge thanks for your script! I knew Paul Nielsen’s script, but yours is pretty much awesome!! ;)

    Guillaume

  4. Guillaume – Are you sure about that? Line 165 explicitly states to include only tables that have no compression. When I run it, it only selects tables that have no compression. If you clear out the dbCompressionEstimates table and run it again, does it still pick up already compressed tables?

    FYI, I’ve got an updated version of this coming out in a couple of weeks with some fixes. There was a bug in the cursor defined at line 185, but I don’t think that would cause it to select tables that are already compressed.
