Enabling Data Compression – Updated

Back in March, I wrote about a method I use to enable data compression on a database and periodically

Building a better compressor
Building a better compressor

re-scan for new tables to compress. This method worked fine on the database I was working with at the time. However, I have since tried to run this on a much larger database and the performance was less than optimal.

The database I am now working with has 200,000+ tables. The old script was taking over 2 days to iterate through all these tables and get the estimated compression values. This posed a couple of problems. First, because I want to run this on a weekly basis to catch any new tables that might be added, that length of time was not acceptable. Second, my initial script featured a parameter to limit how long the script ran, but it did not apply to the initial table scan and compression estimate section. Third, if the script was canceled while the compression estimates were still being performed, a re-run of the script would not pick up where it left off – it would simply ignore the tables that had no estimates and start compressing those that did have estimates.

As I’ve mentioned previously, creating maintenance scripts is an iterative process. So I made a couple changes:

  1. Time limit now applies to entire process. Upon exit, it will print a message saying how many tables were compressed or, if it did not reach the compression stage, that it is still generating compression estimates for the tables.
  2. Will resume generating compression estimates for tables if there are some that still need them generated.
  3. Will only look at tables or indexes that have at least one row of data.
  4. Will only look at tables or indexes that are currently not compressed.

One note about item 4 – the original version of the script evaluated all tables or indexes, even if they were already compressed. The thinking was that it was possible that an object was initially compressed with one method (page or row compression) and now it might do better using the other method. I figure this is a fairly rare case and, if it did happen, the difference between methods would probably be minimal. So in the interest of speeding things up, the script now only looks at non-compressed objects. I did however, leave the old code in place and just commented it out, in case I want to put it back in later.

Note that in order to use this, you will also need to code to create the db_compression_estimate stored procedure, which can be found here.

 

CREATE PROC [dbo].[db_compression]
      (
       @minCompression FLOAT = .25 /* e.g. .25 for minimum of 25% compression */
      ,@MaxRunTimeInMinutes INT = 60
      ,@ExecuteCompressCommand BIT = 0	/* 1 to execute command */
      )
AS
      SET nocount ON;

/*
	Original script by Paul Nielsen www.SQLServerBible.com March 13, 2008
	Modified by Shaun J. Stuart www.shaunjstuart.com February 27, 2013

  Sets compression for all objects and indexes in the database needing adjustment
  If estimated gain is equal to or greater than min compression parameter
    then enables row or page compression, whichever gives greater space savings
  If row and page have same gain then, enables row compression
  If estimated gain is less than min compression parameter, then compression is set to none

  - SJS 2/27/13
  - Added @MaxRunTimeInMinutes to limit run length (checked afer each command, so
		may run longer) Note: This timer only applies to the actual compression process.
		It does not limit the time it takes to perform a scan of all the tables / indexes
		and estimate compression savings.
  - Sorted compression cursor to compress smallest tables first
  - Added flag to execute compression command or not
  - Added sort in tempdb option (always)
  - Re-wrote slightly to persist initial scan results. Will only re-scan after all tables
		have been processed

	- SJS 7/17/13
	- Modified to only look at data that is not already compressed
	- Modified for that any items with null none_size, page_size, or row_size are no longer set as
		AlreadyProcessed and now have their sizes calculated (i.e. if process was cancelled before
		initial scan was completed)
	- Modified so that time limit now applies to compression estimate process as well as actual
		compression process
	- Only look at partitions that have at least one row

*/

      DECLARE @CompressedCount INT;
      SET @CompressedCount = 0;

      DECLARE @StartTime DATETIME2;
      SET @StartTime = CURRENT_TIMESTAMP;

      CREATE TABLE #ObjEst
             (
              PK INT IDENTITY
                     NOT NULL
                     PRIMARY KEY
             ,object_name VARCHAR(250)
             ,schema_name VARCHAR(250)
             ,index_id INT
             ,partition_number INT
             ,size_with_current_compression_setting BIGINT
             ,size_with_requested_compression_setting BIGINT
             ,sample_size_with_current_compression_setting BIGINT
             ,sample_size_with_requested_compresison_setting BIGINT
             );

      IF NOT EXISTS ( SELECT    1
                      FROM      sys.objects
                      WHERE     object_id = OBJECT_ID(N'[dbo].[dbEstimate]')
                                AND type IN (N'U') )
         BEGIN
               CREATE TABLE dbEstimate
                      (
                       PK INT IDENTITY
                              NOT NULL
                              PRIMARY KEY
                      ,schema_name VARCHAR(250)
                      ,object_name VARCHAR(250)
                      ,index_id INT
                      ,ixName VARCHAR(255)
                      ,ixType VARCHAR(50)
                      ,partition_number INT
                      ,data_compression_desc VARCHAR(50)
                      ,None_Size INT
                      ,Row_Size INT
                      ,Page_Size INT
                      ,AlreadyProcessed BIT
                      );
         END

  /*
 If all objects have been processed, rescan and start fresh. Useful for catching
  added objects since last scan. But beware - this is I/O intensive and can take a while.
*/

      IF NOT EXISTS ( SELECT    1
                      FROM      dbEstimate
                      WHERE     AlreadyProcessed = 0 )
         BEGIN
               DELETE   FROM dbEstimate;

               INSERT   INTO dbEstimate
                        (schema_name
                        ,object_name
                        ,index_id
                        ,ixName
                        ,ixType
                        ,partition_number
                        ,data_compression_desc
                        ,AlreadyProcessed)
                        SELECT  S.name
                               ,o.name
                               ,I.index_id
                               ,I.name
                               ,I.type_desc
                               ,P.partition_number
                               ,P.data_compression_desc
                               ,0 AS AlreadyProcessed
                        FROM    sys.schemas AS S
                                JOIN sys.objects AS O ON S.schema_id = O.schema_id
                                JOIN sys.indexes AS I ON o.object_id = I.object_id
                                JOIN sys.partitions AS P ON I.object_id = P.object_id
                                                            AND I.index_id = p.index_id
                        WHERE   O.TYPE = 'U'
                                AND P.data_compression_desc = 'NONE'
								AND P.rows > 0;		-- only look at objects with data
         END
			-- Determine Compression Estimates
      DECLARE @PK INT
             ,@Schema VARCHAR(150)
             ,@object VARCHAR(150)
             ,@DAD VARCHAR(25)
             ,@partNO INT
             ,@indexID INT
             ,@ixName VARCHAR(250)
             ,@SQL NVARCHAR(MAX)
             ,@ixType VARCHAR(50)
             ,@Recommended_Compression VARCHAR(10);

      DECLARE cCompress CURSOR FAST_FORWARD
      FOR
              SELECT    schema_name
                       ,object_name
                       ,index_id
                       ,partition_number
                       ,data_compression_desc
              FROM      dbEstimate
              WHERE     None_size IS NULL
                        OR Row_Size IS NULL
                        OR Page_size IS NULL;

      OPEN cCompress;

      FETCH cCompress INTO @Schema, @object, @indexID, @partNO, @DAD;

      WHILE @@Fetch_Status = 0
            BEGIN
							/* evaluate objects with no compression */
                  IF DATEDIFF(mi, @StartTime, CURRENT_TIMESTAMP) < @MaxRunTimeInMinutes
                     BEGIN
                           IF @DAD = 'none'
                              BEGIN
								/* estimate Page compression */
                                    INSERT  #ObjEst
                                            (object_name
                                            ,schema_name
                                            ,index_id
                                            ,partition_number
                                            ,size_with_current_compression_setting
                                            ,size_with_requested_compression_setting
                                            ,sample_size_with_current_compression_setting
                                            ,sample_size_with_requested_compresison_setting)
                                            EXEC sp_estimate_data_compression_savings
                                                @Schema_name = @Schema
                                               ,@object_name = @object
                                               ,@index_id = @indexID
                                               ,@partition_number = @partNO
                                               ,@data_compression = 'page';

                                    UPDATE  dbEstimate
                                    SET     none_size = O.size_with_current_compression_setting
                                           ,page_size = O.size_with_requested_compression_setting
                                    FROM    dbEstimate D
                                            JOIN #ObjEst O ON D.Schema_name = O.Schema_Name
                                                              AND D.Object_name = O.object_name
                                                              AND D.index_id = O.index_id
                                                              AND D.partition_number = O.partition_number;

                                    DELETE  #ObjEst;

								-- estimate Row compression
                                    INSERT  #ObjEst
                                            (object_name
                                            ,schema_name
                                            ,index_id
                                            ,partition_number
                                            ,size_with_current_compression_setting
                                            ,size_with_requested_compression_setting
                                            ,sample_size_with_current_compression_setting
                                            ,sample_size_with_requested_compresison_setting)
                                            EXEC sp_estimate_data_compression_savings
                                                @Schema_name = @Schema
                                               ,@object_name = @object
                                               ,@index_id = @indexID
                                               ,@partition_number = @partNO
                                               ,@data_compression = 'row';

                                    UPDATE  dbEstimate
                                    SET     row_size = O.size_with_requested_compression_setting
                                    FROM    dbEstimate D
                                            JOIN #ObjEst O ON D.Schema_name = O.Schema_Name
                                                              AND D.Object_name = O.object_name
                                                              AND D.index_id = O.index_id
                                                              AND D.partition_number = O.partition_number;

                                    DELETE  #ObjEst;
                              END /* end evaluating objects with no compression */

 /*	This section is no longer needed because 7/17/13 change modified this routine to only look
		at data that was not compressed.

							/* evaluate objects with row compression */
                           IF @DAD = 'row'
                              BEGIN
					           /* estimate Page compression */
                                    INSERT  #ObjEst
                                            (object_name
                                            ,schema_name
                                            ,index_id
                                            ,partition_number
                                            ,size_with_current_compression_setting
                                            ,size_with_requested_compression_setting
                                            ,sample_size_with_current_compression_setting
                                            ,sample_size_with_requested_compresison_setting)
                                            EXEC sp_estimate_data_compression_savings
                                                @Schema_name = @Schema
                                               ,@object_name = @object
                                               ,@index_id = @indexID
                                               ,@partition_number = @partNO
                                               ,@data_compression = 'page';

                                    UPDATE  dbEstimate
                                    SET     row_size = O.size_with_current_compression_setting
                                           ,page_size = O.size_with_requested_compression_setting
                                    FROM    dbEstimate D
                                            JOIN #ObjEst O ON D.Schema_name = O.Schema_Name
                                                              AND D.Object_name = O.object_name
                                                              AND D.index_id = O.index_id
                                                              AND D.partition_number = O.partition_number;

                                    DELETE  #ObjEst;

								/* estimate None compression */
                                    INSERT  #ObjEst
                                            (object_name
                                            ,schema_name
                                            ,index_id
                                            ,partition_number
                                            ,size_with_current_compression_setting
                                            ,size_with_requested_compression_setting
                                            ,sample_size_with_current_compression_setting
                                            ,sample_size_with_requested_compresison_setting)
                                            EXEC sp_estimate_data_compression_savings
                                                @Schema_name = @Schema
                                               ,@object_name = @object
                                               ,@index_id = @indexID
                                               ,@partition_number = @partNO
                                               ,@data_compression = 'none';

                                    UPDATE  dbEstimate
                                    SET     none_size = O.size_with_requested_compression_setting
                                    FROM    dbEstimate D
                                            JOIN #ObjEst O ON D.Schema_name = O.Schema_Name
                                                              AND D.Object_name = O.object_name
                                                              AND D.index_id = O.index_id
                                                              AND D.partition_number = O.partition_number;

                                    DELETE  #ObjEst;
                              END /* end evaluating objects with row compression */

							/* evalutate objects with page compression */
                           IF @DAD = 'page'
                              BEGIN
							/* estimate Row compression */
                                    INSERT  #ObjEst
                                            (object_name
                                            ,schema_name
                                            ,index_id
                                            ,partition_number
                                            ,size_with_current_compression_setting
                                            ,size_with_requested_compression_setting
                                            ,sample_size_with_current_compression_setting
                                            ,sample_size_with_requested_compresison_setting)
                                            EXEC sp_estimate_data_compression_savings
                                                @Schema_name = @Schema
                                               ,@object_name = @object
                                               ,@index_id = @indexID
                                               ,@partition_number = @partNO
                                               ,@data_compression = 'row';

                                    UPDATE  dbEstimate
                                    SET     page_size = O.size_with_current_compression_setting
                                           ,row_size = O.size_with_requested_compression_setting
                                    FROM    dbEstimate D
                                            JOIN #ObjEst O ON D.Schema_name = O.Schema_Name
                                                              AND D.Object_name = O.object_name
                                                              AND D.index_id = O.index_id
                                                              AND D.partition_number = O.partition_number;

                                    DELETE  #ObjEst;

								/* estimate None compression */
                                    INSERT  #ObjEst
                                            (object_name
                                            ,schema_name
                                            ,index_id
                                            ,partition_number
                                            ,size_with_current_compression_setting
                                            ,size_with_requested_compression_setting
                                            ,sample_size_with_current_compression_setting
                                            ,sample_size_with_requested_compresison_setting)
                                            EXEC sp_estimate_data_compression_savings
                                                @Schema_name = @Schema
                                               ,@object_name = @object
                                               ,@index_id = @indexID
                                               ,@partition_number = @partNO
                                               ,@data_compression = 'none';

                                    UPDATE  dbEstimate
                                    SET     none_size = O.size_with_requested_compression_setting
                                    FROM    dbEstimate D
                                            JOIN #ObjEst O ON D.Schema_name = O.Schema_Name
                                                              AND D.Object_name = O.object_name
                                                              AND D.index_id = O.index_id
                                                              AND D.partition_number = O.partition_number;

                                    DELETE  #ObjEst;
                              END /* end evaluating objects with page compression */
  */
                           FETCH NEXT FROM cCompress INTO @Schema, @object,
                                 @indexID, @partNO, @DAD
                     END -- end time check block
                  ELSE
                     BEGIN
                           PRINT 'Max runtime reached. No compression performed. Compression estimate still processing. Exiting...';
						   CLOSE cCompress
						   DEALLOCATE cCompress
						   DROP TABLE #ObjEst
                           RETURN
                     END
            END -- end while loop

      CLOSE cCompress
      DEALLOCATE cCompress

      PRINT 'Initial scan complete.'
         --END
  /* End evaluating compression savings. Now do the actual compressing. */

      PRINT 'Beginning compression.';

  /* Do not process objects that do not meet our compression criteria */

      UPDATE    dbEstimate
      SET       AlreadyProcessed = 1
      WHERE     (1 - (CAST(Row_Size AS FLOAT) / None_Size)) < @minCompression
                AND (Row_Size <= Page_Size)
                AND None_Size > 0;

      UPDATE    dbEstimate
      SET       AlreadyProcessed = 1
      WHERE     (1 - (CAST(Page_Size AS FLOAT) / None_Size)) < @minCompression
                AND (Page_Size <= Row_Size)
                AND None_Size > 0;

  /* Do not set compression on empty objects */

      UPDATE    dbEstimate
      SET       AlreadyProcessed = 1
      WHERE     None_size = 0;

  /* Do not process objects that are already using an acceptable compression method */

      UPDATE    dbEstimate
      SET       AlreadyProcessed = 1
      WHERE     (1 - (CAST(Page_Size AS FLOAT) / none_Size)) >= @minCompression
                AND (Page_Size <= Row_Size)
                AND None_Size > 0
                AND data_compression_desc = 'PAGE';

      UPDATE    dbEstimate
      SET       AlreadyProcessed = 1
      WHERE     (1 - (CAST(Row_Size AS FLOAT) / None_Size)) >= @minCompression
                AND (Row_Size <= Page_Size)
                AND None_Size > 0
                AND data_compression_desc = 'ROW';

/* This section no longer needed because 7/17/13 change modified routine to calculate sizes for
		any that weren't calculated yet

	  /* Ignore tables that have not had their sizes calculated. This can occur if the procedure
			execution was cancelled before the initial scan completed. */

      UPDATE    dbEstimate
      SET       AlreadyProcessed = 1
      WHERE     None_Size IS NULL
                OR Row_Size IS NULL
                OR Page_Size IS NULL;
*/
 -- set the compression
      DECLARE cCompress CURSOR FAST_FORWARD
      FOR
              SELECT    schema_name
                       ,object_name
                       ,partition_number
                       ,ixName
                       ,ixType
                       ,CASE WHEN (1 - (CAST(Row_Size AS FLOAT) / none_Size)) >= @minCompression
                                  AND (Row_Size <= Page_Size) THEN 'Row'
                             WHEN (1 - (CAST(page_Size AS FLOAT) / none_Size)) >= @minCompression
                                  AND (Page_Size <= row_Size) THEN 'Page'
                             ELSE 'None'
                        END AS Recommended_Compression
                       ,PK
              FROM      dbEstimate
              WHERE     None_Size <> 0
                        AND (CASE WHEN (1 - (CAST(Row_Size AS FLOAT)
                                             / none_Size)) >= @minCompression
                                       AND (Row_Size <= Page_Size) THEN 'Row'
                                  WHEN (1 - (CAST(page_Size AS FLOAT)
                                             / none_Size)) >= @minCompression
                                       AND (Page_Size <= row_Size) THEN 'Page'
                                  ELSE 'None'
                             END <> data_compression_desc)
                        AND AlreadyProcessed = 0
              ORDER BY  None_Size ASC;		/* start with smallest tables first */

      OPEN cCompress

      FETCH cCompress INTO @Schema, @object, @partNO, @ixName, @ixType,
            @Recommended_Compression, @PK  -- prime the cursor;

      WHILE @@Fetch_Status = 0
            BEGIN

                  IF @ixType = 'Clustered'
                     OR @ixType = 'heap'
                     SET @SQL = 'ALTER TABLE ' + @Schema + '.' + @object
                         + ' Rebuild with (data_compression = '
                         + @Recommended_Compression + ', SORT_IN_TEMPDB=ON)';

                  ELSE
                     SET @SQL = 'ALTER INDEX ' + @ixName + ' on ' + @Schema
                         + '.' + @object
                         + ' Rebuild with (data_compression = '
                         + @Recommended_Compression + ',SORT_IN_TEMPDB=ON)';

                  IF DATEDIFF(mi, @StartTime, CURRENT_TIMESTAMP) < @MaxRunTimeInMinutes
                     BEGIN
                           IF @ExecuteCompressCommand = 1
                              BEGIN
                                    PRINT 'Compressing table/index: '
                                          + @Schema + '.' + @object;
                                    EXEC sp_executesql
                                        @SQL;

                                    UPDATE  dbEstimate
                                    SET     AlreadyProcessed = 1
                                    WHERE   PK = @PK;

                                    SET @CompressedCount = @CompressedCount
                                        + 1;
                              END
                           ELSE
                              BEGIN
                                    PRINT 'Command execution not enabled. Command is:'
                                          + @SQL;
                              END
                     END
                  ELSE
                     BEGIN
                           PRINT 'Max runtime reached. Some compression performed. Exiting...';
                           BREAK
                     END

                  FETCH cCompress INTO @Schema, @object, @partNO, @ixName,
                        @ixType, @Recommended_Compression, @PK;
            END

      CLOSE cCompress;
      DEALLOCATE cCompress;

      PRINT 'Objects compressed: ' + CONVERT(VARCHAR(10), @CompressedCount);

      DROP TABLE #ObjEst;

      RETURN

One thought on “Enabling Data Compression – Updated

Leave a Reply

Your email address will not be published. Required fields are marked *

I am a real person and can prove it by doing this math problem: (required) Time limit is exhausted. Please reload CAPTCHA.

This site uses Akismet to reduce spam. Learn how your comment data is processed.