Back in March, I wrote about a method I use to enable data compression on a database and periodically re-scan for new tables to compress. That method worked fine on the database I was working with at the time. However, I have since tried to run it on a much larger database, and the performance was less than optimal.
The database I am now working with has 200,000+ tables, and the old script was taking over two days to iterate through all of them and generate the estimated compression values. This posed several problems. First, because I want to run this on a weekly basis to catch any new tables that might be added, that length of time was not acceptable. Second, my initial script had a parameter to limit how long the script ran, but it did not apply to the initial table scan and compression estimate phase. Third, if the script was canceled while the compression estimates were still being generated, a re-run would not pick up where it left off; it would simply ignore the tables that had no estimates and start compressing those that did.
As I’ve mentioned previously, creating maintenance scripts is an iterative process, so I made a few changes:
- The time limit now applies to the entire process. Upon exit, the script prints a message saying how many tables were compressed or, if it did not reach the compression stage, that it is still generating compression estimates.
- The script will resume generating compression estimates if some tables still need them.
- The script only looks at tables and indexes that have at least one row of data.
- The script only looks at tables and indexes that are not currently compressed.
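The first change is worth a quick illustration. Both the estimate phase and the compression phase now run the same elapsed-time check before processing each object, something along these lines (the check runs between objects, so a long-running command can still push the total past the limit):

```sql
IF DATEDIFF(mi, @StartTime, CURRENT_TIMESTAMP) < @MaxRunTimeInMinutes
BEGIN
    -- process the next table or index
    PRINT 'Processing next object...';
END
ELSE
BEGIN
    -- print a status message and stop
    PRINT 'Max runtime reached. Exiting...';
    RETURN;
END
```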
One note about item 4: the original version of the script evaluated all tables and indexes, even those that were already compressed. The thinking was that an object initially compressed with one method (page or row compression) might now do better using the other method. I figure this is a fairly rare case and, if it did happen, the difference between the two methods would probably be minimal. So, in the interest of speeding things up, the script now only looks at non-compressed objects. I did, however, leave the old code in place and simply commented it out, in case I want to put it back in later.
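Concretely, items 3 and 4 come down to two extra predicates when the initial scan reads `sys.partitions` (a simplified sketch of the scan's filter):

```sql
SELECT P.object_id
    ,P.partition_number
    ,P.data_compression_desc
FROM sys.partitions AS P
WHERE P.data_compression_desc = 'NONE' -- skip objects that are already compressed
    AND P.rows > 0;                    -- skip partitions with no data
```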
Note that in order to use this, you will also need the code to create the db_compression_estimate stored procedure, which can be found here.
```sql
CREATE PROC [dbo].[db_compression] (
    @minCompression FLOAT = .25 /* e.g. .25 for minimum of 25% compression */
    ,@MaxRunTimeInMinutes INT = 60
    ,@ExecuteCompressCommand BIT = 0 /* 1 to execute command */
    )
AS
SET NOCOUNT ON;

/*
    Original script by Paul Nielsen www.SQLServerBible.com March 13, 2008
    Modified by Shaun J. Stuart www.shaunjstuart.com February 27, 2013

    Sets compression for all objects and indexes in the database needing adjustment.
    If the estimated gain is equal to or greater than the min compression parameter,
    enables row or page compression, whichever gives the greater space savings.
    If row and page have the same gain, enables row compression.
    If the estimated gain is less than the min compression parameter, compression
    is set to none.

    - SJS 2/27/13
        - Added @MaxRunTimeInMinutes to limit run length (checked after each
          command, so may run longer). Note: This timer only applies to the actual
          compression process. It does not limit the time it takes to scan all the
          tables / indexes and estimate compression savings.
        - Sorted compression cursor to compress smallest tables first
        - Added flag to execute compression command or not
        - Added sort in tempdb option (always)
        - Re-wrote slightly to persist initial scan results. Will only re-scan
          after all tables have been processed.

    - SJS 7/17/13
        - Modified to only look at data that is not already compressed
        - Modified so that any items with null None_Size, Page_Size, or Row_Size
          are no longer set as AlreadyProcessed and now have their sizes calculated
          (i.e., if the process was cancelled before the initial scan completed)
        - Modified so that the time limit now applies to the compression estimate
          process as well as the actual compression process
        - Only look at partitions that have at least one row
*/

DECLARE @CompressedCount INT;
SET @CompressedCount = 0;

DECLARE @StartTime DATETIME2;
SET @StartTime = CURRENT_TIMESTAMP;

CREATE TABLE #ObjEst (
    PK INT IDENTITY NOT NULL PRIMARY KEY
    ,object_name VARCHAR(250)
    ,schema_name VARCHAR(250)
    ,index_id INT
    ,partition_number INT
    ,size_with_current_compression_setting BIGINT
    ,size_with_requested_compression_setting BIGINT
    ,sample_size_with_current_compression_setting BIGINT
    ,sample_size_with_requested_compresison_setting BIGINT
    );

IF NOT EXISTS ( SELECT 1
                FROM sys.objects
                WHERE object_id = OBJECT_ID(N'[dbo].[dbEstimate]')
                    AND type IN (N'U') )
BEGIN
    CREATE TABLE dbEstimate (
        PK INT IDENTITY NOT NULL PRIMARY KEY
        ,schema_name VARCHAR(250)
        ,object_name VARCHAR(250)
        ,index_id INT
        ,ixName VARCHAR(255)
        ,ixType VARCHAR(50)
        ,partition_number INT
        ,data_compression_desc VARCHAR(50)
        ,None_Size INT
        ,Row_Size INT
        ,Page_Size INT
        ,AlreadyProcessed BIT
        );
END

/* If all objects have been processed, rescan and start fresh. Useful for catching
   objects added since the last scan. But beware - this is I/O intensive and can
   take a while. */
IF NOT EXISTS ( SELECT 1
                FROM dbEstimate
                WHERE AlreadyProcessed = 0 )
BEGIN
    DELETE FROM dbEstimate;

    INSERT INTO dbEstimate
            (schema_name
            ,object_name
            ,index_id
            ,ixName
            ,ixType
            ,partition_number
            ,data_compression_desc
            ,AlreadyProcessed)
        SELECT S.name
            ,O.name
            ,I.index_id
            ,I.name
            ,I.type_desc
            ,P.partition_number
            ,P.data_compression_desc
            ,0 AS AlreadyProcessed
        FROM sys.schemas AS S
        JOIN sys.objects AS O ON S.schema_id = O.schema_id
        JOIN sys.indexes AS I ON O.object_id = I.object_id
        JOIN sys.partitions AS P ON I.object_id = P.object_id
            AND I.index_id = P.index_id
        WHERE O.type = 'U'
            AND P.data_compression_desc = 'NONE'
            AND P.rows > 0; -- only look at objects with data
END

-- Determine compression estimates
DECLARE @PK INT
    ,@Schema VARCHAR(150)
    ,@object VARCHAR(150)
    ,@DAD VARCHAR(25)
    ,@partNO INT
    ,@indexID INT
    ,@ixName VARCHAR(250)
    ,@SQL NVARCHAR(MAX)
    ,@ixType VARCHAR(50)
    ,@Recommended_Compression VARCHAR(10);

DECLARE cCompress CURSOR FAST_FORWARD
FOR
SELECT schema_name
    ,object_name
    ,index_id
    ,partition_number
    ,data_compression_desc
FROM dbEstimate
WHERE None_Size IS NULL
    OR Row_Size IS NULL
    OR Page_Size IS NULL;

OPEN cCompress;

FETCH cCompress INTO @Schema, @object, @indexID, @partNO, @DAD;

WHILE @@Fetch_Status = 0
BEGIN
    IF DATEDIFF(mi, @StartTime, CURRENT_TIMESTAMP) < @MaxRunTimeInMinutes
    BEGIN
        /* evaluate objects with no compression */
        IF @DAD = 'none'
        BEGIN
            /* estimate Page compression */
            INSERT #ObjEst
                    (object_name
                    ,schema_name
                    ,index_id
                    ,partition_number
                    ,size_with_current_compression_setting
                    ,size_with_requested_compression_setting
                    ,sample_size_with_current_compression_setting
                    ,sample_size_with_requested_compresison_setting)
            EXEC sp_estimate_data_compression_savings
                @Schema_name = @Schema
                ,@object_name = @object
                ,@index_id = @indexID
                ,@partition_number = @partNO
                ,@data_compression = 'page';

            UPDATE dbEstimate
            SET None_Size = O.size_with_current_compression_setting
                ,Page_Size = O.size_with_requested_compression_setting
            FROM dbEstimate D
            JOIN #ObjEst O ON D.schema_name = O.schema_name
                AND D.object_name = O.object_name
                AND D.index_id = O.index_id
                AND D.partition_number = O.partition_number;

            DELETE #ObjEst;

            /* estimate Row compression */
            INSERT #ObjEst
                    (object_name
                    ,schema_name
                    ,index_id
                    ,partition_number
                    ,size_with_current_compression_setting
                    ,size_with_requested_compression_setting
                    ,sample_size_with_current_compression_setting
                    ,sample_size_with_requested_compresison_setting)
            EXEC sp_estimate_data_compression_savings
                @Schema_name = @Schema
                ,@object_name = @object
                ,@index_id = @indexID
                ,@partition_number = @partNO
                ,@data_compression = 'row';

            UPDATE dbEstimate
            SET Row_Size = O.size_with_requested_compression_setting
            FROM dbEstimate D
            JOIN #ObjEst O ON D.schema_name = O.schema_name
                AND D.object_name = O.object_name
                AND D.index_id = O.index_id
                AND D.partition_number = O.partition_number;

            DELETE #ObjEst;
        END /* end evaluating objects with no compression */

        /* This section is no longer needed because the 7/17/13 change modified
           this routine to only look at data that is not already compressed.

        /* evaluate objects with row compression */
        IF @DAD = 'row'
        BEGIN
            /* estimate Page compression */
            INSERT #ObjEst
                    (object_name
                    ,schema_name
                    ,index_id
                    ,partition_number
                    ,size_with_current_compression_setting
                    ,size_with_requested_compression_setting
                    ,sample_size_with_current_compression_setting
                    ,sample_size_with_requested_compresison_setting)
            EXEC sp_estimate_data_compression_savings
                @Schema_name = @Schema
                ,@object_name = @object
                ,@index_id = @indexID
                ,@partition_number = @partNO
                ,@data_compression = 'page';

            UPDATE dbEstimate
            SET Row_Size = O.size_with_current_compression_setting
                ,Page_Size = O.size_with_requested_compression_setting
            FROM dbEstimate D
            JOIN #ObjEst O ON D.schema_name = O.schema_name
                AND D.object_name = O.object_name
                AND D.index_id = O.index_id
                AND D.partition_number = O.partition_number;

            DELETE #ObjEst;

            /* estimate None compression */
            INSERT #ObjEst
                    (object_name
                    ,schema_name
                    ,index_id
                    ,partition_number
                    ,size_with_current_compression_setting
                    ,size_with_requested_compression_setting
                    ,sample_size_with_current_compression_setting
                    ,sample_size_with_requested_compresison_setting)
            EXEC sp_estimate_data_compression_savings
                @Schema_name = @Schema
                ,@object_name = @object
                ,@index_id = @indexID
                ,@partition_number = @partNO
                ,@data_compression = 'none';

            UPDATE dbEstimate
            SET None_Size = O.size_with_requested_compression_setting
            FROM dbEstimate D
            JOIN #ObjEst O ON D.schema_name = O.schema_name
                AND D.object_name = O.object_name
                AND D.index_id = O.index_id
                AND D.partition_number = O.partition_number;

            DELETE #ObjEst;
        END /* end evaluating objects with row compression */

        /* evaluate objects with page compression */
        IF @DAD = 'page'
        BEGIN
            /* estimate Row compression */
            INSERT #ObjEst
                    (object_name
                    ,schema_name
                    ,index_id
                    ,partition_number
                    ,size_with_current_compression_setting
                    ,size_with_requested_compression_setting
                    ,sample_size_with_current_compression_setting
                    ,sample_size_with_requested_compresison_setting)
            EXEC sp_estimate_data_compression_savings
                @Schema_name = @Schema
                ,@object_name = @object
                ,@index_id = @indexID
                ,@partition_number = @partNO
                ,@data_compression = 'row';

            UPDATE dbEstimate
            SET Page_Size = O.size_with_current_compression_setting
                ,Row_Size = O.size_with_requested_compression_setting
            FROM dbEstimate D
            JOIN #ObjEst O ON D.schema_name = O.schema_name
                AND D.object_name = O.object_name
                AND D.index_id = O.index_id
                AND D.partition_number = O.partition_number;

            DELETE #ObjEst;

            /* estimate None compression */
            INSERT #ObjEst
                    (object_name
                    ,schema_name
                    ,index_id
                    ,partition_number
                    ,size_with_current_compression_setting
                    ,size_with_requested_compression_setting
                    ,sample_size_with_current_compression_setting
                    ,sample_size_with_requested_compresison_setting)
            EXEC sp_estimate_data_compression_savings
                @Schema_name = @Schema
                ,@object_name = @object
                ,@index_id = @indexID
                ,@partition_number = @partNO
                ,@data_compression = 'none';

            UPDATE dbEstimate
            SET None_Size = O.size_with_requested_compression_setting
            FROM dbEstimate D
            JOIN #ObjEst O ON D.schema_name = O.schema_name
                AND D.object_name = O.object_name
                AND D.index_id = O.index_id
                AND D.partition_number = O.partition_number;

            DELETE #ObjEst;
        END /* end evaluating objects with page compression */
        */

        FETCH NEXT FROM cCompress INTO @Schema, @object, @indexID, @partNO, @DAD;
    END -- end time check block
    ELSE
    BEGIN
        PRINT 'Max runtime reached. No compression performed. Compression estimate still processing. Exiting...';
        CLOSE cCompress;
        DEALLOCATE cCompress;
        DROP TABLE #ObjEst;
        RETURN;
    END
END -- end while loop

CLOSE cCompress;
DEALLOCATE cCompress;

PRINT 'Initial scan complete.';

/* End evaluating compression savings. Now do the actual compressing. */
PRINT 'Beginning compression.';

/* Do not process objects that do not meet our compression criteria */
UPDATE dbEstimate
SET AlreadyProcessed = 1
WHERE (1 - (CAST(Row_Size AS FLOAT) / None_Size)) < @minCompression
    AND (Row_Size <= Page_Size)
    AND None_Size > 0;

UPDATE dbEstimate
SET AlreadyProcessed = 1
WHERE (1 - (CAST(Page_Size AS FLOAT) / None_Size)) < @minCompression
    AND (Page_Size <= Row_Size)
    AND None_Size > 0;

/* Do not set compression on empty objects */
UPDATE dbEstimate
SET AlreadyProcessed = 1
WHERE None_Size = 0;

/* Do not process objects that are already using an acceptable compression method */
UPDATE dbEstimate
SET AlreadyProcessed = 1
WHERE (1 - (CAST(Page_Size AS FLOAT) / None_Size)) >= @minCompression
    AND (Page_Size <= Row_Size)
    AND None_Size > 0
    AND data_compression_desc = 'PAGE';

UPDATE dbEstimate
SET AlreadyProcessed = 1
WHERE (1 - (CAST(Row_Size AS FLOAT) / None_Size)) >= @minCompression
    AND (Row_Size <= Page_Size)
    AND None_Size > 0
    AND data_compression_desc = 'ROW';

/* This section is no longer needed because the 7/17/13 change modified the
   routine to calculate sizes for any items that hadn't been calculated yet.

/* Ignore tables that have not had their sizes calculated. This can occur if
   the procedure execution was cancelled before the initial scan completed. */
UPDATE dbEstimate
SET AlreadyProcessed = 1
WHERE None_Size IS NULL
    OR Row_Size IS NULL
    OR Page_Size IS NULL;
*/

-- set the compression
DECLARE cCompress CURSOR FAST_FORWARD
FOR
SELECT schema_name
    ,object_name
    ,partition_number
    ,ixName
    ,ixType
    ,CASE
        WHEN (1 - (CAST(Row_Size AS FLOAT) / None_Size)) >= @minCompression
            AND (Row_Size <= Page_Size) THEN 'Row'
        WHEN (1 - (CAST(Page_Size AS FLOAT) / None_Size)) >= @minCompression
            AND (Page_Size <= Row_Size) THEN 'Page'
        ELSE 'None'
        END AS Recommended_Compression
    ,PK
FROM dbEstimate
WHERE None_Size <> 0
    AND (CASE
            WHEN (1 - (CAST(Row_Size AS FLOAT) / None_Size)) >= @minCompression
                AND (Row_Size <= Page_Size) THEN 'Row'
            WHEN (1 - (CAST(Page_Size AS FLOAT) / None_Size)) >= @minCompression
                AND (Page_Size <= Row_Size) THEN 'Page'
            ELSE 'None'
            END <> data_compression_desc)
    AND AlreadyProcessed = 0
ORDER BY None_Size ASC; /* start with smallest tables first */

OPEN cCompress;

FETCH cCompress INTO @Schema, @object, @partNO, @ixName, @ixType,
    @Recommended_Compression, @PK; -- prime the cursor

WHILE @@Fetch_Status = 0
BEGIN
    IF @ixType = 'Clustered'
        OR @ixType = 'heap'
        SET @SQL = 'ALTER TABLE ' + @Schema + '.' + @object
            + ' REBUILD WITH (DATA_COMPRESSION = ' + @Recommended_Compression
            + ', SORT_IN_TEMPDB = ON)';
    ELSE
        SET @SQL = 'ALTER INDEX ' + @ixName + ' ON ' + @Schema + '.' + @object
            + ' REBUILD WITH (DATA_COMPRESSION = ' + @Recommended_Compression
            + ', SORT_IN_TEMPDB = ON)';

    IF DATEDIFF(mi, @StartTime, CURRENT_TIMESTAMP) < @MaxRunTimeInMinutes
    BEGIN
        IF @ExecuteCompressCommand = 1
        BEGIN
            PRINT 'Compressing table/index: ' + @Schema + '.' + @object;

            EXEC sp_executesql @SQL;

            UPDATE dbEstimate
            SET AlreadyProcessed = 1
            WHERE PK = @PK;

            SET @CompressedCount = @CompressedCount + 1;
        END
        ELSE
        BEGIN
            PRINT 'Command execution not enabled. Command is: ' + @SQL;
        END
    END
    ELSE
    BEGIN
        PRINT 'Max runtime reached. Some compression performed. Exiting...';
        BREAK;
    END

    FETCH cCompress INTO @Schema, @object, @partNO, @ixName, @ixType,
        @Recommended_Compression, @PK;
END

CLOSE cCompress;
DEALLOCATE cCompress;

PRINT 'Objects compressed: ' + CONVERT(VARCHAR(10), @CompressedCount);

DROP TABLE #ObjEst;

RETURN;
```
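With the procedure created, a dry run might look like this (the parameter values here are just examples; leaving @ExecuteCompressCommand at 0 prints the ALTER commands instead of running them):

```sql
EXEC [dbo].[db_compression]
    @minCompression = 0.25         -- require at least 25% estimated savings
    ,@MaxRunTimeInMinutes = 60     -- stop after roughly an hour
    ,@ExecuteCompressCommand = 0;  -- 0 = print commands only; 1 = execute them
```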