[UPDATE Oct 2013]: I have made some improvements to this script, including those mentioned in the comments below. See my post on 10/28/13 for the latest version.
I recently posted an update of my script to enable data compression in SQL Server, but since then I’ve made some significant changes and decided it was time to put another version out there.
What prompted these changes? I was going through my quarterly check of all the databases I watch over and realized that I had several large databases that could benefit from data compression. My servers are increasingly virtualized, which means more of them are running the Enterprise edition of SQL Server, the edition that supports data compression. I’ve also not encountered any problems with any databases I’ve compressed so far. Based on these factors, I decided to start looking into compressing more databases.
But the current script I had to perform the compression wasn’t really elegant. For starters, the stored procedure and logging table had to be located in the database being compressed. This worked fine when I was just compressing one or two databases, but if I’m going to start compressing more databases, I didn’t want to have to put these items in each database. Instead, I wanted to install this procedure in my DBA utility database, a copy of which I have on each server, and be able to run it from there.
I also wanted to change the way the script behaved a little bit. I wanted the new version to be able to operate on multiple databases.
And finally, I cleaned up the code. The first version of the script would check all data, whether it was compressed or not, to see if a different compression method might produce better results. I had previously decided that would likely be a rare case and it created unacceptably long run times, so I changed the script to only look at uncompressed data. But I had left all the old code in place – I just commented it out. I ripped all that stuff out this time around. I also changed the name of the logging table and the name of the stored procedure itself to be more descriptive of what each is actually for.
So, I present to you the latest iteration of my script for enabling data compression! Usage is:
EXEC [up_CompressDatabase]
     @minCompression = .25
    ,@MaxRunTimeInMinutes = 60
    ,@ExecuteCompressCommand = 1    /* 1 to execute command */
    ,@DBToScan = '<databasename>'   /* database to compress */
@minCompression – the minimum compression ratio that must be reached in order for the data to be compressed. This is a percentage expressed as a decimal – in the example above, the minimum compression is 25%.
@MaxRunTimeInMinutes – the maximum amount of time this procedure will run. This is not a hard and fast limit. It might run slightly longer because the time is only checked after the completion of each command.
@ExecuteCompressCommand – if this is a one, the data will be compressed. If this is a zero, only the compression estimate will be performed and the actual compression commands are printed out instead of being executed.
@DBToScan – this is the name of the database that you want to implement compression in. See Behavior Notes below.
The stored procedure will create a table in the current database named dbCompressionEstimates. This table will contain a list of all compressible objects in the database specified by the @DBToScan parameter, as well as some additional information.
Compression can take a long time, so this procedure is written to be able to be run over multiple days, perhaps for a few hours during your maintenance windows. It will pick up where it left off the last time it ran.
The @DBToScan parameter tells the procedure which database to scan for compressible objects. It logs this information to the dbCompressionEstimates table. However, it does NOT limit the actual compression process to that database! The compression portion of the procedure will work to compress objects in the dbCompressionEstimates table, no matter what database they are in.
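Because the table can hold rows from several databases at once, a quick way to see everything that has been queued up is to query it directly. This is just my suggested check, not part of the procedure itself; it assumes you are in the utility database where dbCompressionEstimates was created:

```sql
-- List every object the procedure knows about, across all scanned databases.
SELECT  DatabaseName
       ,schema_name
       ,object_name
       ,StatusText
FROM    dbo.dbCompressionEstimates
ORDER BY DatabaseName
       ,object_name;
```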
This was done to allow for the following use case: Suppose you have two databases you want to compress. You have a small window of time during which compression can be performed, so you want to maximize the time spent compressing data. In this situation, you would “preload” the dbCompressionEstimates table by executing the following two commands:
EXEC [up_CompressDatabase]
     @minCompression = .25
    ,@MaxRunTimeInMinutes = 60
    ,@ExecuteCompressCommand = 0
    ,@DBToScan = 'database1'
GO

EXEC [up_CompressDatabase]
     @minCompression = .25
    ,@MaxRunTimeInMinutes = 60
    ,@ExecuteCompressCommand = 0
    ,@DBToScan = 'database2'
GO
By setting @ExecuteCompressCommand to zero, the first command will load the dbCompressionEstimates table with objects from database1, and the second will load it with objects from database2. These can usually be run during regular business hours because the estimation routine takes an intent shared lock on the objects it is evaluating (the same type of lock a basic SELECT statement uses). (Having said that, you should test this first if you have a very busy system that performs lots of inserts, updates, or deletes; those may experience blocking.) Then, during your maintenance window, you can run either of those two commands again with @ExecuteCompressCommand set to 1 (it doesn’t matter which one, because the routine will see that both databases have already been scanned and will skip that part) and it will begin compressing data.
The routine compresses the smallest tables first, so it is possible it will bounce around between databases when compressing. The size sort is done irrespective of the database name.
The last column in the dbCompressionEstimates table, StatusText, contains a status of the current row. The possible values are:
Initial load – this indicates the object has been scanned, but no compression estimate or actual compression has been performed yet.
Compression estimate 50% complete – the estimate for page compression only has been completed.
Compression estimate 100% complete – the estimates for both row compression and page compression have been completed.
Best compression method less than minCompression threshold – neither page compression nor row compression would compress the data more than the specified minimum compression ratio passed in to the procedure.
No data in table to compress – the table contains no data and will not have compression performed on it.
Compressing data… – this object is currently being compressed.
Compression complete – the object has been compressed.
The table also contains a field named AlreadyProcessed. When this value is zero, the routine has not yet evaluated the object for compression. If it is a one, the procedure has processed that object, either compressing it or not compressing it, based on the reason given in the StatusText field.
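Since the procedure may run across several maintenance windows, it can be handy to get a quick progress summary between runs. This is a monitoring query of my own suggestion (not part of the procedure), using the two columns described above:

```sql
-- Rough progress check: count of objects in each processing state.
SELECT  DatabaseName
       ,AlreadyProcessed
       ,StatusText
       ,COUNT(*) AS ObjectCount
FROM    dbo.dbCompressionEstimates
GROUP BY DatabaseName
       ,AlreadyProcessed
       ,StatusText
ORDER BY DatabaseName
       ,AlreadyProcessed;
```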
If the routine has completely processed a database and you run it again, passing in the same database name as a parameter, it will delete all records pertaining to that database from the dbCompressionEstimates table and rescan the database for uncompressed objects. This is useful if your database has new tables or indexes frequently added to it.
And as always, note that compression is a logged operation, so be sure to monitor your log file usage while this is running.
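A simple way to keep an eye on log usage while the procedure runs is the standard DBCC command, which reports log size and percent used for every database on the server:

```sql
-- Show transaction log size and percentage used for all databases.
DBCC SQLPERF(LOGSPACE);
```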
Here is the code. As with all scripts found on the internet, examine carefully before running on your systems.
I think I’m pretty well pleased with this version. Let me know if you have suggestions for possible improvements.
CREATE PROC [dbo].[up_CompressDatabase]
    (
     @minCompression FLOAT = .25    /* e.g. .25 for minimum of 25% compression */
    ,@MaxRunTimeInMinutes INT = 60
    ,@ExecuteCompressCommand BIT = 0    /* 1 to execute command */
    ,@DBToScan SYSNAME  /* database to compress */
    )
AS
    SET nocount ON;

/*
 Original script by Paul Nielsen www.SQLServerBible.com March 13, 2008
 Modified by Shaun J. Stuart www.shaunjstuart.com February 27, 2013

 Sets compression for all objects and indexes in the database needing adjustment.
 If estimated gain is equal to or greater than the min compression parameter,
    then enables row or page compression, whichever gives greater space savings.
 If row and page have the same gain, then enables row compression.
 If estimated gain is less than the min compression parameter, then compression
    is set to none.

 - SJS 2/27/13
 - Added @MaxRunTimeInMinutes to limit run length (checked after each command,
    so may run longer)
    Note: This timer only applies to the actual compression process. It does
    not limit the time it takes to perform a scan of all the tables / indexes
    and estimate compression savings.
 - Sorted compression cursor to compress smallest tables first
 - Added flag to execute compression command or not
 - Added sort in tempdb option (always)
 - Re-wrote slightly to persist initial scan results. Will only re-scan after
    all tables have been processed

 - SJS 7/17/13
 - Modified to only look at data that is not already compressed
 - Modified so that any items with null none_size, page_size, or row_size are
    no longer set as AlreadyProcessed and now have their sizes calculated
    (i.e. if process was cancelled before initial scan was completed)
 - Modified so that time limit now applies to compression estimate process as
    well as actual compression process
 - Only look at partitions that have at least one row

 - SJS 8/6/13
 - Changed name of dbEstimate table to dbCompressionEstimates for clarity
 - Added StatusText column to dbCompressionEstimates and added status messages
 - Modified to take database name as input parameter to allow use in utility
    database instead of the DB being compressed (Table dbCompressionEstimates,
    which stores sizes and compression estimates, is still created in the
    current database.)
 - Note: Compression estimates are only made for the database supplied as an
    input parameter. However, that database name is stored in the
    dbCompressionEstimates table and will be read from there and used when
    actually performing the compression. This allows you to create estimates
    only for multiple databases (using @ExecuteCompressCommand=0), then perform
    the compression across multiple databases later (with
    @ExecuteCompressCommand=1).
 - Removed all references to code that checked already compressed data since
    this routine now only looks at uncompressed data.
*/

    IF ISNULL(@DBToScan, '') NOT IN (SELECT [name] FROM sys.databases)
        BEGIN
            SELECT 'Database ' + ISNULL(@DBToScan, 'NULL')
                + ' not found on server.'
            RETURN
        END

    DECLARE @CompressedCount INT;
    SET @CompressedCount = 0;

    DECLARE @SQL NVARCHAR(MAX);
    DECLARE @StartTime DATETIME2;
    SET @StartTime = CURRENT_TIMESTAMP;

    DECLARE @CurrentDatabase SYSNAME;
    SET @CurrentDatabase = DB_NAME()

    IF OBJECT_ID('tempdb..##ObjEst', 'U') IS NOT NULL
        DROP TABLE ##ObjEst

    CREATE TABLE ##ObjEst
        (
         PK INT IDENTITY NOT NULL PRIMARY KEY
        ,object_name VARCHAR(250)
        ,schema_name VARCHAR(250)
        ,index_id INT
        ,partition_number INT
        ,size_with_current_compression_setting BIGINT
        ,size_with_requested_compression_setting BIGINT
        ,sample_size_with_current_compression_setting BIGINT
        ,sample_size_with_requested_compresison_setting BIGINT
        );

    IF NOT EXISTS ( SELECT  1
                    FROM    sys.objects
                    WHERE   object_id = OBJECT_ID(N'[dbo].[dbCompressionEstimates]')
                            AND type IN (N'U') )
        BEGIN
            CREATE TABLE dbo.dbCompressionEstimates
                (
                 PK INT IDENTITY NOT NULL PRIMARY KEY
                ,DatabaseName SYSNAME
                ,schema_name VARCHAR(250)
                ,object_name VARCHAR(250)
                ,index_id INT
                ,ixName VARCHAR(255)
                ,ixType VARCHAR(50)
                ,partition_number INT
                ,data_compression_desc VARCHAR(50)
                ,None_Size INT
                ,Row_Size INT
                ,Page_Size INT
                ,AlreadyProcessed BIT
                ,StatusText VARCHAR(75)
                );
        END

/* If all objects have been processed, rescan and start fresh. Useful for
   catching added objects since last scan. But beware - this is I/O intensive
   and can take a while. */

    IF NOT EXISTS ( SELECT  1
                    FROM    dbo.dbCompressionEstimates
                    WHERE   AlreadyProcessed = 0
                            AND DatabaseName = @DBToScan )
        BEGIN
            DELETE  FROM dbo.dbCompressionEstimates
            WHERE   DatabaseName = @DBToScan;

            SET @SQL = 'USE [' + @DBToScan + '];
                INSERT  INTO [' + @CurrentDatabase + '].dbo.dbCompressionEstimates
                        (DatabaseName
                        ,schema_name
                        ,object_name
                        ,index_id
                        ,ixName
                        ,ixType
                        ,partition_number
                        ,data_compression_desc
                        ,AlreadyProcessed
                        ,StatusText)
                SELECT  ''' + @DBToScan + '''
                       ,S.name
                       ,o.name
                       ,I.index_id
                       ,I.name
                       ,I.type_desc
                       ,P.partition_number
                       ,P.data_compression_desc
                       ,0 AS AlreadyProcessed
                       ,''Initial load'' AS StatusText
                FROM    [' + @DBToScan + '].sys.schemas AS S
                        JOIN [' + @DBToScan + '].sys.objects AS O
                            ON S.schema_id = O.schema_id
                        JOIN [' + @DBToScan + '].sys.indexes AS I
                            ON o.object_id = I.object_id
                        JOIN [' + @DBToScan + '].sys.partitions AS P
                            ON I.object_id = P.object_id
                               AND I.index_id = p.index_id
                WHERE   O.TYPE = ''U''
                        AND P.data_compression_desc = ''NONE''
                        AND P.rows > 0;'    -- only look at objects with data
            EXEC (@SQL)
        END

    -- Determine Compression Estimates
    DECLARE @PK INT
        ,@DatabaseName SYSNAME
        ,@Schema VARCHAR(150)
        ,@object VARCHAR(150)
        ,@DAD VARCHAR(25)
        ,@partNO VARCHAR(3)
        ,@indexID VARCHAR(3)
        ,@ixName VARCHAR(250)
        ,@ixType VARCHAR(50)
        ,@Recommended_Compression VARCHAR(10);

    DECLARE cCompress CURSOR FAST_FORWARD
        FOR SELECT  schema_name
                   ,object_name
                   ,index_id
                   ,partition_number
                   ,data_compression_desc
            FROM    dbo.dbCompressionEstimates
            WHERE   None_size IS NULL
                    OR Row_Size IS NULL
                    OR Page_size IS NULL;

    OPEN cCompress;
    FETCH cCompress INTO @Schema, @object, @indexID, @partNO, @DAD;

    WHILE @@Fetch_Status = 0
        BEGIN   /* evaluate objects with no compression */
            IF DATEDIFF(mi, @StartTime, CURRENT_TIMESTAMP) < @MaxRunTimeInMinutes
                BEGIN
                    IF @DAD = 'none'
                        BEGIN
                            /* estimate Page compression */
                            SET @SQL = 'USE [' + @DBToScan + '];
                                INSERT  ##ObjEst
                                        (object_name
                                        ,schema_name
                                        ,index_id
                                        ,partition_number
                                        ,size_with_current_compression_setting
                                        ,size_with_requested_compression_setting
                                        ,sample_size_with_current_compression_setting
                                        ,sample_size_with_requested_compresison_setting)
                                EXEC sp_estimate_data_compression_savings
                                    @Schema_name = ' + @Schema + '
                                   ,@object_name = ' + @object + '
                                   ,@index_id = ' + @indexID + '
                                   ,@partition_number = ' + @partNO + '
                                   ,@data_compression = ''page'';'
                            EXEC (@SQL)

                            UPDATE  dbo.dbCompressionEstimates
                            SET     none_size = O.size_with_current_compression_setting
                                   ,page_size = O.size_with_requested_compression_setting
                                   ,StatusText = 'Compression estimate 50% complete'
                            FROM    dbo.dbCompressionEstimates D
                                    JOIN ##ObjEst O
                                        ON D.Schema_name = O.Schema_Name
                                           AND D.Object_name = O.object_name
                                           AND D.index_id = O.index_id
                                           AND D.partition_number = O.partition_number;

                            DELETE  ##ObjEst;

                            -- estimate Row compression
                            SET @SQL = 'USE [' + @DBToScan + '];
                                INSERT  ##ObjEst
                                        (object_name
                                        ,schema_name
                                        ,index_id
                                        ,partition_number
                                        ,size_with_current_compression_setting
                                        ,size_with_requested_compression_setting
                                        ,sample_size_with_current_compression_setting
                                        ,sample_size_with_requested_compresison_setting)
                                EXEC sp_estimate_data_compression_savings
                                    @Schema_name = ' + @Schema + '
                                   ,@object_name = ' + @object + '
                                   ,@index_id = ' + @indexID + '
                                   ,@partition_number = ' + @partNO + '
                                   ,@data_compression = ''ROW'';'
                            EXEC (@SQL)

                            UPDATE  dbo.dbCompressionEstimates
                            SET     row_size = O.size_with_requested_compression_setting
                                   ,StatusText = 'Compression estimate 100% complete'
                            FROM    dbo.dbCompressionEstimates D
                                    JOIN ##ObjEst O
                                        ON D.Schema_name = O.Schema_Name
                                           AND D.Object_name = O.object_name
                                           AND D.index_id = O.index_id
                                           AND D.partition_number = O.partition_number;

                            DELETE  ##ObjEst;
                        END     /* end evaluating objects with no compression */

                    FETCH NEXT FROM cCompress INTO @Schema, @object, @indexID,
                        @partNO, @DAD
                END     -- end time check block
            ELSE
                BEGIN
                    PRINT 'Max runtime reached. No compression performed. Compression estimate still processing. Exiting...';
                    CLOSE cCompress
                    DEALLOCATE cCompress
                    DROP TABLE ##ObjEst
                    RETURN
                END
        END     -- end while loop

    CLOSE cCompress
    DEALLOCATE cCompress

    PRINT 'Initial scan complete.'

/* End evaluating compression savings. Now do the actual compressing. */

    PRINT 'Beginning compression.';

    /* Do not process objects that do not meet our compression criteria */
    UPDATE  dbo.dbCompressionEstimates
    SET     AlreadyProcessed = 1
           ,StatusText = 'Best compression method less than minCompression threshold'
    WHERE   (1 - (CAST(Row_Size AS FLOAT) / None_Size)) < @minCompression
            AND (Row_Size <= Page_Size)
            AND None_Size > 0;

    UPDATE  dbo.dbCompressionEstimates
    SET     AlreadyProcessed = 1
           ,StatusText = 'Best compression method less than minCompression threshold'
    WHERE   (1 - (CAST(Page_Size AS FLOAT) / None_Size)) < @minCompression
            AND (Page_Size <= Row_Size)
            AND None_Size > 0;

    /* Do not set compression on empty objects */
    UPDATE  dbo.dbCompressionEstimates
    SET     AlreadyProcessed = 1
           ,StatusText = 'No data in table to compress'
    WHERE   None_size = 0;

    -- set the compression
    DECLARE cCompress CURSOR FAST_FORWARD
        FOR SELECT  DatabaseName
                   ,schema_name
                   ,object_name
                   ,partition_number
                   ,ixName
                   ,ixType
                   ,CASE WHEN (1 - (CAST(Row_Size AS FLOAT) / none_Size)) >= @minCompression
                              AND (Row_Size <= Page_Size) THEN 'Row'
                         WHEN (1 - (CAST(page_Size AS FLOAT) / none_Size)) >= @minCompression
                              AND (Page_Size <= row_Size) THEN 'Page'
                         ELSE 'None'
                    END AS Recommended_Compression
                   ,PK
            FROM    dbo.dbCompressionEstimates
            WHERE   None_Size <> 0
                    AND (CASE WHEN (1 - (CAST(Row_Size AS FLOAT) / none_Size)) >= @minCompression
                                   AND (Row_Size <= Page_Size) THEN 'Row'
                              WHEN (1 - (CAST(page_Size AS FLOAT) / none_Size)) >= @minCompression
                                   AND (Page_Size <= row_Size) THEN 'Page'
                              ELSE 'None'
                         END <> data_compression_desc)
                    AND AlreadyProcessed = 0
            ORDER BY None_Size ASC;     /* start with smallest tables first */

    OPEN cCompress
    FETCH cCompress INTO @DatabaseName, @Schema, @object, @partNO, @ixName,
        @ixType, @Recommended_Compression, @PK;     -- prime the cursor

    WHILE @@Fetch_Status = 0
        BEGIN
            IF @ixType = 'Clustered'
                OR @ixType = 'heap'
                SET @SQL = 'USE [' + @DatabaseName + '];
                    ALTER TABLE ' + @Schema + '.' + @object
                    + ' Rebuild with (data_compression = '
                    + @Recommended_Compression + ', SORT_IN_TEMPDB=ON)';
            ELSE
                SET @SQL = 'USE [' + @DatabaseName + '];
                    ALTER INDEX ' + @ixName + ' on ' + @Schema + '.' + @object
                    + ' Rebuild with (data_compression = '
                    + @Recommended_Compression + ',SORT_IN_TEMPDB=ON)';

            IF DATEDIFF(mi, @StartTime, CURRENT_TIMESTAMP) < @MaxRunTimeInMinutes
                BEGIN
                    IF @ExecuteCompressCommand = 1
                        BEGIN
                            UPDATE  dbo.dbCompressionEstimates
                            SET     StatusText = 'Compressing data...'
                            WHERE   PK = @PK;

                            PRINT 'Compressing table/index: ' + @Schema + '.'
                                + @object + ' in database ' + @DatabaseName;
                            EXEC sp_executesql @SQL;

                            UPDATE  dbo.dbCompressionEstimates
                            SET     AlreadyProcessed = 1
                                   ,StatusText = 'Compression complete'
                            WHERE   PK = @PK;

                            SET @CompressedCount = @CompressedCount + 1;
                        END
                    ELSE
                        BEGIN
                            PRINT 'Command execution not enabled. Command is: '
                                + @SQL;
                        END
                END
            ELSE
                BEGIN
                    PRINT 'Max runtime reached. Some compression performed. Exiting...';
                    CLOSE cCompress
                    DEALLOCATE cCompress
                    DROP TABLE ##ObjEst
                    BREAK
                END

            FETCH cCompress INTO @DatabaseName, @Schema, @object, @partNO,
                @ixName, @ixType, @Recommended_Compression, @PK;
        END

    CLOSE cCompress;
    DEALLOCATE cCompress;

    PRINT 'Objects compressed: ' + CONVERT(VARCHAR(10), @CompressedCount);

    DROP TABLE ##ObjEst
    RETURN
10 thoughts on “Enabling Data Compression – August 2013 Update”
Thanks Shaun! Using your tool as we speak. Here’s a snippet of what I just sent management on this: “Once compressed, no disk space saving would occur until database files were shrunk. This introduces fragmentation, so index rebuilding would be in order (although I’m not sure about the fragmentation effect on a NetApp).
We already discussed the fact that with the types of queries we run with multiple joins, it isn’t expected that compression would help query times.
Lastly, unless Production is compressed, every restore to QA/DEV from prod would require doing this again.”
Seems like a table named “file” caused an issue. Made a change here to see if the square brackets help:
DECLARE cCompress CURSOR FAST_FORWARD
    , '[' + object_name + ']' AS object_name
Yeah, that should do it. I also ran into a similar issue with an index name that had a dash in it. Had to do the same thing with the two ALTER statements and put brackets around the name there.
You should check if some tables are already compressed, right? Your script advises me to compress tables which are already compressed!
BTW a huge thanks for your script! I knew Paul Nielsen’s script, but yours is pretty much awesome!! ;)
Guillaume – Are you sure about that? Line 165 explicitly states to include only tables that have no compression. When I run it, it only selects tables that have no compression. If you clear out the dbCompressionEstimates table and run it again, does it still pick up already compressed tables?
FYI, I’ve got an updated version of this coming out in a couple weeks with some fixes. There was a bug in the cursor defined at line 185, but I don’t think that would cause selecting tables that are already compressed.
Nice feature, edwin. Please also tell us what the impacts of data compression are on mirroring and replication.