Deleting From Very Large Tables

One of the challenges of my new job is working with a couple very large databases (VLDBs) and very large tables. One database in particular has one table with 61 million rows and another with 78 million rows. They are growing by about 80,000 rows per day. My job is to purge data older than 1 year from these tables. Did I mention these tables support a 24 x 7 financial system? And blocking of more than a few seconds sets off all kinds of alarms? This presents something of a challenge.

Unfortunately, the tables are not partitioned. If they were (and if they were portioned correctly), this task would be simple, fast, and easy. Although I won’t go into details of partitioning here, suffice it to say that deleting or moving all the data in a partition is just a meta-data change that SQL Server can do in seconds without moving any actual rows of data. Going forward, we will need to create partitions on these tables, but for now, I needed to come up with a way to purge vast amounts of data (roughly 2/3 of each table) without causing excessive blocking or transaction log growth.

I was handed a script that was developed by a previous DBA here as a starting point. This script worked, but it was slow and caused lots of blocking. Here is a portion of it:

SET	@DeletePriorToDate = DATEADD(yy,-1,current_timestamp)

/*Get the TOP x PKs to delete */


		SELECT  TOP 1000 user_node_log_id
        FROM    v_t_u_logs WITH (NOLOCK)
        WHERE   create_time <= @DeletePriorToDate

FROM	v_t_u_logs vtul WITH (ROWLOCK)
		INNER JOIN @PKSToDelete ptd
		ON vtul.user_node_log_id =

That removed records from the smaller of the two tables. The next step in the script was to delete records from the larger table. In this case, the smaller table is related to the larger table via a field called node_id. Further, there is a many to many relationship between the smaller table and the larger table, so that you can’t just get a list of the node_ids to delete from the smaller table and delete them all from the larger table. Some might need to be retained. So the script I was given did the following:

CREATE TABLE #NodeIDs (Node_Id NUMERIC(18, 0))
        FROM    dbo.v_t_NODE vtn WITH (NOLOCK)
        LEFT JOIN dbo. v_t_u_logs vtul
                WITH (NOLOCK)
                ON vtn.NODE_ID = vtul.NODE_ID
        WHERE   vtul.NODE_ID IS NULL

CREATE INDEX tmpidxNodeID ON [#NodeIDs] (Node_ID ASC)

FROM     dbo. v_t_n_history v
INNER JOIN [#NodeIDs] nid
        ON v.Node_Id = nid.Node_ID

Kudos for throwing an index on the temporary table to speed things up, but this still didn’t work well. In words, the process worked like this: find the first 1000 records from vtul that are older than 1 year and store their primary key values in a table variable. Then, delete those records from vtul by joining to that table variable. Next, look through vtn and find all the records that do not have a corresponding node_id left in vtul. Store those node_ids in a temp table. Delete records with those node_ids from vtnh.

This approach has one benefit – if the query is canceled at any time (due to blocking issues), subsequent runs will pick up any records that still need to be deleted. This is thanks to the left join on vtul and selecting records where vtul.node_id is null. But this approach has multiple problems. First, the temp table is populated by joining the 61 million row table with vtn, a 3.5 million row table. That’s a lot of work and creates blocking, even with the NOLOCK hint. (Much of the blocking simply came from disk I/O waits, as all those rows needed to be read.) Second, the deletion from the vtnh table, the one with 78 million rows, was also causing blocking. This one was harder to control because there was no way of controlling how many records might be in the temp table. If you had canceled the script part way through several times, the temp table might contain 10,000 or more node_ids that needed to be deleted. Clearly, this process needed to be re-worked.

My first attempt was to try to limit the number of rows being deleted with a TOP X clause and looping. This was marginally successful. Blocking still occurred and to completely eliminate it, I had to drop down to deleting the TOP 5 rows, which is pretty close to RBAR – row by agonizing row. This was surprising, as I felt the TOP clause surely would have been useful here. But in doing some research, I found a post on Robert Davis’ (twitter | blog) blog about archiving and purging from large tables. His post contained this statement: “One more side note: simply adding TOP to the delete statement will NOT minimize the amount of locks taken initially. Please don’t do this!!” Yup. I could confirm that. (I’m guessing SQL Server does the whole join first anyway, and then simply picks the top X to delete, which is why you don’t see a performance benefit.)

Reading the rest of his post showed me I was at least heading in the right direction. I was selecting records to delete based on the primary key and then doing a targeted delete of those records. But I needed to make some modifications to make this script truly useful. My first target for optimization was that join of two big tables to determine which node_ids needed to be kept. This was done to allow the routine to pick up any records that should have been deleted in prior runs, but might not have been due to the query being cancelled. I knew this delete process was a one-time task. Once we got caught up, we were going to implement a scheduled job to keep up  with deleting old records on a daily basis. So I made the decision to not worry about records that might be “orphaned” due to canceling a query. My thinking was to get most of the records deleted first, then, once we are caught up, I could run a query to find all those stragglers that were missed. The tables would be much smaller by that time, so the query should run quicker then. Another purpose of that join was to retain any records that might still have node_ids in the vtn table (which was in a many-to many relation, remember). My solution was to grab my list of primary keys of records to delete and the associated node_ids. I would delete the records identified by the primary keys, then scan the table again to see if any of the node_ids I had grabbed  were still present in the table. These needed to be retained, so I removed those ids from my table of ids to delete.

Finally, I needed some way to control the number of records I was deleting at once in order to both minimize or eliminate blocking and to prevent the transaction log from growing out of control with millions of deletions. I did this by using two TOP statements. The first limits the number of records from vtul to be deleted. The second limits to number of records to delete at a time. I already knew that TOP didn’t really seem to limit locking when joining tables in a DELETE statement, so instead, I inserted my node_ids to delete in a second table variable which would be used to join on for the DELETE. By using the TOP statement to limit the number of rows inserted into this table to a subset of my total records, I would effectively be able to limit the number of records deleted at one time. By looping through the DELETE statement multiple times, I would be able to process all the records selected from vtul. The looping would also help eliminate blocking  by periodically freeing up the table for use by other processes – also known as “trickle delete.” Here is the code I ended up using (which also includes some stuff for reporting):

	-- Delete all records prior to this date (usually 1 year ago)
DECLARE @RowsDeletedInTable INT
	-- Counter variable for reporting
	-- For tracking execution time
	-- For tracking execution time
	-- Number of times to loop through this whole routine
 -- Table containing small set of PK of records to delete
 -- Variable used to load @WorkTable with values

SET @LastPKDeleted = 0
SET @MaxCycleLoops = 40
SET @CycleLoopCounter = 1
WHILE @CycleLoopCounter <= @MaxCycleLoops 

        PRINT '=========================='
        PRINT ' Starting pass ' + CONVERT(VARCHAR(2), @CycleLoopCounter) + ' of ' + CONVERT(VARCHAR(2), @MaxCycleLoops)
        PRINT '=========================='
        SET @StartTime = CURRENT_TIMESTAMP

        SET @DeletePriorToDate = DATEADD(yy, -1, CURRENT_TIMESTAMP)
        SET @RowsDeletedInTable = 0

        DECLARE @PKsToDelete TABLE
             id INT,
             node_id NUMERIC(18, 0) NOT NULL

-- Get PKs to delete from vtul.

        INSERT  INTO @PKsToDelete
                SELECT TOP 5000
                FROM    v_t_u_logs WITH (NOLOCK)
                WHERE   create_time <= @DeletePriorToDate

        PRINT 'Deleting data from v_t_u_LOGS (1/4)..'

	Only delete in small batches to prevent blocking. Note, using the TOP keyword in a delete statement
	does not prevent blocking. So take a subset of the primary keys, put those in their own table
	and delete by joining on that 
        WHILE EXISTS ( SELECT   vtul.*
                       FROM     dbo.v_t_u_logs vtul WITH (NOLOCK)
                                INNER JOIN @PKSToDelete pk2d ON vtul.user_node_log_id = ) 

                INSERT  INTO @WorkTable
                        SELECT TOP 100
                        FROM    @PKsToDelete
                        WHERE   ID > @LastPKDeleted
                        ORDER BY ID ASC

                SELECT  @LastPKDeleted = MAX(id)
                FROM    @WorkTable

                DELETE  vtul
                FROM    v_t_u_logs vtul WITH (ROWLOCK)
                        INNER JOIN @WorkTable wt ON vtul.user_node_log_id =

                SET @RowsDeletedInTable = @RowsDeletedInTable + @@rowcount

                DELETE  FROM @WorkTable

                PRINT 'Total rows deleted in table: ' + CONVERT (VARCHAR(10), @RowsDeletedInTable)
                PRINT 'Deletion completed for v_t_u_LOGS.'


/* Some records with node_ids created more than 1 year ago might still be in other tables due to
	updates. We need to keep these records. So purge any node_ids that need to remain */
        DELETE  FROM @PKsToDelete
        WHERE   node_id IN (SELECT  node_id
                            FROM    dbo.v_t_u_logs WITH (NOLOCK))

        PRINT 'Starting Purging for v_t_n_HISTORY (2/4)..'
        SET @RowsDeletedInTable = 0

-- trickle delete from vtnh to prevent blocking

        SET @LastPKDeleted = 0
        WHILE EXISTS ( SELECT   v.*
                       FROM     dbo.v_t_n_history v WITH (NOLOCK)
                                INNER JOIN @PKSToDelete pk2d ON v.Node_Id = pk2d.Node_ID ) 

                INSERT  INTO @WorkTable
                        SELECT TOP 100
                        FROM    @PKsToDelete
                        WHERE   node_ID > @LastPKDeleted
                        ORDER BY node_ID ASC

                SELECT  @LastPKDeleted = MAX(id)
                FROM    @WorkTable

                DELETE  v
                FROM    dbo.v_t_n_history v WITH (ROWLOCK)
                        INNER JOIN @WorkTable wt ON wt.Id = v.Node_ID

                SET @RowsDeletedInTable = @RowsDeletedInTable + @@rowcount

                DELETE  FROM @WorkTable


        PRINT 'Total rows deleted in table: ' + CONVERT (VARCHAR(10), @RowsDeletedInTable)    
        PRINT 'Purging completed for v_t_n_HISTORY.'

        PRINT 'Process completed.'

        PRINT 'Elapsed time for one delete cycle: ' + CONVERT(VARCHAR(5), DATEDIFF(s, @StartTime, @EndTime))
            + ' seconds.'

        DELETE FROM @PKsToDelete
        SET @CycleLoopCounter = @CycleLoopCounter + 1

(I should note that in actuality, I have to delete from 4 tables altogether, not just the two I have shown here, but the others are handled the same way. They were left out for clarity and this is why some of the messages say 1/4, 2/4, etc.)

One other thing to note. Even though the DELETE statements specify the ROWLOCK hint, this is only a hint and SQL Server will ignore it when it wants to. For example, if a data page contains many rows to be deleted, SQL Server will escalate the row locks into a page lock. This is because each lock takes up memory and resources to manage and SQL Server will opt for a page lock instead of multiple row locks on a page to conserve resources. This is one reason why I was seeing blocking with larger delete batches.

After some experimentation, I discovered that selecting the TOP 5000 records to delete from vtul and then deleting from the other table in batches of 100 seems to work well. This is  a huge improvement over deleting in batches of 5 rows from my first attempt. I’ve added a loop counter so that I can loop the whole script multiple times. I’ve been running this script continuously now for a couple of days and have not encountered any blocking issues. Also, the deletes are happening slow enough that our regular transaction log backups can keep up with the pace and the transaction log does not grow at all. (If this were not the case, I would have to add a transaction log backup as part of the script.)

When working with very large tables, partitioning is clearly the way to go. Unfortunately, DBAs often find themselves dealing with large tables that were not set up with partitions initially and are stuck with purging data the “old-fashioned” way. However, once you have caught up on your purging, the first thing you should do is implement partitioning so that going forward, you won’t have to go through this tedious process again.

4 thoughts on “Deleting From Very Large Tables

  1. I’m looking at your statement and while 61million and 78million rows are big, I’m having to purge a table that has 1.9 Billion rows with no partitioning. The way I’m going thru it w/o all the extra coding is by using set rowcount 1000.

    The other day, I had to deal with a database that maxed out the INT value, had to spend 3 days converting that to BIGINT. Lots of fun.

    Since you are in scottsdale, I was wondering if you are planning to attend the AZ SQL Group meeting in September?

  2. I think I recall some sort of issues using rowcount.. Can’t remember what they were off-hand, but if it works…

    Unfortunately, I’ve got a big server upgrade scheduled for the same day of the AZ SQL Group meeting, so I won’t be able to make it.

  3. Did a quick bit of research.. I don’t think there was any problem with using SET ROWCOUNT – just that it will not work in future versions of SQL:

    “Using SET ROWCOUNT will not affect DELETE, INSERT, and UPDATE statements in the next release of SQL Server. Do not use SET ROWCOUNT with DELETE, INSERT, and UPDATE statements in new development work, and plan to modify applications that currently use it.”

    This was in 2008 R2 BOL, so that would imply this method will not work in Denali. MS recommends using TOP, which, as pointed out in the above post, doesn’t really work to reduce locking.

Leave a Reply

Your email address will not be published. Required fields are marked *

I am a real person and can prove it by doing this math problem: (required) Time limit is exhausted. Please reload CAPTCHA.

This site uses Akismet to reduce spam. Learn how your comment data is processed.