I don’t spend a lot of my time tuning queries. Many of the databases in my company are used by third party products and we have limited control over the changes we can make to those. However, we do have several databases that are used by home-grown applications and these we have much more latitude to modify.
Last week, my monitoring software started sending me alerts that a job was blocking other processes. This went on for days, with the length of blocking increasing each day. By Friday, the job was blocking for 2.5 hours. I decided to take a look.
My alert identified the statement that was causing the blocking and it was a stored procedure. I opened it up and started looking at the code. The first 100 lines or so weren’t too bad. I found one query that was non-SARGable. The WHERE clause was:
WHERE DATEDIFF(day, h.Transaction_Date, GETDATE()) < 180
I changed it to the equivalent, SARGable version:
WHERE h.Transaction_Date > DATEADD(DAY, -180, GETDATE())
But that couldn’t have been what was causing the procedure to run for so long, especially considering the table it referenced only had about 10,000 rows. I kept reading and got to the final statement in the procedure. As soon as I saw it, I knew I had found my culprit:
SELECT DISTINCT
     MemberNumber
    ,Account_Number
    ,KeyDate = dbo.fnFormatDate(GETUTCDATE(), 'M/D/YYYY')
    ,Branch_Name
    ,Employee_Name
    ,Transaction_Date
    ,Transaction_Description
    ,Call_Type
    ,Engagement_Number
    ,Engagement_Type
    ,Address_Name
    ,Supervisor
    ,Call_Duration
FROM dbo.Member_Data
WHERE MemberNumber + Account_Number + dbo.fnFormatDate(GETDATE(), 'M/D/YYYY') NOT IN
    (SELECT MemberNumber + Account_Number + dbo.fnFormatDate(GETDATE(), 'M/D/YYYY')
     FROM dbo.Member_Survey_Results)
I didn’t even need to look at the existing indexes on the table to see that this would be a problem. There are no indexes in the world that would help this query. The WHERE clause concatenates two columns and then appends the current date, which SQL Server must first format by calling a user-defined function! Yikes!
So I started looking into how I could make it better. The first thing I realized was that appending the current date in the WHERE clause doesn’t accomplish anything. It adds identical characters to the end of the values on both sides of the comparison, so they will always match, and we can toss that part out. Not only does that shorten what we are comparing, it also eliminates the call to the user-defined function, which would otherwise have to run for each row in each table.
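Just to make that intermediate step concrete, here is a sketch of what the statement reduces to once the formatted-date suffix (and its UDF calls) is stripped out. This is not the final rewrite, and I’ve trimmed the SELECT list for brevity; it is otherwise the same query as above:
SELECT DISTINCT
     MemberNumber
    ,Account_Number
    -- ...same remaining columns as the original SELECT list
FROM dbo.Member_Data
WHERE MemberNumber + Account_Number NOT IN
    (SELECT MemberNumber + Account_Number
     FROM dbo.Member_Survey_Results)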
Looking at the remaining terms, we can see the query is looking for all records in the Member_Data table that do not exist in the Member_Survey_Results table, with the comparison being done on the MemberNumber and Account_Number columns. The same selection can be accomplished with a left outer join. With a left outer join, records that have no match in the right-hand table are still returned, with NULL as the value for that table’s columns. So to find the records that are not present in the right-hand table, we just need to filter for one (or more) of that table’s columns being NULL. (Be careful here. If your right-hand table has columns where NULL is a valid value, you would not want to use those columns in your WHERE clause.)
We can re-write this query as follows:
SELECT DISTINCT
     mmd.MemberNumber
    ,mmd.Account_Number
    ,KeyDate = dbo.fnFormatDate(GETUTCDATE(), 'M/D/YYYY')
    ,mmd.Branch_Name
    ,mmd.Employee_Name
    ,mmd.Transaction_Date
    ,mmd.Transaction_Description
    ,mmd.Call_Type
    ,mmd.Engagement_Number
    ,mmd.Engagement_Type
    ,mmd.Address_Name
    ,mmd.Supervisor
    ,mmd.Call_Duration
FROM dbo.Member_Data mmd
LEFT OUTER JOIN dbo.Member_Survey_Results mtsvr
    ON mmd.MemberNumber = mtsvr.MemberNumber
    AND mmd.Account_Number = mtsvr.Account_Number
WHERE mtsvr.Account_Number IS NULL
This is a much easier query for SQL Server to optimize. The WHERE clause is SARGable and we are performing a simple join on two columns. Just to make sure things are as easy as possible for SQL Server, I verified the MemberNumber and Account_Number columns in both tables had indexes.
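For illustration, supporting indexes on those join columns might look something like this. The index names here are made up for the example; the actual indexes on these tables already existed and may be defined differently:
-- Illustrative composite indexes on the join columns (names are hypothetical)
CREATE NONCLUSTERED INDEX IX_Member_Data_MemberNumber_AccountNumber
    ON dbo.Member_Data (MemberNumber, Account_Number);

CREATE NONCLUSTERED INDEX IX_Member_Survey_Results_MemberNumber_AccountNumber
    ON dbo.Member_Survey_Results (MemberNumber, Account_Number);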
I ran both queries and verified they returned the same records. The performance improvement I saw with these changes was astounding. The original query took 1.5 hours to run, had a cost of 55.59, and required 878,224 logical reads. By contrast, the re-written version ran in less than 1 second, had a cost of 0.74, and required just 441 logical reads. That’s almost a 2,000-fold drop in logical reads, and the runtime improvement is even larger! Amazing! And what’s even more amazing is that one table only has 450 rows and the other has just under 20,000 rows. Clearly, as those tables grow, the performance of the original query would have degraded very quickly.
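If you want to run the same kind of same-results check yourself, one simple approach is to compare the two versions with EXCEPT. This is only a sketch comparing the key columns rather than the full DISTINCT output; run it in both directions, and an empty result each way means the row sets match:
-- Rows identified by the original NOT IN version but not by the LEFT JOIN rewrite.
-- Swap the two halves around EXCEPT to check the other direction as well.
SELECT MemberNumber, Account_Number
FROM dbo.Member_Data
WHERE MemberNumber + Account_Number NOT IN
    (SELECT MemberNumber + Account_Number FROM dbo.Member_Survey_Results)
EXCEPT
SELECT mmd.MemberNumber, mmd.Account_Number
FROM dbo.Member_Data mmd
LEFT OUTER JOIN dbo.Member_Survey_Results mtsvr
    ON mmd.MemberNumber = mtsvr.MemberNumber
    AND mmd.Account_Number = mtsvr.Account_Number
WHERE mtsvr.Account_Number IS NULL;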