I am now officially a published author! Well, I suppose if you count blog posts, I already was, but today, my first ever non-blog article has been published! I even got paid for it and everything! You can check it out at SQL Server Central here. There is also a discussion thread about it on their forums and that can be found here. The article presents a routine I developed to monitor the disk space usage of my SQL Servers. I know, it’s a topic that has been almost done to death. But most of the routines and scripts I found either used undocumented commands or commands that have been depreciated. In either case, those commands could be removed in future versions of SQL Server, breaking the routine. I wanted something that did not face those problems. They also tended only to focus on a single server, whereas my routine handles all servers on a network.
Please take a moment and read the article or the rest of this post won’t make much sense.
Done? Good. Moving right along…
I’ve implemented this at my work and it’s been running for a couple of months now. I run the routine daily and have used SSRS to create a report that gets emailed to me each morning. The report shows two pieces of information – a 6 month view of the trending disk usage of all our SQL Servers and a point in time view of the percentage of disk space the SQL Server files are using on the server’s disk drives.
Note I didn’t say “show how full the drive is.” My routine, as written will not give that data. The only two pieces of information it collects are the disk space used by the files SQL Server uses and the total size of the drive. Several of my servers are using local drives (not ideal, I know, but it is what it is). Therefore, something else could conceivably come along and use up all the free space on the drive and you’d never know it from looking at this report. I will admit that is a drawback.
Here’s a portion of my report: (Click for a larger view.)
Here, you can see I’m tracking the disk usage of twelve servers. Now, the routine I wrote actually gathers data for all drives used by SQL Server databases. Typically, I like to have at least three drives per server – one for data, one for logs, and one for tempdb. Multiply that by 12 servers and you’ve got a lot of lines making for a really messy chart. So what I did was simply sum up the space used totals for each server. This gives me a single number for each server and makes for a cleaner chart. The number itself is rather meaningless now, since it represents total bytes used spread across various drives, but all I really want from this chart is the usage trend for a server, so it works fine. The line will still trend upwards if a server suddenly starts using more disk space. The chart won’t tell me if that extra space is being used by data files or logfiles or tempdb, but that’s fine. It will alert me to dig deeper. (And my routine captures individual drive and database data, so I will have that information to drill into anyway.) That’s all I want from this chart.
(I’ll get to those little bumps on the gray line in a minute. The gap in some lines is due to a day when the job failed once time partway through and I forgot to re-run it.)
The second half of my report contains three gauges for each server. Here is a sample for two servers:
Here, I split out the various drives for each server. Rather than label them with individual drives letters, which vary from server to server, I’ve used the generic “Data”, “Log”, and “TempDB” labels to let me know which drive the gauge refers to. Remember when I said not all my servers have three drives? I use the needle color to indicate different drives. So in the top row, the three gauges all have different needle colors, indicating this server has three distinct drives. In the second row, all the gauge needles are the same color, indicating everything is stored on one drive on this server. I have also used color to consistently refer to a particular drive: blue always represents the data drive, purple the log drive, and yellow the tempdb drive. So if, for example, a server has two drives, one for data, and one for the log and tempdb files, the needle colors would be blue, purple, purple.
My previous caveat bears repeating here: these gauges do not show the percent of drive space used. Rather, they only show the percent of total drive space used by the SQL Server data files! This is an important distinction. If another program shares the drive, it could use up all the available free space and this report would never show that. Furthermore, these gauges only show the space usage at the point in time that the data was gathered. In my case, I gather the data and run the report once a day, so the gauges show me the space usage as of 5 AM the morning of the report.
As with the line chart, I am not looking for the gauges to show me exact figures. What I want to watch for is any of the needles going into the red zone. That tells me I need to do some more investigating because something is causing some SQL files to grow.
The report isn’t a perfect monitoring tool, I will admit. In fact, there are many software packages for sale out there that can do a much better job. But for a homegrown DBA tool, I think this routine and the report based on it provide a nice overall view of the disk space usage of the SQL Servers I am responsible for.
Oh, about those bumps in the line chart – I noticed they occur every weekend, which also happens to be the time I have my major index maintenance job scheduled. But the fact that only one server was behaving like this was a bit strange. The server in question is a development server that is not heavily used, so I felt pretty confident that whatever was causing the bumps was due to my weekly maintenance and not some user operation. I did some investigating and discovered that five of the databases on that server had auto shrink enabled. Boo! This was a developer box and they probably either didn’t know better or didn’t care about performance on this server. More likely, both. I turned off auto shrink on the databases and the bumps went away. How could auto shrink, which supposedly makes files smaller, cause an increase in disk space usage? I’m glad you asked!
I went into the table containing the data my routine collected and looked at only the server I was interested in. I then looked for something out of the ordinary. (Recall that my line chart doesn’t differentiate between data, log, and temp drives, so I had to look at all the data.) Since I was looking for something that made a pretty big jump in the line chart, I figured it should be easy to spot in the data. I sorted by database name and then StatDate and this jumped out at me:
The highlighted sections show that, once a week, the log file for this database grows from 1 MB to 90 MB, then drops down to 1 MB again. Checking the rest of the data, I found 4 more databases where this pattern existed. What was going on? I examined those databases and discovered that all of them (and only those databases) had auto shrink enabled.
If you shrink your database, be it via a manually issued command or by having auto shrink enabled, your indexes will end up being very close to 100% fragmented. See this great explanation on Paul Randal’s blog for why. Every weekend, there is a job that runs that rebuilds all indexes that have greater than 10% fragmentation. Therefore, in the databases where auto shrink was enabled, all the indexes fell into this category. Since an index rebuild is a fully logged operation, the log file experienced huge growth each weekend. Then the transaction log gets backed up the following hour and auto shrink kicks in to shrink it back down afterward. (You thought auto shrink only worked on database files? Nope, it works on log files too. From Books Online: “AUTO_SHRINK only reduces the size of the transaction log if the database is set to SIMPLE recovery model or if the log is backed up.”) I turned off the auto shrink option on those databases and the bumps in my chart went away.
Are you going to post the report for download as well?
I wasn’t planning on it. It’s very specific to my company and server setup. The report is fairly basic and uses only two queries: The one for the line chart is
/* Total database size by drive with max drive size */
SELECT server_name,
Drive_letter,
SUM(size_in_mb) as DBSize_in_MB,
CONVERT(varchar(10),statDate,1) as statDate,
MAX(maxdrivesize_in_mb) as TotalDriveSize_in_MB
FROM DatabaseDiskStats
GROUP BY Server_name,
Drive_Letter,
StatDate
And the one for the gauges is
SELECT server_name,
Drive_letter,
SUM(size_in_mb) as DBSize_in_MB,
statDate,
MAX(maxdrivesize_in_mb) as TotalDriveSize_in_MB,
(SUM(CAST(size_in_mb as decimal (9,2)))/MAX(CAST(maxdrivesize_in_mb as decimal(9,2)))) * 100 as PercentFull
FROM DatabaseDiskStats
WHERE statDate = CONVERT([date],getdate(),(0))
and MaxDriveSize_in_MB <>0
GROUP BY Server_name,
Drive_Letter,
StatDate
ORDER BY drive_letter
Each gauge’s filter property is then set to filter for a specific server and drive letter.