I’m Published!

I am now officially a published author! Well, I suppose if you count blog posts, I already was, but today, my first ever non-blog article has been published! I even got paid for it and everything! You can check it out at SQL Server Central here. There is also a discussion thread about it on their forums, which can be found here. The article presents a routine I developed to monitor the disk space usage of my SQL Servers. I know, it’s a topic that has been almost done to death. But most of the routines and scripts I found either used undocumented commands or commands that have been deprecated. In either case, those commands could be removed in future versions of SQL Server, breaking the routine. I wanted something that did not face those problems. The scripts I found also tended to focus on a single server, whereas my routine handles all servers on a network.

Please take a moment and read the article or the rest of this post won’t make much sense.

Done? Good. Moving right along…

I’ve implemented this at my work and it’s been running for a couple of months now. I run the routine daily and have used SSRS to create a report that gets emailed to me each morning. The report shows two pieces of information: a six-month view of the disk usage trend across all our SQL Servers and a point-in-time view of the percentage of disk space the SQL Server files are using on each server’s drives.

Note I didn’t say “show how full the drive is.” My routine, as written, will not give you that data. The only two pieces of information it collects are the disk space used by the files SQL Server uses and the total size of the drive. Several of my servers are using local drives (not ideal, I know, but it is what it is). Therefore, something else could conceivably come along and use up all the free space on the drive and you’d never know it from looking at this report. I will admit that is a drawback.

Here’s a portion of my report:

Here, you can see I’m tracking the disk usage of twelve servers. Now, the routine I wrote actually gathers data for all drives used by SQL Server databases. Typically, I like to have at least three drives per server – one for data, one for logs, and one for tempdb. Multiply that by twelve servers and you’ve got a lot of lines making for a really messy chart. So what I did was simply sum the space-used totals for each server. This gives me a single number per server and makes for a cleaner chart. The number itself is rather meaningless now, since it represents total bytes used spread across various drives, but all I really want from this chart is the usage trend for a server, so it works fine. The line will still trend upward if a server suddenly starts using more disk space. The chart won’t tell me whether that extra space is being used by data files, log files, or tempdb, but that’s fine. It will alert me to dig deeper. (And my routine captures individual drive and database data, so I will have that information to drill into anyway.) That’s all I want from this chart.
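For the curious, the summing itself is trivial. Here is a minimal sketch, assuming a collection table called dbo.DiskSpaceStats with ServerName, DriveLetter, StatDate, and SpaceUsedMB columns; those names are placeholders for illustration, not necessarily what the article uses:

    -- One row per server per day: total space used by SQL Server files across all drives.
    -- Table and column names are assumed for illustration.
    SELECT   ServerName,
             StatDate,
             SUM(SpaceUsedMB) AS TotalSpaceUsedMB
    FROM     dbo.DiskSpaceStats
    WHERE    StatDate >= DATEADD(month, -6, GETDATE())   -- six-month window for the chart
    GROUP BY ServerName, StatDate
    ORDER BY ServerName, StatDate;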

(I’ll get to those little bumps on the gray line in a minute. The gap in some lines is due to a day when the job failed partway through and I forgot to re-run it.)

The second half of my report contains three gauges for each server. Here is a sample for two servers:

Here, I split out the various drives for each server. Rather than label them with individual drive letters, which vary from server to server, I’ve used the generic “Data”, “Log”, and “TempDB” labels to let me know which drive each gauge refers to. Remember when I said not all my servers have three drives? I use the needle color to indicate distinct drives. So in the top row, the three gauges all have different needle colors, indicating this server has three distinct drives. In the second row, all the gauge needles are the same color, indicating everything is stored on one drive on this server. I have also used color consistently to refer to a particular drive: blue always represents the data drive, purple the log drive, and yellow the tempdb drive. So if, for example, a server has two drives, one for data and one for the log and tempdb files, the needle colors would be blue, purple, purple.

My previous caveat bears repeating here: these gauges do not show the percent of drive space used. Rather, they only show the percent of total drive space used by the SQL Server files! This is an important distinction. If another program shares the drive, it could use up all the available free space and this report would never show that. Furthermore, these gauges only show the space usage at the point in time that the data was gathered. In my case, I gather the data and run the report once a day, so the gauges show me the space usage as of 5 AM the morning of the report.
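For what it’s worth, the gauge value itself is just a simple percentage calculated from the two figures the routine collects. A sketch, again using the assumed table and column names from above, with DriveSizeMB standing in for the total drive size:

    -- Percent of each drive's total size consumed by SQL Server files,
    -- as of the most recent collection date. Names are assumed for illustration.
    SELECT   ServerName,
             DriveLetter,
             CAST(SUM(SpaceUsedMB) * 100.0 / MAX(DriveSizeMB) AS decimal(5, 2)) AS PctUsedBySqlFiles
    FROM     dbo.DiskSpaceStats
    WHERE    StatDate = (SELECT MAX(StatDate) FROM dbo.DiskSpaceStats)
    GROUP BY ServerName, DriveLetter
    ORDER BY ServerName, DriveLetter;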

As with the line chart, I am not looking for the gauges to show me exact figures. What I want to watch for is any of the needles going into the red zone. That tells me I need to do some more investigating because something is causing some SQL files to grow.

The report isn’t a perfect monitoring tool, I will admit. In fact, there are many software packages for sale out there that can do a much better job. But for a homegrown DBA tool, I think this routine and the report based on it provide a nice overall view of the disk space usage of the SQL Servers I am responsible for.

Oh, about those bumps in the line chart – I noticed they occur every weekend, which also happens to be the time I have my major index maintenance job scheduled. But the fact that only one server was behaving like this was a bit strange. The server in question is a development server that is not heavily used, so I felt pretty confident that whatever was causing the bumps was due to my weekly maintenance and not some user operation. I did some investigating and discovered that five of the databases on that server had auto shrink enabled. Boo! This was a developer box and they probably either didn’t know better or didn’t care about performance on this server. More likely, both. I turned off auto shrink on the databases and the bumps went away. How could auto shrink, which supposedly makes files smaller, cause an increase in disk space usage? I’m glad you asked!

I went into the table containing the data my routine collected and looked at only the server I was interested in. I then looked for something out of the ordinary. (Recall that my line chart doesn’t differentiate between data, log, and temp drives, so I had to look at all the data.) Since I was looking for something that made a pretty big jump in the line chart, I figured it should be easy to spot in the data. I sorted by database name and then StatDate and this jumped out at me:

The highlighted sections show that, once a week, the log file for this database grows from 1 MB to 90 MB, then drops back down to 1 MB. Checking the rest of the data, I found four more databases where this pattern existed. What was going on? I examined those databases and discovered that all of them (and only those databases) had auto shrink enabled.
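If you want to do the same kind of digging in your own collection table, the query doesn’t need to be anything fancy. Something along these lines, with the table, column, and server names again being stand-ins for whatever your routine actually uses:

    -- Pull every collected row for the suspect server and scan for unusual growth.
    -- DatabaseName, the other columns, and the server name are assumed for illustration.
    SELECT   DatabaseName,
             StatDate,
             DriveLetter,
             SpaceUsedMB
    FROM     dbo.DiskSpaceStats
    WHERE    ServerName = 'DEVSQL01'   -- hypothetical server name
    ORDER BY DatabaseName, StatDate;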

If you shrink your database, be it via a manually issued command or by having auto shrink enabled, your indexes will end up very close to 100% fragmented. See this great explanation on Paul Randal’s blog for why. Every weekend, a job runs on my servers that rebuilds all indexes with greater than 10% fragmentation. In the databases where auto shrink was enabled, all the indexes fell into this category. Since an index rebuild is a fully logged operation, the log file experienced huge growth each weekend. The transaction log then got backed up the following hour, and auto shrink kicked in to shrink the file back down afterward. (You thought auto shrink only worked on database files? Nope, it works on log files too. From Books Online: “AUTO_SHRINK only reduces the size of the transaction log if the database is set to SIMPLE recovery model or if the log is backed up.”) I turned off the auto shrink option on those databases and the bumps in my chart went away.
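If you want to check your own servers for the same culprit, the auto shrink setting is easy to find and to turn off. A quick sketch that works on SQL 2000 and later (the database name in the ALTER statement is a placeholder):

    -- List every database on the instance with auto shrink enabled.
    SELECT name
    FROM   master.dbo.sysdatabases
    WHERE  DATABASEPROPERTYEX(name, 'IsAutoShrink') = 1;

    -- Turn it off for an offending database (repeat per database).
    ALTER DATABASE [SomeDevDatabase] SET AUTO_SHRINK OFF;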

Email Notifications In SQL 2000 Without SQL Mail

I love Database Mail. The new mail system that was introduced in SQL Server 2005 is head and shoulders above the old SQL Mail functionality of SQL 2000. Setting up email alerts for failed jobs and error conditions is now a breeze.

Unfortunately, I still have a couple of SQL Server 2000 machines in use that I have to take care of. Furthermore, I know there are other DBAs out there in the same situation. For those who never had to deal with it: in order to send email from SQL Server 2000, you had to have Outlook 2000 installed on your server. It was ugly. Email in SQL 2000 felt like a kludge, an afterthought. On top of that, installing Outlook on your server required purchasing an additional Office license for each server, something that was hard to justify to upper management.

There are some nice community-developed solutions, such as xp_smtp_sendmail. This works great if you need to send email from a stored procedure or via T-SQL. If you want to send email when one of your jobs fails, however, it is fairly cumbersome. Because it is not built into SQL Server, you need to add a step to each and every job that calls xp_smtp_sendmail to send out the failure notification. Then you have to modify each step of each job to jump to this new step on failure. If you have more than a handful of jobs, it is tedious at best, a nightmare at worst. And if you have several SQL Servers, each with many jobs, well… just forget about it.
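To give you an idea of what that extra step looked like, here is a rough sketch of a call to xp_smtp_sendmail. The exact parameter list depends on the version of the procedure you installed, and the addresses, server, and job details below are made up:

    -- Hypothetical "notify on failure" job step.
    -- Parameter names may vary with the version of xp_smtp_sendmail you installed.
    EXEC master.dbo.xp_smtp_sendmail
        @FROM    = N'sqlagent@example.com',
        @TO      = N'dba@example.com',
        @subject = N'Job failure on MYSERVER',
        @message = N'The nightly load job failed. Check the job history for details.',
        @server  = N'smtp.example.com';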

When I was brought on board at my current company last year, I found out I had two SQL 2000 servers that I would be responsible for. Of course, SQL Mail was never set up on these machines. I had no desire to set it up now, especially since I was going to be pushing to upgrade or replace these servers with SQL 2008 soon. But until that happened, I needed some way to be notified if a job failed.

Luckily, our IT department uses monitoring software to watch other critical systems such as websites and file servers. Your company probably does as well. The particular product we use is Orion, from SolarWinds, but there are many others out there that provide similar functionality. One of the things Orion can monitor is the Windows event log. We can use this to monitor for failed SQL Server jobs.

If you configure your SQL Server job to write to the Windows event log on failure, you will see an entry like this in the Application event log when a job fails:

Note the event ID value of 208, which signifies a failed SQL Server job. Now all you need to do is configure your monitoring software to monitor the application event logs of your SQL 2000 Servers for event ID 208. Congratulations! You’ve now got email notification for failed jobs on SQL 2000 without buying an additional Office license and without having to deal with SQL Mail!
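For reference, the “write to the Windows Application event log when the job fails” setting lives on the Notifications tab of the job properties, or you can set it through T-SQL. A minimal sketch (the job name is a placeholder):

    -- Have an existing job write to the Windows Application event log on failure.
    -- A notify level of 2 means "when the job fails".
    EXEC msdb.dbo.sp_update_job
        @job_name              = N'Nightly Backup',   -- placeholder job name
        @notify_level_eventlog = 2;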

Note that you can do this with your 2005 and 2008 SQL Servers as well, but Database Mail is so easy to set up and use that I don’t see a reason to do so.

SSIS And Parallelism

I usually don’t make entries here that consist of just links to other things, but I have to make an exception in this case. I just came across Josef Richberg’s article about using SSIS and parallelism and I am thoroughly impressed! He took a job that was taking 40 hours to run and cut it down to 2 hours by breaking the problem up into pieces and using the parallel processing provided by SSIS to tackle different parts of the problem at the same time. The method he came up with for distributing the work is genius. In fact, I find the logic and thought process behind coming up with the method to be more interesting than the process itself. I don’t currently have a need for something like this, but it is something I will definitely keep in mind in case such a need comes up.

SSIS and Parallelism: The Unseen Minions

Frustration!

I write these posts in advance and schedule them to be published once a week, on Mondays. Typically, I work about two or three weeks ahead of the publication date. As a result, I’ll often write something and then see a similar story on another blog sometime later, before mine has been published. It’s really frustrating.

For example, my post Twice Bitten, Thrice Shy was published on July 12. It was actually written during the last week of June. In that post, I talked about assumptions and pulled out the old ass-u-me nugget, along with a picture of an ass. Then, on July 9, Steve Jones over at SQL Server Central put out an editorial about assumptions featuring a picture of an ass. (At least it wasn’t the same picture.)

Today, a similar thing happened. I wrote the previous post about consistency last week, around September 7, but it wasn’t published until September 20. In that one, I referenced McDonalds, the Big Mac, their old “two-all-beef-patties…” jingle, and a picture of a Big Mac. Two days ago, September 14, SQLRockstar published a post titled Big Mac Index that featured those exact same things.

In a similar vein, I wrote Hey, Check Out The New Kid about things to check for on SQL Servers when you start a new job. After writing that, I discovered Brent Ozar’s 60 Minute SQL Takeover presentation, which covers the same topic, although from the perspective of being handed a new server at an existing job. Well, at least we recommended doing pretty much the same things.

Argh!!

Maybe I need to think a bit harder about the comparisons I use and not go for the easy ones.

Maybe I need to change my blog site password so my draft posts can’t be read.

But something’s got to be done because this is getting frustrating!

I’m Nothing, If Not Consistent

I’d like to say a few words today about consistency. Whether you are responsible for maintaining two SQL Servers or dozens, your job will be a lot easier if you can make things as consistent as possible across all your servers. It can be difficult, especially when your servers are running different versions of SQL Server. But if you take the time to standardize your scripts and processes across your servers, you’ll save loads of time when you have to troubleshoot something.

Standardization is the foundation of many large companies’ successes. McDonald’s food tastes like crap (in my opinion), but they are consistent. You can go into any McDonald’s in the world, order a Big Mac, and know exactly what you are going to get: two all beef patties, special sauce, lettuce, cheese, pickles, onions on a sesame seed bun. There is some comfort in that.

Similarly, consistency in your SQL Servers provides a certain level of comfort for a DBA. My maintenance plans are the same across all my servers. Sure, maybe some of the servers have databases that aren’t as critical as others. Maybe we can accept losing six hours’ worth of data in some places, but only one hour’s worth in others. No matter. I’m taking transaction log backups hourly. It is so much simpler, when someone asks you “How often are backups being made?”, to say “Every hour” rather than “It depends. What database are you talking about?” Should you need to restore a database, it’s also easier knowing how many transaction log backups you need to look for and restore.

I have a couple of homemade scripts that run on each server. One of these scripts checks the fragmentation level of all the indexes in each database and, if it is above a certain threshold, rebuilds the index. (This is a modified version of the script created by Kimberly L. Tripp.) The script recently failed on one of my servers. It turns out the script, as written, did not handle tables owned by users other than dbo (or, more precisely, tables in schemas other than dbo). So I tweaked the script slightly to handle this. Even though the script was failing on only one of my servers, I deployed the updated version to all of them. Why? Because my changes amounted to two lines of code in a script of over 100 lines. If, six months from now, the script started failing on one server but not on others, I would not want to have to compare all those lines of code looking for differences while troubleshooting. By keeping everything the same across servers, I eliminate one unknown from the list of potential problem sources. I have also just provided myself with a bit of insurance: should non-dbo schemas start being used on other servers, I won’t even notice. The job will handle them smoothly and I won’t get a job-failed alert at two in the morning.
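I won’t reproduce the whole script here, but the heart of my change was simply carrying the schema name along with the table name. As a rough illustration only, and for the SQL 2005 and later branch rather than the actual script, finding the indexes to rebuild looks something like this:

    -- Find indexes over 10% fragmented, keeping the schema name so that
    -- non-dbo tables are handled. An illustration, not the production script.
    SELECT   s.name AS SchemaName,
             o.name AS TableName,
             i.name AS IndexName,
             ps.avg_fragmentation_in_percent
    FROM     sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ps
             JOIN sys.objects o ON o.object_id = ps.object_id
             JOIN sys.schemas s ON s.schema_id = o.schema_id
             JOIN sys.indexes i ON i.object_id = ps.object_id
                               AND i.index_id  = ps.index_id
    WHERE    ps.avg_fragmentation_in_percent > 10
             AND i.index_id > 0;   -- skip heaps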

In some instances, scripts will need to use different commands, depending on the version of SQL Server they are running under. In that case, I still like to write one script, but I check @@VERSION and include branching logic to run the appropriate command based on the server version. That way, I can still have one script for all my servers.
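Here is a sketch of what that branching can look like. I’m using SERVERPROPERTY('ProductVersion') here rather than parsing the full @@VERSION string, but the idea is the same; the statements inside each branch are just placeholders:

    -- Branch on the major version number so one script can run everywhere.
    DECLARE @version varchar(20),
            @major   int;

    SET @version = CAST(SERVERPROPERTY('ProductVersion') AS varchar(20));
    SET @major   = CAST(LEFT(@version, CHARINDEX('.', @version) - 1) AS int);

    IF @major >= 9
        EXEC ('SELECT ''SQL 2005 or later: run the DMV-based logic here.''');
    ELSE
        EXEC ('SELECT ''SQL 2000: run the DBCC SHOWCONTIG-based logic here.''');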

Another place consistency helps is the location of your job log files. I have my scheduled jobs set up to write log files, which helps in providing details when jobs fail. If you use a consistent place to store them, finding them is a lot easier. I don’t necessarily mean one directory that all servers write to, but one consistent location across servers. For example, I put my logs in a folder called LOGS in the backup directory. This is a different physical location on each server, but I know exactly where to go to find the logs, no matter which server I am on. I also like to put a shortcut to that folder on the desktop of each server for quick access. Again, consistency saves time.

Server configuration is another place where consistency can help. All my servers are (now) set to use mixed mode authentication. I did, however, run into a server in the recent past that wasn’t, and I wasted three hours tracking down the resulting problem. I also like to configure all my servers to use trace flag 3226, which stops backup success messages from bloating the SQL Server error log and the Windows event log. This isn’t a critical thing, but again, it’s nice to be consistent.
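For reference, the trace flag can be turned on for a running instance with DBCC TRACEON, though to make it survive a restart you add -T3226 as a startup parameter for the SQL Server service:

    -- Enable trace flag 3226 (suppress successful-backup messages).
    -- The -1 applies it server-wide rather than to the current session only;
    -- it stays in effect until the instance restarts.
    DBCC TRACEON (3226, -1);

    -- Verify it is on.
    DBCC TRACESTATUS (3226);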

As your company grows and the number of servers you are responsible for increases, consistency will play a major role in reducing and simplifying your workload.