The last couple of weeks have been pretty busy, which is why my posts here have been somewhat erratic. As I previously mentioned, I’ve started a new job at a company with many more SQL Servers than anywhere I’ve worked before. During the interview process, I was told the company had more than 600 SQL Servers. It turns out they have more than 600 databases, not servers. Still, they have over 70 servers, a much larger operation than any of my previous jobs. I’m getting the experience of working in a larger environment, which was one of the reasons I switched jobs.
Now it’s time to follow my own advice. I’ve spent the last two and a half weeks doing the basic stuff on my New Kid Checklist (I’m still working through the To Do Before Lunch section!). One of the things I caught on my first or second day was that virtually none of the databases had ever had DBCC CHECKDB run on them! Of the few that had, the most recent check was over three years ago. (I used the script here, which SQL Soldier just happened to publish right when I started my new job.) Part of the reason, I was told, was that some of the databases are fairly large, and running DBCC CHECKDB takes time and causes blocking. In my mind, that’s no excuse to skip the check! The company is also mostly running SQL 2005 with some 2008 right now, and since SQL 2005, CHECKDB reads from an internal database snapshot, so blocking caused by DBCC shouldn’t really be an issue anymore. However, as the newer of the two DBAs, I’m not willing to rock the boat too much and take on the DBCC blocking myth just yet, especially since I don’t have any idea of the types of loads the servers typically see.

So instead, I put together a proposal that would do four things at once: run DBCC CHECKDB on all their databases, check whether DBCC CHECKDB WITH DATA_PURITY needed to be run, check whether DBCC UPDATEUSAGE needed to be run, and finally, verify that all our backups are good. My method was to restore a backup of each of their databases onto a test server and run the various checks against the restored copy. I would note which ones had problems so we could then schedule a maintenance window to run the checks on the live database. It’s slow going. My long-term goal is to develop some sort of automated process for this, preferably using PowerShell, so I can test out some of the cool stuff I learned at SQL Saturday. But for now, I’m doing the first pass manually.
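For each database, the restore-and-check pass looks roughly like this. Note that the database name, backup path, and file locations below are placeholders, not anything from the actual environment:

```sql
-- Restore the production backup under a different name on the test server.
RESTORE DATABASE SalesDB_Check
FROM DISK = N'\\backupshare\SalesDB.bak'
WITH MOVE N'SalesDB_Data' TO N'D:\TestRestores\SalesDB_Check.mdf',
     MOVE N'SalesDB_Log'  TO N'D:\TestRestores\SalesDB_Check.ldf',
     REPLACE, STATS = 10;

-- Full consistency check. DATA_PURITY forces column-value validation,
-- which databases upgraded from SQL 2000 don't get by default.
DBCC CHECKDB (SalesDB_Check) WITH NO_INFOMSGS, ALL_ERRORMSGS, DATA_PURITY;

-- Correct the page and row counts in the system tables if they've drifted.
DBCC UPDATEUSAGE (SalesDB_Check);
```

A restore that completes successfully also doubles as the backup verification step, which is why this one pass covers all four goals.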
The DATA_PURITY and UPDATEUSAGE checks are needed because many of the databases were migrated from SQL 2000, and both checks should be performed after upgrading from that version. Paul Randal has a nice blog post detailing how to tell whether the DATA_PURITY check needs to be run. The only way to tell whether UPDATEUSAGE needs to be run (if you don’t already know there are errors in the system counts) is to run it. I expected all the databases migrated from SQL 2000 to need UPDATEUSAGE, but I was surprised to see that some created on SQL 2005 also had counts that needed correcting.
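The check Paul Randal describes boils down to reading the dbi_dbccFlags field from the database boot page. A rough sketch, with a placeholder database name (DBCC DBINFO is an undocumented command, so treat this as informational only):

```sql
DBCC TRACEON (3604);       -- send DBCC output to the client session
DBCC DBINFO ('SalesDB');   -- look for dbi_dbccFlags in the output
DBCC TRACEOFF (3604);
-- dbi_dbccFlags = 2 means a CHECKDB WITH DATA_PURITY has completed
-- successfully and purity checks now run automatically; 0 means the
-- database still needs an explicit DATA_PURITY run.
```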
I turned up some other important findings as well. Several databases were not being backed up at all, and others were not being backed up correctly. For instance, full, differential, and log backups were being taken, but something was wrong with the retention settings for some databases, so the full backups were deleted, leaving the differential and log backups worthless without their base. I also found that, on at least some servers, the backup history was not being managed, leaving millions of rows in the backup history tables in msdb (step 7 in my checklist under To Do In The First Couple Days).
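Both of those problems can be spotted from msdb. Something along these lines, where the 90-day retention is just an assumption for illustration, not the company's actual policy:

```sql
-- Databases with no full backup on record in msdb's history tables.
SELECT d.name
FROM sys.databases AS d
LEFT JOIN msdb.dbo.backupset AS b
       ON b.database_name = d.name
      AND b.type = 'D'            -- 'D' = full database backup
WHERE b.database_name IS NULL
  AND d.name <> 'tempdb';

-- Prune backup history older than the retention window so the msdb
-- history tables stop accumulating millions of rows.
DECLARE @cutoff datetime;
SET @cutoff = DATEADD(DAY, -90, GETDATE());
EXEC msdb.dbo.sp_delete_backuphistory @oldest_date = @cutoff;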
One bit of good news from all of this is that DBCC CHECKDB has not found a single error so far.
So it’s been a rather mundane first two weeks – I’m doing some of the most basic DBA tasks out there, but they are things that need to be done. It feels good to whip these databases into shape so going forward, I know we’re on a solid footing.