Holy False Alarm, Batman!

When I get a notification from one of my SQL Servers that a job failed or something is broken, I leap into action. Those alerts are like the Batsignal for a DBA. They are a big deal and I don’t want to see one unless whatever caused it really is a big deal. So I am really annoyed when I get what I call stupid alerts – alerts that technically are correct, but really don’t mean anything or are caused by a trivial, transitory condition. A perfect example is the alert my transaction log backup job sends out if it fails.

A real, legitimate alert would signal something like a lack of disk space or corruption or some other equally serious issue. But for a time, I was getting alerts from this job whenever a new database had been created. This is, in part, due to the way our backup policy is implemented. We take full backups once a week, differential backups daily, and transaction log backups hourly. Because a transaction log backup cannot be made until an initial full backup exists, new databases would often cause my transaction log backup job to fail, since it runs every hour. (They would cause my differential backup job to fail too, but the transaction log backup job runs more frequently, so it was the main culprit.) Even if I was the one who created the database and knew I had to create a full backup right away, I would sometimes get the failed job alert if I happened to create the database right before the transaction log backup job ran. Annoying. The alert really doesn’t mean anything because I know the database hasn’t had a full backup yet – I just created it! It likely doesn’t even have any data in it. But technically, the alert is correct.

So to overcome this problem, I modified my transaction log backup and differential backup maintenance plans to include an initial step. This step checks to see if there are any databases that haven’t had a full backup created yet and, if so, makes one. Then any transaction log or differential backups can be made with no problems.

We back up to disk at our company, with each database’s backups in their own directory. My code, shown below, places these initial full backups in one folder, which I called InitialBackups. So that I remember what this directory is for, I created a ReadMe.txt file that I keep in that directory telling me, or anyone else who might come across the directory, what these backups are and that they can be deleted once the regularly scheduled full backup job has made a backup of the database. (Actually, since this is a subfolder of where my regular backup files are stored, my regular maintenance plan cleanup job deletes them automatically after a certain amount of time.)

DECLARE @InitialBackupPath varchar(1000)
DECLARE @dbname varchar(256)
DECLARE @BackupPath varchar(1260)

SET @InitialBackupPath = '<insert your backup directory here>\'

-- Loop through every database except tempdb, skipping database snapshots
-- (snapshots have a non-NULL source_database_id and cannot be backed up)
DECLARE DatabaseCursor CURSOR FOR
SELECT name FROM sys.databases WHERE name <> 'tempdb' AND source_database_id IS NULL

OPEN DatabaseCursor

FETCH NEXT FROM DatabaseCursor INTO @dbname
WHILE @@FETCH_STATUS = 0
BEGIN

    -- If the database has never had a full backup (type 'D' in msdb.dbo.backupset), take one now
    IF NOT EXISTS (SELECT 1
                   FROM msdb.dbo.backupset
                   WHERE type = 'D' AND database_name = @dbname)
    BEGIN
        SET @BackupPath = @InitialBackupPath + @dbname + '.bak'
        BACKUP DATABASE @dbname TO DISK = @BackupPath
    END

    FETCH NEXT FROM DatabaseCursor INTO @dbname

END

CLOSE DatabaseCursor
DEALLOCATE DatabaseCursor

UPDATE: Ran into a problem with this code this morning. I had just made a snapshot and, shortly thereafter, my transaction log backup failed. It turns out the SELECT statement that defines the cursor was picking up the database snapshot. A snapshot cannot be backed up, so the job failed. I added the source_database_id IS NULL condition (shown in the code above) to exclude snapshots from the cursor. Currently, snapshots are an Enterprise-only feature, but the code will still work on non-Enterprise editions as well.

SSMS Feature Request – Filter By Datetime, Not Just Date

I’ve been working lately on migrating a MySQL database to SQL Server. As part of this process, once the database has been migrated, an update of the program that uses the database is run, which makes some changes to the structure of the database. I designed a nice little SSIS package to migrate the data from MySQL to SQL Server. That runs without problems (and utilizes this nice ADO.NET driver for MySQL). We are currently seeing problems during the upgrade process.

The upgrade creates some new tables and modifies existing tables in the database once it is in SQL Server. Because we are hitting errors, we are reloading the database and re-running the upgrade process multiple times a day as we troubleshoot. We ran into a situation where the upgrade failed after creating some new tables. In order to re-run the upgrade process, I needed to delete those new tables. SSMS has a nice filter option that controls which tables (or other database objects) are displayed. I wanted the filter to show only the new tables created by the failed upgrade, which were made about 1.5 hours after the rest of the tables.

Here’s a shot of the table properties of one of the old tables.

I want to show all tables created on 7/28/10 after 4:29 PM, which would be the ones the upgrade process created. When I tried to make such a filter, this is what I got:

Hmm. I have to select a date from the calendar and I cannot type a date in. Which means I cannot enter a time portion. Which means my minimum filter granularity is one day. If anyone from Microsoft is reading this, I’d like to see this changed to allow times as well. I understand the calendar is there to simplify things for the user, but you could still use it for picking the date and then display a default time portion of midnight (which is really all a date-only value means) or 11:59:59 PM, depending on which side of the day you want to default to. That way, you keep the ease of use of the calendar, but a user who needs to can change the time portion as well.

(I know I can write some T-SQL to query sys.tables, get the created datetime, and delete the appropriate tables, but I was using the GUI and thought this would be quicker.)
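Since I’m on the subject, here’s roughly what that T-SQL route looks like. This is only a sketch – the cutoff below is just the example date/time from above, and I’d review the generated DROP statements before actually running any of them:

-- Generate DROP statements for tables created after a cutoff datetime.
-- The cutoff is an example (7/28/10 4:29 PM); review the output before running it.
DECLARE @cutoff datetime
SET @cutoff = '20100728 16:29'

SELECT 'DROP TABLE [' + SCHEMA_NAME(schema_id) + '].[' + name + ']'
FROM sys.tables
WHERE create_date > @cutoff
ORDER BY create_date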

Another pet peeve of mine regarding SSMS involves scripting. The default behavior, when you right click a table and choose Script Table As…, is that the resulting script will not include any indexes other than the one that defines the primary key. Turns out, this is an option setting that you can change. In SSMS, go to Tools –> Options and expand the SQL Server Object Explorer node, then select Scripting. Set the Script Indexes property to True.

Hey, Check Out The New Kid

Congratulations! You did it! You passed all the tricky interview questions and now you have yourself a shiny new job as a database administrator! Come Monday, you’ll find yourself at a desk and, if the company’s IT department is on the ball, you may even have a phone, computer and network account waiting for you. It’s equally possible you’ll spend all day Monday waiting for these things to be set up. Either way, you’re the new DBA. All eyes are on you and you want to make a good impression, to prove your worth and to show that the company made a good decision in hiring you. To do that, how should you spend your first couple of weeks?

It really depends on the size of the company and their database operations. In larger companies, you will probably be a member of a team of DBAs and they will likely already have procedures established, servers configured correctly, etc. In those cases, just try to find your place in the team and establish what is expected of you.

In smaller companies, you may be the only DBA, possibly taking over for an involuntary DBA – someone who got stuck handling the DBA chores, but really didn’t like it or have the knowledge to do a bang-up job. This has been the situation at the majority of my jobs and, personally, I love it. Why? Because the room for improvement is usually huge. Chances are high that the company’s servers are not configured correctly, are suffering from poor performance, and have a host of other maladies. You, with your SQL Server expertise, can come in and right the wrongs of the past. Performance will likely improve dramatically and you’ll look like a hero. And who doesn’t want that?

Here are the top 13 things I do when I start at a new company. Most of them just institute commonly recommended best practices, but you’d be surprised how often you’ll find they haven’t been done.

To Do Before Lunch

These are a couple things that are vitally important, that are fairly quick to configure, and should be done as soon as you start – before lunch if possible. (You’ll never really get these done before lunch. You’ll be lucky to even have your own email set up before lunch, let alone have email notifications configured. But after you’ve arranged your desk the way you like, after you’ve installed your favorite browser and IM client on your computer, after you’ve updated LinkedIn with your new position, after you’ve loaded your MP3 collection onto your new computer, and after you’ve emailed everyone in your address book your new email address – in short, after you’ve settled in, do these things.)

  1. Identify all the SQL Servers you are responsible for and note which ones are production systems (mission critical) and which are development / QA / testing (merely important).
  2. Make sure backups on all servers are A) happening and B) valid. Check that databases that are in Full recovery mode are having their transaction logs backed up. Check that the backup jobs are set to verify the integrity of the backup. If you are backing up to disk, make sure it’s to a different physical disk than the one the mdf files are on and that the disk backups are themselves being backed up to tape or off-site or somewhere. Do a test restore (there’s a sketch of this just after this list). Remember the physician’s motto – first, do no harm. You’re at a new place and file locations and database names will be unfamiliar to you. When doing a test restore, be sure not to restore over the live database!
  3. Check to see if notifications are enabled. Email, net send, pager – some sort of notification should be set up for jobs so that someone (probably you) will be alerted when they fail. (I should note here that both net send and pager notifications will be removed from future versions of SQL Server, so you may want to avoid those.) I like to send alerts to the IT email distribution list for a couple of reasons. First, the list probably already exists. Second, if you are the only DBA and you are not available, at least someone in IT will see the alerts and (hopefully) be able to take some action or find someone who can take action or hunt you down.
  4. Set up alerts for error severity levels 17 through 25 and, on SQL 2005 and greater servers, for error number 825 (also sketched below, after this list).
  5. Make sure the databases are in good shape. Do a DBCC CHECKDB. Look at the output and see if there are any problems. Major problems may have to wait until after hours or the maintenance window to be corrected, but at least you’ll know about them. Note also that if you have large databases, this command could take a while to run, so this is something you might want to do after hours.
  6. Check the options on the databases. Make sure Auto Shrink is off. Check the Auto Close setting, which should be off in most cases. Check if Auto Grow is enabled. If it is, set it to grow by a fixed amount instead of a percentage. (This one might be difficult to do if you don’t yet have a feel for how quickly the database or log files grow, but you could make a guesstimate and set it to about a tenth of the current log file size. Monitor growth over the next couple of days or weeks and adjust as necessary.) Auto Update Statistics and Auto Create Statistics should be on. (For SQL 2000 and greater. For earlier versions, well, it depends – there are performance considerations you have to weigh.)
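Here’s a minimal sketch of the backup check and test restore from item 2. The file paths, database name, and logical file names below are placeholders – the important part is restoring under a different name so the live database is never touched:

-- Hypothetical paths and names -- adjust to your environment.
-- Verify that the backup file is readable and complete:
RESTORE VERIFYONLY FROM DISK = N'D:\Backups\MyDb\MyDb_full.bak'

-- Restore it under a DIFFERENT name so the live database is never overwritten.
-- Run RESTORE FILELISTONLY against the backup first if you don't know the logical file names.
RESTORE DATABASE MyDb_RestoreTest
FROM DISK = N'D:\Backups\MyDb\MyDb_full.bak'
WITH MOVE 'MyDb' TO N'E:\RestoreTest\MyDb_RestoreTest.mdf',
     MOVE 'MyDb_log' TO N'F:\RestoreTest\MyDb_RestoreTest_log.ldf',
     STATS = 10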
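And here’s a sketch of the alerts from item 4. I’m assuming an operator named ‘DBA Team’ already exists – substitute whatever operator (or IT distribution list) you set up in item 3 – and you’d repeat the severity alert for levels 18 through 25:

USE msdb
GO
-- Severity 17 alert (create one of these for each severity from 17 through 25)
EXEC dbo.sp_add_alert
    @name = N'Severity 17 Error',
    @severity = 17,
    @delay_between_responses = 60,
    @include_event_description_in = 1     -- include the error text in the email

EXEC dbo.sp_add_notification
    @alert_name = N'Severity 17 Error',
    @operator_name = N'DBA Team',         -- assumed operator name
    @notification_method = 1              -- 1 = email

-- Error 825 (read-retry) alert for SQL 2005 and later
EXEC dbo.sp_add_alert
    @name = N'Error 825 - Read-Retry Required',
    @message_id = 825,
    @severity = 0,                        -- must be 0 when @message_id is specified
    @delay_between_responses = 60,
    @include_event_description_in = 1

EXEC dbo.sp_add_notification
    @alert_name = N'Error 825 - Read-Retry Required',
    @operator_name = N'DBA Team',
    @notification_method = 1
GO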

To Do In The First Couple Days

The rest of these tasks can be done in your first couple of days and weeks on the job. They are typically steps to improve performance or reliability.

  1. Check for virtual log file fragmentation. For details on how to do this, why it is bad, and how to correct it, see step 8 of Kimberly L. Tripp’s post here. Then read her follow-up here.
  2. Check for index fragmentation. For a first pass, you may want to create a job that runs over the weekend or some other low-usage time and rebuilds all the indexes in all the databases. Once you have some time, I would recommend revisiting this job and modifying it so that it checks the fragmentation level of each index and only rebuilds those above a certain threshold. Again, Kimberly L. Tripp has a nice script to do this for both clustered and non-clustered indexes. She has links to it in the third question on this page. Be sure to test the script and understand what it is doing before you run it. (A bare-bones version of this kind of check is sketched after this list.)
  3. Check the location of the data, log, and tempdb files. Ideally data files should be on one drive, log files on another, and tempdb files on a third. If you are using a SAN, the volumes should be on separate spindles. None of these should be on the same drive as the operating system (typically the C: drive).
  4. Check the number of tempdb data files. In general, you should have one tempdb data file per processor on your server. Note that I say “in general.” As always, your particular situation may vary. For SQL 2000 and earlier, this is a good recommendation. For 2005 and 2008, you could get by with slightly fewer – maybe one file for every two processors. But if you have 8 or fewer processors, you should be OK with 8 data files for tempdb. If you have some super high-end server with 16 or more processors, don’t get carried away. I’d still probably stick to 8 data files in that case. (Of course, if you’ve got a machine with that kind of horsepower, you’re probably not the only DBA there, so you can work out what is best with the other DBAs.) For SQL 2000, you may want to implement trace flag 1118. Note that your tempdb data files should all be the same size and should be set to auto grow by the same fixed amount. This is because SQL Server uses a round-robin algorithm to determine which file to use, and you want to ensure even usage across all the files.
  5. Check the SQL Server patch levels. Are the servers running the latest service pack? How about the latest cumulative update? You will probably need to check with others in your organization before applying updates. Not only will applying updates require downtime, but you should also check whether anyone is aware of any issues the updates might cause. (I have never run into any, but you never know…) I like this handy page of SQL Server version numbers, which translates the version number @@version returns to an SP or CU number.
  6. Set up a job to cycle the SQL logs. If your server has a lot of up time – meaning it’s been running for months and months without a restart – odds are the SQL and SQL Agent logs are huge, possibly multi-megabyte huge. If you need to search them, it could be quite a pain. The wait just to open them could be unbearably long. So I have a job that runs once a month and cycles the log files so a particular log never gets too big. The sp_cycle_errorlog system procedure will cycle the error log on SQL 2000 and later servers. For 2005 and later servers, you’ll also want to cycle the SQL Agent log, and the sp_cycle_agent_errorlog system procedure will do that. Update: FYI, sp_cycle_errorlog is in master and sp_cycle_agent_errorlog is in msdb.
  7. Trim and manage the backup history. This won’t really help performance, but it’s just part of running a tidy ship, in my opinion. SQL Server keeps track of each backup made and each restore performed in several tables in the msdb database. Odds are, those tables contain tons of rows. When I started at my current company, some of the servers had been taking hourly transaction log backups for several years. Because each backup creates table entries and no one had ever cleared out these tables, they contained millions of rows. The system stored procedure sp_delete_backuphistory will clear the tables for you, but beware! This procedure can take a VERY long time to run! By default, the backup history tables lack indexes on the columns the procedure uses for its deletes. What I suggest is to query the backup tables first and see just how far back the history goes. Then, run the following code to add indexes to some of the tables:
    CREATE INDEX [media_set_id] ON [dbo].[backupset] ([media_set_id])
    CREATE INDEX [restore_history_id] ON [dbo].[restorefile] ([restore_history_id])
    CREATE INDEX [restore_history_id] ON [dbo].[restorefilegroup] ([restore_history_id])

    Lastly, use sp_delete_backuphistory to delete entries in small batches. For example, if you have data going back 5 years, I would start by deleting everything older than 5 years, then everything older than 4 years 11 months, then 4 years 10 months, etc. (When I started this at my current company, I spent three days doing this as a background task.) The following code will let you monitor the progress of the deletes:

    SELECT COUNT (*) FROM restorefile
    SELECT COUNT (*) FROM restorefilegroup
    SELECT COUNT (*) FROM restorehistory
    SELECT COUNT (*) FROM backupfile
    SELECT COUNT (*) FROM backupset
    SELECT COUNT (*) FROM backupmediafamily
    SELECT COUNT (*) FROM backupmediaset

    Once you get more or less up to date, create a job that runs periodically to clean the tables out, based on your history retention needs. Personally, I have a job that runs nightly that deletes entries older than two weeks. (A sketch of both the batched cleanup and the nightly call follows this list.)
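To make that last part of item 7 concrete, here’s a minimal sketch of one cleanup batch and of the nightly retention call. The cutoff date is just an example – pick whatever matches how far back your history goes and your retention needs:

-- One batch of the initial cleanup: delete history older than a cutoff date,
-- then move the cutoff forward (a month or so at a time) on subsequent runs.
-- The date below is only an example.
DECLARE @cutoff datetime
SET @cutoff = '20060101'
EXEC msdb.dbo.sp_delete_backuphistory @oldest_date = @cutoff

-- Nightly job step once the tables are caught up: keep two weeks of history.
DECLARE @keep datetime
SET @keep = DATEADD(day, -14, GETDATE())
EXEC msdb.dbo.sp_delete_backuphistory @oldest_date = @keep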
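And circling back to item 2, a bare-bones, threshold-based rebuild might look something like the sketch below (SQL 2005 and later). It only covers the current database, the 30% threshold is just a common starting point, and it always rebuilds rather than choosing between REBUILD and REORGANIZE – Kimberly’s script is far more complete, so treat this as an illustration of the idea only:

-- Rough sketch: rebuild indexes in the current database whose fragmentation exceeds 30%.
-- The threshold and the REBUILD vs. REORGANIZE decision are judgment calls.
DECLARE @sql nvarchar(max)

DECLARE IndexCursor CURSOR FOR
SELECT 'ALTER INDEX [' + i.name + '] ON [' + SCHEMA_NAME(o.schema_id) + '].[' + o.name + '] REBUILD'
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') ps
JOIN sys.indexes i ON i.object_id = ps.object_id AND i.index_id = ps.index_id
JOIN sys.objects o ON o.object_id = ps.object_id
WHERE ps.avg_fragmentation_in_percent > 30
  AND ps.index_id > 0        -- skip heaps
  AND o.is_ms_shipped = 0    -- skip system objects

OPEN IndexCursor
FETCH NEXT FROM IndexCursor INTO @sql
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC (@sql)
    FETCH NEXT FROM IndexCursor INTO @sql
END

CLOSE IndexCursor
DEALLOCATE IndexCursor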

Those are the major tasks I would do when starting at a new company to ensure the database servers are in good shape. I’m curious to hear if there are some things I missed. Leave a comment if you think so.

SP_Configure ALLOW_UPDATES Still Has An Effect In SQL 2008

Out of date documentation sucks

At virtually every company I have worked at, documentation has always lagged behind the product. This is especially true for products that are used only internally. It’s understandable to some extent. Many IT departments are understaffed and the team is spending the majority of their time either putting out fires or working to keep everything running smoothly. But for commercially available software, I have higher expectations, especially when that documentation is A) from a company as large as Microsoft and B) available on-line, where changes can be made instantly and there are no printing delays to worry about. Documentation that is just flat out wrong not only creates headaches for people who rely on that documentation to use the software, but it also results in wasting the time of people who are already short on that precious commodity.

Let me share my latest experience with this problem in case someone else runs into the same situation. While working on my replication test project, I was using code that modified system tables, so I used this command:

SP_CONFIGURE 'ALLOW UPDATES', 1
RECONFIGURE WITH OVERRIDE

The project involved working with both a SQL 2000 machine and a 2008 machine. The command worked on the 2000 machine and seemed to work on the 2008 machine – at least, the command didn’t generate an error. I forgot that SQL Server 2008 doesn’t allow direct modification of system tables (actually, neither does 2005), so the command was effectively a no-op. According to this Microsoft TechNet post, this option was disabled in SQL 2005. It states:

This option is still present in the sp_configure stored procedure, although its functionality is unavailable in SQL Server. The setting has no effect. Starting with SQL Server 2005, direct updates to the system tables are not supported.

The exact same statement is present on the TechNet page for SQL 2008. OK. So I continued my testing using other methods. But then one of my jobs failed. Specifically, I had a maintenance plan that was creating differential backups. As part of this plan, all user databases are consistency checked. This was the step that appeared to be failing. The error log showed: “Failed:(0) Alter failed for Server”. A Profiler trace showed the statement causing the error was:

EXEC sys.sp_configure N'user options', 0 RECONFIGURE

I remembered that I had run the above code to set the ALLOW UPDATES value to 1. As that was the only thing that had changed on the server, I ran it again, setting the value back to 0. The maintenance plan now worked! Hmm. So this command that supposedly has “no effect” actually does have an effect!
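For reference, putting things back and confirming the setting takes just a couple of statements – this is nothing more than the standard sp_configure / sys.configurations pattern, not anything specific to the maintenance plan:

-- Set 'allow updates' back to 0, then confirm both the configured value
-- and the value currently in use.
SP_CONFIGURE 'ALLOW UPDATES', 0
RECONFIGURE WITH OVERRIDE

SELECT name, value, value_in_use
FROM sys.configurations
WHERE name = 'allow updates'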

Further research shows Microsoft has known about this since 2007. This Knowledge Base article details this exact problem.

Additionally, the TechNet articles linked to above state that changing the ALLOW UPDATES option will generate an error. Not in all cases. Here is the command running on my 2008 SQL Server. Note the lack of an error message.

I am using the WITH OVERRIDE option. This option will disable some configuration value checking, apparently including the check that returns an error when the ALLOW UPDATES option is used. The same command without WITH OVERRIDE returns an error.

Allow me to vent for a moment. Just to get this straight – Microsoft knows this setting that they say has “no effect” actually does have an effect. They knew about this back when SQL 2005 was the latest version of SQL Server. I am experiencing the exact same behavior running SQL 2008 SP1 CU7. According to this handy SQL Server release calendar at SQLServerpedia.com, there have been 40(!) releases of patches to SQL 2005 and/or 2008 between when the KB article was written and 2008 SP1 CU7 was released. It still isn’t fixed. Let’s get it together, Microsoft. If you can’t correct the code, at least correct the documentation so that people don’t waste hours scratching their heads wondering why things aren’t working.

Phew. Now that that’s off my chest, I feel a bit better.

Twice Bitten, Thrice Shy

I’ve been bitten twice recently by my reliance on faulty assumptions. I’m sure we all remember the old adage that when you assume you make an ASS out of U and ME. Well, I have certainly proven the truth in that saying lately. (Ok, maybe I only made an ass out of me and not you…)

The replication project I wrote about last week started off because we needed to replace a SQL 2000 cluster that hosted a subscriber database and we wanted to upgrade to SQL 2008 at the same time. I knew that replication is limited to working between versions of SQL Server that are no more than two versions apart. So replicating from SQL 2000 to SQL 2008 would work. But for some reason, I thought that the version difference had to be “downhill” – meaning the publisher had to be the higher version and the subscribers had to be the lower version. Based on this, I thought replication in my situation would not work, so I started investigating other ways to solve the problem of how to copy data on a regular basis from a SQL 2000 machine to a 2008 machine. I looked at log shipping and determined that wouldn’t work because the 2008 machine (the subscriber) needed to have the database available for reporting. So I had just about settled on using a homegrown process of automating nightly backups and restores when I posted to the forums on SQLServerCentral.com, asking the SQL community whether there was perhaps a more elegant way of doing what I was trying to accomplish. Someone pointed out that in my situation, replication actually was possible! In fact, this TechNet article even states: “SQL Server 2000 and SQL Server 2005 can both participate in replication topologies with SQL Server 2008.” (It turns out the version of the distributor is the limiting factor. The same TechNet article says: “For all types of replication, the Distributor version must be no earlier than the Publisher version.” Since my distributor and publisher were the same machine, I was OK.)

So I wasted a day or two heading down the wrong path simply because I did not verify that what I thought about replication was true.

On the same project, I was bitten again, although this time I only wasted about three hours. One of the things I needed to do was transfer logins from the publisher to the subscriber. I used the sp_help_revlogin procedure given in this Microsoft Knowledge Base article. I tried both methods given and I was still unable to use the SQL login on the subscriber machine. Then I tried making an SSIS package and using the Transfer Logins task. That didn’t work either. Whenever I tried to log in to the subscriber machine using a transferred SQL account, I got a login failed message. I started to delve into SIDs and other arcane details of transferring SQL logins when my mind dredged up something I had read a while ago. I recalled that the login failed message presented to the user is purposely vague, to help prevent hacking, but typically a more detailed reason for the failure is written to the Windows event log. So I checked the event log and sure enough, I saw this:

Login failed for user ‘<username>’. Reason: An attempt to login using SQL authentication failed. Server is configured for Windows authentication only. [CLIENT: <local machine>]

Doh! All our servers (or I should say, almost all our servers) are configured for mixed mode authentication. The one I was using for testing, however, was apparently not. So that was why my login attempts using a SQL account failed. The logins had transferred successfully, but SQL Server was not configured to use them. I changed the server to mixed mode authentication and everything worked.
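For next time, there’s a quicker sanity check than digging through the event log – SERVERPROPERTY will tell you which authentication mode a server is in. (Changing the mode itself is done in SSMS under Server Properties –> Security and requires a restart of the SQL Server service.)

-- Returns 1 if the server accepts Windows authentication only,
-- 0 if it is in mixed mode (SQL Server and Windows authentication).
SELECT SERVERPROPERTY('IsIntegratedSecurityOnly') AS WindowsAuthOnly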

So when things aren’t working out the way you think they should, it very often pays to take a step back and question the assumptions you are working under. You could save yourself a lot of time.