Last weekend was patch weekend and several of my SQL Server machines got Windows Updates applied and were rebooted. Today, I discovered that on a couple of severs, SQL Agent did was not running. The first thing I did was verify that the service was set to autostart. It was. Hmm. Time to dig into the logs.
The user that generated the login failure was the SQL Server agent account. Database 5 was my DBA utility database. Now things were starting to make sense. I also realized that all the servers where SQL Agent did not start after a reboot were ones where I had a logon trigger installed. The trigger is an auditing tool that logs each user’s login time to my DBA utility database.
What was going on? SQL Agent was set to autostart, so it was coming up as soon as SQL Server itself was running. But looking at the log, I could see that SQL Agent was starting up before SQL Server had brought all the databases online. In fact, prior to the entries shown above, the only “Starting up database…” message was for the master database and tempdb. All the other databases came online after the “login failure” errors. Notice the 904 error – ‘Database 5 cannot be autostarted during server shutdown or startup.” This is telling me that the database was not started when the trigger tried to access it. Because that database has autoclose disabled, I know the only reason it was not started yet was that SQL Server was still in the process of bringing it online during system startup.
So how do I fix this problem? There are a couple of possible solutions. Because the master database is the first one brought online by SQL Server, I could move the trigger logging tables into that database. This would ensure that the logon trigger will always work because the master database will always be online before SQL Agent starts. However, I strongly dislike putting any user code in any system databases, so I wanted to find another solution.
Another option would be to change the logon trigger to not log anything by the SQL Agent service account. This would require hard coding the service account into the stored procedure, something I do not like doing. If we ever change that account, I’d have to remember to also change the stored procedure on every server where it was used. (My actual trigger does have a list of accounts to ignore, but that data is stored in a table in the utility database, so it is not accessible during this boot up period.)
The fix I ended up using turned out to be the Windows Delayed Start option for services. Windows Server 2008 includes a new service startup option called “Automatic (Delayed Start).” I changed my SQL Agent service to use this setting and my problem went away. SQL Agent did not start for about 1 to 2 minutes after SQL Server came up, which allowed plenty of time for my DBA utilities database to be brought online. The trigger could then access it when the Agent service started.
Note that this may have some other effects on your system. If you have any jobs that are set to run when SQL Agent starts, you’ll need to test these and make sure they still do what they are supposed to do when the service starts a bit later than normal. However, in my case, the delayed start option was a perfect solution to this problem.
I came across this blog post while doing some research into the issue myself, and got into a twitter conversation with @SQLSoldier and @SQLAgentman about it. A really good suggestion came of it, and that was to add an exclusion to the trigger for the SQL Agent service account, so that the trigger does not fire for that account. Option 2 would be to add some “wait” logic into the trigger if your DBA Utility DB is not online yet.
As I mentioned, I’m not a fan of hard coding service accounts into triggers (or stored procedures for that matter), but that is an option to workaround this. Wait logic would work as well. Of course, setting the SQL Agent service to delayed start is basically one way to implement wait logic :-)