Congratulations! You did it! You passed all the tricky interview questions and now you have yourself a shiny new job as a database administrator! Come Monday, you’ll find yourself at a desk and, if the company’s IT department is on the ball, you may even have a phone, computer and network account waiting for you. It’s equally possible you’ll spend all day Monday waiting for these things to be set up. Either way, you’re the new DBA. All eyes are on you and you want to make a good impression, to prove your worth and to show that the company made a good decision in hiring you. To do that, how should you spend your first couple of weeks?
It really depends on the size of the company and their database operations. In larger companies, you will probably be a member of a team of DBAs and they will likely already have procedures established, servers configured correctly, etc. In those cases, just try to find your place in the team and establish what is expected of you.
In smaller companies, you may be the only DBA, possibly taking over for an involuntary DBA – someone who got stuck handling the DBA chores, but really didn’t like it or have the knowledge to do a bang-up job. This has been the majority of cases for me and, personally, I love it. Why? Because the room for improvement is usually huge. Chances are high that the company’s servers are not configured correctly, are suffering from poor performance, and a host of other maladies. You, with your SQL Server expertise, can come in and right the wrongs of the past. Performance will likely improve dramatically and you’ll look like a hero. And who doesn’t want that?
Here are my top 13 things I do when I start at a new company. Most of these are just instituting many of the best practices that are recommended, but you’ll be surprised at how often you’ll find these not done.
To Do Before Lunch
These are a couple things that are vitally important, that are fairly quick to configure, and should be done as soon as you start – before lunch if possible. (You’ll never really get these done before lunch. You’ll be lucky to even have your own email set up before lunch, let alone have email notifications configured. But after you’ve arranged your desk the way you like, after you’ve installed your favorite browser and IM client on your computer, after you’ve updated LinkedIn with your new position, after you’ve loaded your MP3 collection onto your new computer, and after you’ve emailed everyone in your address book your new email address – in short, after you’ve settled in, do these things.)
- Identify all the SQL Servers you are responsible for and note which ones are production systems (mission critical) and which are development / QA / testing (merely important).
- Make sure backups on all servers are A) happening and B) valid. Check that databases that are in Full recovery mode are having their transaction logs backed up. Check that the backup jobs are set to verify the integrity of the backup. If you are backing up to disk, make sure it’s to a different physical disk than the one the mdf files are on and that the disc backups are themselves being backed up to tape or off-site or somewhere. Do a test restore. Remember the physicians motto – first do no harm. You’re at a new place and file locations and database names will be unfamiliar to you. When doing a test restore, be sure not to restore over the live database!
- Check to see if notifications are enabled. Email, net send, pager – some sort of notification should be set up for jobs so that someone (probably you) will be alerted when they fail. (I should note here that both net send and pager notifications will be removed from future versions of SQL Server, so you may want to avoid those.) I like to send alerts to the IT email distribution list for a couple of reasons. First, the list probably already exists. Second, if you are the only DBA and you are not available, at least someone in IT will see the alerts and (hopefully) be able to take some action or find someone who can take action or hunt you down.
- Set up alerts for error severity levels 17 through 25 and, on SQL 2005 and greater servers, for error number 825.
- Make sure the databases are in good shape. Do a DBCC CHECKDB. Look at the output and see if there are any problems. Major problems may have to wait until after hours or the maintenance window to be corrected, but at least you’ll know about them. Note also that if you have large databases, this command could take a while to run, so this is something you might want to do after hours.
- Check the options on the databases. Make sure Auto Shrink is off. Check the Auto Close setting, which should be off in most cases. Check if Auto Grow is enabled. If it is, set it to grow by a fixed amount instead of a percentage. (This one might be difficult to do is you don’t yet have a feel for how quickly the database or log files grow, but you could make a guesstimate and set it to about a tenth of the current logfile size. Monitor growth over the next couple days or weeks and adjust as necessary.) Auto Update Statistics and Auto Create Statistics should be on. (For SQL 2000 and greater. For earlier versions, well, it depends – there are performance considerations you have to weigh.)
To Do In The First Couple Days
The remainder of tasks can be done in your first couple of days and weeks on the job. These are typically steps to take to improve performance or improve reliability.
- Check for virtual log file fragmentation. For details on how to do this, why it is bad , and how to correct it, see step 8 of Kimberly L. Tripp’s post here. Then read her follow up here.
- Check for index fragmentation. For a first pass, you may want to create a job that runs over the weekend or some other low-usage time that rebuilds all the indexes in all the databases. Once you have some time, I would recommend revisiting this job and modifying it so that the job checks the fragmentation level of the index and only rebuilds it if it is above a certain threshold. Again, Kimberly L. Tripp has a nice script to do this for both clustered and non-clustered indexes. She has links to it in the third question on this page. Be sure to test the script and understand what it is doing before you run it.
- Check the location of the data, log, and tempdb files. Ideally data files should be on one drive, log files on another, and tempdb files on a third. If you are using a SAN, the volumes should be on separate spindles. None of these should be on the same drive as the operating system (typically the C: drive).
- Check the number of tempdb data files. In general, you should have one tempdb data file per processor on your server. Note that I say “in general.” As always, your particular situation may vary. For SQL 2000 and earlier, this is a good recommendation. For 2005 and 2008, you could get by with slightly less – maybe 1 file for every two processors. But if you have 8 or less processors, you should be ok with 8 data files for tempdb. If you have some super high-end server with 16 or more processors, don’t get carried away. I’d still probably stick to 8 data files in that case. (Of course, if you’ve got a machine with that kind of horsepower, you’re probably not the only DBA there, so you can work out what is best with the other DBAs.) For SQL 2000, you may want to implement trace flag 1118. Note that the size of your data files should be the same and they should be set to auto grow by a fixed amount, also the same for each. This is because SQL Server uses a round-robin algorithm in determining which file to use and you want to try to ensure even usage across all the files.
- Check the SQL Server patch levels. Are the servers running the latest service pack? How about the latest cumulative update? You will probably need to check with others in your organization before applying updates. Not only will applying updates require downtime, but you should make sure no one is aware of any issues the updates might cause. (I have never run into any, but you never know…) I like this handy page of SQL Server version numbers, which translates the version number @@version returns to a SP or CU number.
- Set up a job to cycle the SQL logs. If your server has a lot of up time – meaning it’s been running for months and months without a restart, odds are the SQL and SQL Agent logs are huge, possibly multi-megabyte size huge. If you need to search them, it could be quite a pain. The wait just to open them could be unbearably long. So I have a job that runs once a month and cycles the log files so a particular log never gets too big. The sp_cycle_errorlog system procedure will cycle the log file on SQL 2000 and later servers. For 2005 and later servers, you’ll also want to cycle the SQL Agent log and the sp_cycle_agent_errorlog system procedure will do that. Update: FYI, sp_cycle_errorlog is in master and sp_cycle_agent_errorlog is in msdb.
- Trim and manage the backup history. This won’t help performance really, but it’s just part of running a tidy ship, in my opinion. SQL Server keeps track of each backup made and each restore performed in several tables in the msdb database. Odds are, those tables contain tons of rows. When I started at my current company, some of the servers were taking transaction log backups hourly and had been for several years. Because each backup creates table entries and no one had ever cleared out these tables, they contained millions of rows. The system stored procedure sp_delete_backuphistory will clear the tables for you, but beware! This procedure can take a VERY long time to run! By default, there are no indexes on the backup history tables. What I suggest is to query the backup tables first and see just how far back the history goes. Then, run the following code to add indexes to some of the tables:CREATE INDEX [media_set_id] ON [dbo].[backupset] ([media_set_id])
CREATE INDEX [restore_history_id] ON [dbo].[restorefile] ([restore_history_id])
CREATE INDEX [restore_history_id] ON [dbo].[restorefilegroup] ([restore_history_id])
Lastly, use sp_delete_backuphistory to delete entries in small batches. For example, if you have data going back 5 years, I would start by deleting everything older than 5 years, than everything older than 4 years 11 months, then 4 years 10 months, etc. (When I started this at my current company, I spent three days doing this as a background task.) The following code will let you monitor the progress of the deletes:SELECT COUNT (*) FROM restorefile
SELECT COUNT (*) FROM restorefilegroup
SELECT COUNT (*) FROM restorehistory
SELECT COUNT (*) FROM backupfile
SELECT COUNT (*) FROM backupset
SELECT COUNT (*) FROM backupmediafamily
SELECT COUNT (*) FROM backupmediaset
Once you get more or less up to date, create a job that runs periodically to clean the tables out, based on your history retention needs. Personally, I have a job that runs nightly that deletes entries older than two weeks.
Those are the major tasks I would do when starting at a new company to ensure the database servers are in good shape. I’m curious to hear if there are some things I missed. Leave a comment if you think so.