Shaun J Stuart

Just another SQL Server weblog

Browsing Posts in SQLServerPedia Syndication

Too many jobs to keep track of!

After watching Brent Ozar's webinar on backups and virtual machines, I was inspired to take a deeper look at our company's overall backup plans.  I had already gone through the code that was actually doing the backup, found some crazy stuff, and fixed it, but this time through I wanted to take a closer look at our schedules and retention periods. I'm glad I did.

I and one other DBA maintain approximately 55 SQL Servers here. We're both fairly new to the company and therefore were not involved in setting up most of the current environment. And you know how things go - if it ain't broke, don't fix it. So there were some things we haven't gotten around to reviewing. Backups were one of them.

I went through and noted the backup schedules on all our servers and how long they took to complete, with the goal of staggering the times to ease network traffic, per Brent's webinar. What I found was a mess of inconsistency and duplication. I found several servers that would take a full backup of all databases each night at midnight, then take differential backups 30 minutes later - and there was no other activity on these systems during that 30 minutes! I found retention times for transaction log backups set to 8 days but the differential and full backup retention times set to 4 days. So half of those transaction log backups are worthless because they can never be restored. The amount of disk space and processor resources that were being wasted was huge!

It easy to see how this came about. In an environment with a large number of servers, no centralized or standardized backup functionality, and multiple DBAs, this happens. To help prevent this, I've created an internal wiki page that documents our standard jobs, their schedules, and our standard server configuration. While my manager is impressed because he sees this as me helping the company and future DBAs, the real reason I do it is because I can't remember everything. (But don't tell him that!) This is the same reason airline pilots go through a written checklist before take off -  there are just too many little things that are easy to forget.

So if you have been in your position long enough to have seen the number of SQL Servers grow significantly, or you are in a new position and have inherited a bunch of machines, it would probably be worth your while to take the time to revisit your backup jobs across your company and make sure you have things set up they way you intend them to be.

Share

Updated: February 22, 2012

Some time ago, I wrote and published a routine to gather disk space usage from multiple SQL Servers. Since that was written, I've made some bug fixes and modifications to the routine. Tracking down those changes was becoming tedious, as they were spread out over several blog posts. I've decided to create this page to overcome that issue. This page will always contain the latest version of the routine. The date at the top will reflect the last time I've made changes.

The zipfile found here contains all the code needed for this procedure. It also includes a bonus stored procedure, SQLDriveSpaceWarningEmail.sql, which I wrote about at the end of this post.

Changelog

2/22/12

  • Corrected error in script zipfile - script for ServersToCheckDiskStats table did not include IncludeInCheck column.

7/21/11

  • Added IncludeInCheck column to ServersToCheckDiskStats table. This allows skipping of servers. Due to the foreign key constraints, you are not able to delete records from this table if there is stat data for that server in the stats table. So if a server is decommissioned or renamed or no longer needs to be tracked, set this bit to true and it will be skipped in the data collection process.
  • Added SeverDisplayName column to ServersToCheckDiskStats table. If you are checking servers in different domains, this allows you to put an IP address in the ServerName field to connect with. The ServerDisplayName field will display the user-friendly name for use in reports and is the name stored in the DatabaseDiskStats table. If you are not using IP addresses, the ServerName and ServerDisplayName fields should be the same.
  • Modified GatherServerDiskUsageData procedure to use above new columns.
  • Added Notes column to ServersToCheckDiskStats table. Just a place to store notes.

Share

I ran into an issue with Windows directory access permissions yesterday that I thought was worth a post. Here at my company, we use LiteSpeed for our backups. For those not familiar with LiteSpeed operation, it basically creates some extended stored procedures that link to DLLs that perform database backups, compression, and a host of other things. So instead of using T-SQL's BACKUP command, you call xp_backup_database and / or some other procedures, which then call the LiteSpeed DLLs to do the work. We've got this set up to run via a scheduled job.

For some other reasons, we had one server where we needed to change the account SQL Server and SQLServer Agent used temporarily. We changed them from the normal account to Local System. While running under this account, our backup jobs for that server started failing. A little digging showed the failure was due to that account not having access to the backup directory:

We all know that when jobs are executed, the commands are run using the account permissions for the Windows account SQL Server Agent is running under. So this seemed like a straightforward issue to me. The other work that required the service account changes had been complete and I was free to move the services back to running under their original accounts. Since this was a production system, I was not able to immediately change the SQL Service account, but I was able to change the SQL Agent account. So I did this, restarted the service and re-ran the backup job, thinking it should now complete successfully.

It still failed with the permission denied error. Hmm.. I tried running the command manually via SSMS and got the same error. This got me thinking about what was going on behind the scenes. Because the command was calling an extended stored procedure, which in turn calls a DLL, I theorized that even though SQL Agent was the account executing the job, the call to the DLL was being done by the SQL Server engine and thus, was made under the SQL Server service account. I did not find any documentation of this anywhere. Truthfully, I didn't look too hard because I knew I was going to be able to change the SQL Server service account the following morning. When I did that and restarted the service, the job completed without error.

As I said, I did not research this very much, so I will not say this is 100% certain. But my experiment certainly does provide some pretty significant evidence: extended procedures are executed using the SQL Server service account, not the SQL Agent service account, even when executed from a scheduled job.

(I will also note this was done using SQL 2005. Other versions may behave differently.)

Share

One of the problems DBAs encounter when working with very large databases (VLDBs) is an obvious one – lack of disk space. These databases grow over time, seeming to swallow whatever disk space you can throw at them. As they get bigger, it also takes longer to perform basic maintenance tasks. If the database gets big enough, it’s possible that the time it takes to perform the maintenance exceeds the scheduled maintenance window. When that happens, it’s tempting to just stop performing maintenance, especially if you have never encountered any problems with the database in the past. This is a recipe for disaster.

As I went through all the databases here at my new company, I discovered a VLDB that was about 600 GB. This particular server is running on local disk drives, not SANs, and there are no more drive bays available to add disk space. The former DBA here, in an effort to stop running out of space, implemented a job to shrink the database weekly. That temporarily solved the disk space issue, but created others. Notably, performance dropped. Not only does shrinking the database completely fragment the indexes, it creates poor performance because every time the database needs to grow, it has to zero out about 60 GB of disk space. (The autogrowth setting was set to 10% of the file size.)

Did I also mention there was no maintenance being performed (other than the shrink)?

This database was a mess. The first thing I did was kill the shrink job. I then changed the autogrowth settings to something more reasonable – a fixed 500 MB. This particular database didn’t seem to be growing as much as I initially thought it would, so I started looking at getting some index maintenance going to fix all the fragmentation I was sure was present. As I dove into this more deeply, I discovered this database had over 65,000 tables with each table having about 2 to 5 indexes. Checking them, I did indeed find near 99% fragmentation in almost all of indexes. I also found something else – the Page Fullness of many of the indexes was between 8% and 12%! Now, if this database was handling thousands of DML statements daily, this might be an appropriate value. But after looking at the server load, it was clear to me this was not the case. There was no need for the index pages to be that sparsely populated. So as part of my index maintenance, I also rebuilt the indexes with a fill factor of 90%. (It also helped that this was on an Enterprise edition of SLQ Server, so on-line index rebuilds were supported and I was therefore not so chained to my maintenance window.)

Turns out, this simple maintenance solved two problems at once. Not only did performance of the database improve due to the index defrags, but increasing the page density of the indexes freed up a ton of space, helping to curb the need for additional drive space. Before I started my maintenance, the database was 605 GB in size with 75 GB of free space. After maintenance was complete, the database was  still 605 GB in size but it now had 102 GB of free space. (Now granted, the log file backups and differential backup sizes increased during the few days I was performing my maintenance – all those index rebuilds are logged after all -  but those are temporary increases that will drop off the hard disks as the backup files age.)  This free space is internal to the database files, so the disk drive available space hasn’t changed, but at least now the database shouldn’t need to grow for a while.

So keep in mind that routine maintenance not only helps performance, but can also help manage disk space as well. Do not neglect basic maintenance just because it takes a long time. That will only cause you more trouble down the road!

Share

SSMS Trick

3 comments

Here's an SSMS trick I discovered by accident the other day.

When you try to run a query and get a syntax error, you can double click on the error message in the message pane and the offending statement will be highlighted in the upper pane:

 

This works in the 2005 and 2008 versions of SSMS. Heck, this even works in 2000's Query Analyzer.. Guess I've been missing out for several years!

Share