“Drive Space Monitoring Gets An Update” Update

A while back, I wrote an article that appeared on SQLServerCentral.com that presented a routine to gather disk usage data for all the SQL Servers in an organization. One of the pieces of data that routine gathered was the size of the SQL Server data and log files and the total size of the drives they resided on. This was useful information, but if you wanted to calculate the percentage of total drive space used, it could possibly be inaccurate. If programs other than SQL Server used that drive, they could eat up all the disk space and this routine would not detect it. This made the nice little gauges I made for my report somewhat misleading.

I’ve revisited this routine and have added code to collect the necessary data to determine the total disk used amount. In retrospect, this was remarkably easy to do and I should have included this in the first pass. In my defense though, this was my first Powershell project and I was (and am) still learning.

I will assume you have already read my article on SQL Server Central and have used the scripts there to create the tables, logins, etc.

The additional data I will collect is the drive free space. To store this, we need to add a column to the DatabaseDiskStats table:

ALTER TABLE DatabaseDiskStats
  ADD DriveFreeSpace_in_MB int null

Now we need to modify the routine to collect this data. This is done by adding some code to the Powershell script. I created a new variable in the script called $DriveFreeSpace. I convert this number from the bytes it is reported in to megabytes and then write it to the table at the same time the MaxDriveSize_in_MB value is written. The entire modified script is:

# ---------------------------------------------------------------------------
### Author:
### Shaun Stuart
### www.shaunjstuart.com
# ---------------------------------------------------------------------------

$DiskStatsServer = '<your servername here>'
$DiskStatsDB = '<your db name here>'
$SelectQuery = "SELECT DISTINCT server_name, Drive_letter FROM DatabaseDiskStats WHERE MaxDriveSize_in_MB is NULL"

#######################
## MAIN             ##
#######################

$SqlConnection = New-Object System.Data.SqlClient.SqlConnection
$SqlConnection.ConnectionString = "Server=$DiskStatsServer;Database=$DiskStatsDB;Integrated Security=True"

# Can't issue the update command via the same connection as the SELECT because of the DataReader loop, so we need a new connection

$SqlUpdConnection = New-Object System.Data.SqlClient.SqlConnection
$SqlUpdConnection.ConnectionString = "Server=$DiskStatsServer;Database=$DiskStatsDB;Integrated Security=True"

$SqlConnection.open()
$SQLUpdConnection.open()

$SqlCmd = New-Object System.Data.SqlClient.SqlCommand
$SqlCmd.CommandText = $SelectQuery
$SqlCmd.Connection = $SqlConnection

$SqlUpdCmd = New-Object System.Data.SqlClient.SqlCommand
$SqlUpdCmd.Connection = $SqlUpdConnection

$SqlAdapter = New-Object System.Data.SqlClient.SqlDataAdapter
$SqlAdapter.SelectCommand = $SqlCmd

$ds = $SqlCmd.ExecuteReader()
while ($ds.Read()) {
	$Servername = $ds.GetValue(0)
	$DriveLetter = $ds.GetValue(1)

	$Driveobject=gwmi  win32_logicaldisk -computername $servername -filter "DeviceID='$DriveLetter'"
	$DriveSize=$Driveobject.size
	$DriveSize=[int64]$DriveSize/[int64]1024/[int64]1000
	$DriveSize = [math]::Round($DriveSize,0)
	$DriveFreeSpace=$Driveobject.freespace
	$DriveFreeSpace=[int64]$DriveFreeSpace/[int64]1024/[int64]1000
	$DriveFreeSpace = [math]::Round($DriveFreeSpace,0)
	echo "The server $Servername drive $DriveLetter is $DriveSize MB"
	echo "The server $Servername drive $DriveLetter has $DriveFreeSpace MB free"

	$UpdateQuery = @"
	UPDATE DatabaseDiskStats
	SET MaxDriveSize_in_MB = '$DriveSize'
	    ,DriveFreeSpace_in_MB = '$DriveFreeSpace'
	WHERE Server_name = '$ServerName'
		AND Drive_letter = '$DriveLetter'
		AND MaxDriveSize_in_MB is NULL
"@

	$SqlUpdCmd.CommandText = $UpdateQuery
	$SqlUpdCmd.ExecuteNonQuery()

}

$ds.Close()
$SqlConnection.Close()
$SqlUpdConnection.Close()

Replace your existing DBDiskSpaceAnalysis.ps1 file with the above code and save it (after you’ve filled in your server name and database name in the first couple of lines). Now the routine will start collecting how much free space is left on the disks.

The final step is to incorporate this new data into the report and change the gauges to use it. I decided to update the gauges on my report by adding an additional pointer. The needle pointer still shows the percentage of the disk drive that the SQL files are taking up. But I have added a bar pointer that goes around the outside of the gauge to indicate how full the drive is overall:

Click to enlarge

The above image shows gauges for two of my servers The top row of gauges shows SQL Server files pretty much make up all the data on the drives for one server (recall, in my report, the different pointer colors indicate different drives). The lower row shows that on the other server, the data drive is being shared with something else – the drive is about 15% full but SQL Server files only account for about 1 or 2 percent of that.

In the comments on my initial post about this report, I gave the query I used as the data source for the gauges in my report. Here is the updated version:

SELECT	server_name,
		Drive_letter,
		SUM(size_in_mb) as DBSize_in_MB,
		statDate,
		MAX(maxdrivesize_in_mb) as TotalDriveSize_in_MB,
		MAX(drivefreespace_in_mb) as TotalFreeSpace_in_MB,
		(SUM(CAST(size_in_mb as decimal (9,2)))/MAX(CAST(maxdrivesize_in_mb as decimal(9,2)))) * 100 as SQLPercentFull,
		((MAX(CAST(maxdrivesize_in_mb as decimal(9,2)))- MAX(CAST(drivefreespace_in_mb as decimal(9,2))))/MAX(CAST(maxdrivesize_in_mb as decimal(9,2)))) * 100 as TotalPercentFull
FROM DatabaseDiskStats
WHERE statDate = CONVERT([date],GETDATE(),(0))
	AND MaxDriveSize_in_MB <>0
GROUP BY	Server_name,
			Drive_Letter,
			StatDate
ORDER BY drive_letter

Recall, there is a filter on each gauge so that it only looks at the server and drive it is representing.

Now the report is a little more useful.

Free Added Bonus!

As a bonus for reading this far, here’s an additional stored procedure, provide at no cost to you! While I was creating this report, I realized I could use this new bit of data to send out an alert when a disk reaches a certain percentage full. I know, I get the report emailed every day which includes this data, but it is precisely because I get it everyday that I want an additional alert. Human nature being what it is, I’m sure I will eventually start to just casually glance at the daily report. It is entirely possible that I could miss seeing a drive that is approaching being full.

To help me catch these cases, I made the below stored procedure. It takes two input parameters, although defaults are used, so you don’t even have to supply those. The first parameter, @WarningThreshold, is the percentage full a drive has to be at or above in order for an alert to be sent out. The second parameter, @AlertEmail, is the email address an alert will be sent to. Of course, this procedure assumes you have database mail enabled.

CREATE PROCEDURE SQLDriveSpaceWarningEmail
	@WarningThreshold	decimal(4,2) = '90',
	@AlertEmail varchar(200) = '<enter your default email address here>'
AS

DECLARE	@AlertSubject varchar(200)
DECLARE	@AlertBody varchar(3000)

DECLARE	@Server_name varchar(100)
DECLARE @Drive_letter char(2)
DECLARE	@SQLPercentFull decimal(4,2)
DECLARE	@TotalPercentFull decimal(4,2)

SET	@AlertSubject = 'ALERT!!! SQL DRIVE SPACE WARNING ON SERVER '

DECLARE	OverThresholdCursor CURSOR
FOR
	SELECT	server_name,
			Drive_letter,
			(SUM(CAST(size_in_mb as decimal (9,2)))/MAX(CAST(maxdrivesize_in_mb as decimal(9,2)))) * 100 as SQLPercentFull,
			((MAX(CAST(maxdrivesize_in_mb as decimal(9,2)))- MAX(CAST(drivefreespace_in_mb as decimal(9,2))))/MAX(CAST(maxdrivesize_in_mb as decimal(9,2)))) * 100 as TotalPercentFull
	FROM DatabaseDiskStats
	WHERE statDate = CONVERT([date],GETDATE(),(0))
		AND MaxDriveSize_in_MB <>0
	GROUP BY	Server_name,
				Drive_Letter,
				StatDate
	HAVING	(((MAX(CAST(maxdrivesize_in_mb as decimal(9,2)))- MAX(CAST(drivefreespace_in_mb as decimal(9,2))))/MAX(CAST(maxdrivesize_in_mb as decimal(9,2)))) * 100) >= @WarningThreshold
	ORDER BY server_name,drive_letter

OPEN OverThresholdCursor

FETCH NEXT FROM OverThresholdCursor INTO @Server_name, @Drive_letter, @SQLPercentFull, @TotalPercentFull

WHILE @@fetch_status = 0
BEGIN
	SET @AlertSubject = @AlertSubject + UPPER(@Server_name) + '!!!'
	SET	@AlertBody = 'Drive ' + @Drive_letter + ' on SQL Server ' + @Server_name + ' is ' + cast(@TotalPercentFull as varchar(5))
					+ '% full. SQL Server files are using ' + cast(@SQLPercentFull as varchar(5)) + '% of the disk.'
	exec msdb..sp_send_dbmail	@recipients = @AlertEmail,
				@body  = @AlertBody,
				@subject = @AlertSubject

	FETCH NEXT FROM OverThresholdCursor INTO @Server_name, @Drive_letter, @SQLPercentFull, @TotalPercentFull

END

CLOSE	OverThresholdCursor
DEALLOCATE	OverThresholdCursor

I added the execution of this stored procedure as the last step of the job that collects the data.

You all should have alerts set up for severity 17 errors (the severity which includes out of disk space errors). But by the time you get that alert, your drives are already full and SQL Server will likely have come to a halt. Using this routine, you can be notified before a drive is filled so you can take the steps needed to provide more space before the problem escalates into an emergency.

UPDATE: I found a bug in the stored procedure of this routine that may result in the @@SERVERNAME system variable getting set to NULL after a restart. More details and a fix here.

The Mirror Man

I’ve already admitted I’m a fan of pop music. In particular, Eighties pop music holds a dear spot in my heart. I’ve spent the last week working with database mirroring and I’ve had this Human League song going through my head the whole time. Ah, the 80s.. Such music..

Moving along… Last week, I wrote about setting up database mirroring. This week, I’m going to talk about alerts you can use in monitoring your mirroring topologies and how failovers affect clients accessing the database. Let’s start with the latter.

Client Access

If you have used clustering as a high availability solution previously, you are aware of what happens when a clustering failover occurs: The SQL Server service is switched from running on one machine to the other (assuming an active / passive cluster). Behind the scenes, Windows also handles redirecting incoming requests to the cluster name and / or IP  address  from the original machine to the newly active machine. From a client point-of-view, there is no change. The client still tries to connect to the same IP address or same server name (which would be the name of the cluster).

But in database mirroring, things are a bit different. Only a database fails over and the original SQL Server may still be up and running and hosting other databases. The mirrored database is suddenly active on a new SQL Server with a new server name and IP address. So how does the client know to connect to the mirror instance (which has become the new principal) instead of the old principal (where the database has gone offline for some reason, triggering the failover)? If you are using older ODBC drivers that don’t support mirroring, it doesn’t. You’ll have to manually reconfigure the ODBC driver or change the database connection properties in your program to point to the new server. This is obviously not ideal. If the goal of mirroring is to reduce downtime, having to manually reconfigure all your clients after a failover does the opposite – it creates downtime!

Luckily, Microsoft has released an ODBC driver that supports mirroring failovers. Go to this site and look for the paragraph titled Microsoft SQL Server 2008 Native Client. There are versions there for 32 bit, 64 bit, and IA64 processors. In addition to the ODBC driver, this download also contains OLE DB providers and support for ADO. This driver allows your client to access the new features of SQL 2008, such as mirroring.

Once you install this package, when you go to add a new ODBC source, you’ll see a new option when you add a System DSN: SQL Server Native Client 10.0.

If you select this and go through the standard configuration steps, you’ll notice a new screen:

This is where you enter the name of the mirror server (and, optionally, it’s SPN). Now, when the database fails over, the client will know where to go to connect to the mirrored database.

Be sure to check the System Requirements (listed at the top of the download page) before trying to install and use this new driver.

Monitoring Mirroring

I want to be notified when a mirroring failover occurs. If the failover was due to something internal to SQL Server, such as a media failure, it is likely you will get an alert for this because this is a severity 24 error (and you do have alerts set up for severity 17 through 25 errors, don’t you?). But what if SQL Server is running fine and it was a network outage that caused the mirroring failover? That won’t generate a SQL Server alert. So I want to be notified anytime a failover occurs.

This page, which I linked to last week, describes how to set up notifications for SQL mirroring events. The article is about how to initiate a failover of other databases via a job when one fails over (in case your client application uses multiple databases), but the alert concept works just as well for sending an email.

In this case, we set up an alert for a Windows WMI event. Database mirroring has several states you can monitor for. See the link for a list of them. In this case, I am looking for failovers – both manual and automatic failovers. This corresponds to states 7 and 8. I also want a specific alert for each database that is being mirrored. You’ll see why in a minute. So I set up my alert with a condition like this

Click to enlarge

You see in the Query field, I specify the database name this alert monitors. (Don’t worry about the Namespace field – it defaults to the current instance you are working with, which will be fine). Click the Response page and choose to have an email sent when the alert is triggered. I then like to go to the Options page and include some additional text to be included with the alert:

Click to enlarge

This is why I like a separate alert for each database – I configure my alerts to include the database name that is affected, as well as which server it was on and which server it failed over to. As I’ve mentioned before, I like my alert emails to provide me with as much information as I need to determine exactly what happened.

The other alert I set up is to watch for a paused mirroring session. You can pausing mirroring if, for example, you are doing something that you know will generate a ton of transaction log entries – rebuilding all the indexes perhaps. If you have a slow network connection, you could experience performance issues on the principal server as it waits for the transactions to be sent to the mirror server (assuming you are in High Availability mode – High Performance mode does not have this issue). To get around this, you can pause the mirroring session, which lets the principal server run at full speed. When the session is un-paused, the transaction log entries then get sent to the mirror. However, in high availability mode, transactions in the transaction log remain active and cannot be cleared by a checkpoint until they are sent to the mirror database. Therefore, the transaction log cannot be shrunk, even if it has been backed up, and you can potentially run out of disk space if a mirroring session remains paused for too long. See this Microsoft article for more details on this situation. So I want to be notified anytime a mirroring session is paused so I can make sure to un-paused it quickly.

This alert is set up that same as the previous alert, except this time you will be monitoring for “State = 9” in your Query statement and my additional text to include on the Options page reads “Database mirroring of the SJSTest database on <servername> has been paused.”

We’ve just defined an alert that tells us when the mirroring session has been paused. In real life, I’ve seen mirroring automatically get switched into the paused state during periods of high disk I/O, like during the aforementioned maintenance job that is rebuilding indexes on the mirrored database. In most cases, as soon as the job is over and the disk I/O drops off, mirroring can be resumed with no problem. Now suppose I’m driving in my car on the way to the local Indian casino to see Rick Springfield perform when my phone dings and I see this alert come in. I know this is about the time a maintenance job runs and that this is what is likely causing the problem. Mirroring should be able to be resumed with no problem in a few minutes. I have a couple of choices. I can turn around and head home, login to the server remotely and resume the mirroring session myself, like a working class dog, or I can call up another member of the IT department and walk them through un-pausing the mirroring session – after all, it’s just a couple of mouse clicks – and continue my evening with Jessie’s girl. Either way, I still want to be sure that mirroring has resumed.

So another alert can be set up. Two actually. Again, the setup is the same as before, but this time we will be monitoring for State = 11 and State = 13. State 11 corresponds to a “synchronizing mirror” state and state 13 corresponds to a “synchronizing principal” state. These are the states the mirrored database will transition to when it is resuming from being paused. (It is also a state they will go through during a failover.) By setting these alerts up, we can get notified when a mirroring session transitions out of the paused state and I will have confirmation that my IT co-worker has successfully followed my instructions. If you want to be very paranoid, you can also create alerts for when the database enters the “synchronized” state, indicating all the transactions have been caught up. However, if you look at the various states in the state chart, you’ll see there are several state IDs for this status, depending on the state of the witness server. That’s just a few too many for me to bother setting up alerts for. I’m content to assume that, once the database enters the “synchronizing” state, it will make it to the “synchronized” state. At least, until experience proves otherwise.

Now we’ve set up all these alerts on the principal server. They also need to be set up on the mirror server – because if a failover occurs, that server will become the new principal. So create all the alerts on that server as well. Be sure to change the text of the messages to reflect the change in server roles (i.e., the “server X has failed over to server Y” part).

I should note that these alerts are not instant. During my (admittedly brief) testing, I noticed it took about 30 seconds after a mirror session had been paused for me to get the alert email and 10 seconds after a manual failover to get that alert email. But this is a minor and acceptable delay.

If you have any questions, leave a comment and I’ll try to answer your every wish. Your every dream, hope, desire.

Mirror, Mirror On The Wall

Last week, I spent some time testing database mirroring as part of our disaster recovery strategy. I have to say, I am very impressed with the concept. It was incredibly easy to set up and worked like a charm. It is so much easier than log shipping or replication. And mirroring failovers are faster than a cluster failover. The drawback is that mirroring works on single databases, not an entire server. You can set up mirroring for more than one database on a server, but if you are using a 32 bit version of SQL, it is recommended that you mirror no more than 10 databases. (Mirroring consumes 5 threads per database mirrored and the number of threads you can have is limited by the number of CPUs you have. Because each thread takes up memory, the larger 64 bit memory address space allows for more threads to be used. This article gives a bit more information. And the number of 10 databases max in 32 bit systems is really just a suggestion. If you need to mirror more, try testing performance in your environment. You may find you can handle more. There are some good detailed articles here and here about the factors that affect mirroring performance.)

I’m not going to go into a full explanation of database mirroring. That’s beyond the scope of this post. You can get a good overview from this Microsoft TechNet article. I also recommend reading this Microsoft Knowledge Base article about things to consider when setting up mirroring. (Note that some of these articles, and other ones I link to in this post, refer to SQL 2005. This was the version when mirroring was first released. The details still hold for SQL 2008 and 2008 R2.) I am going to talk about setting up and configuring mirroring.

First, watch this video on setting up mirroring.

It’s really that easy!

My topology was the same as the person who made the above video – a principal server, a mirror server, and a witness server. The configuration supports automatic failover and is referred to as High Availability mode. My setup is: Principal server is 32 bit SQL 2008 R2 CU1 running on a virtual machine. The mirror server is a 64 bit 2008 R2 CU1 two node active/passive cluster. The monitor server is 32 bit SQL 2008 R2 CU1 Express. Note that all versions of SQL are the same on all three machines. This is a best practices recommendation. But also note that the principal server is running the 32 bit version of SQL and the mirror server is running the 64 bit version of SQL. This is ok. Another best practices recommendation is that the directory structure on the principal and mirror servers be the same. I also made sure that the SQL Server service used the same account on all servers.

Once you have your systems set up, actually getting mirroring up and running is easy. Simply make a full backup of the database to be mirrored on the principal server, then make a transaction log backup. Restore those backups onto the mirror server. As mentioned in the video, be sure to leave the database on the mirror server in the restoring state. Then, on the principal server, right click on the database to be mirrored, choose Tasks –> Mirror… and go through the wizard. In the video, the user left the service account fields blank. When I set it up, I actually inserted the network account that was used by the various server services (which was the same account for all). That’s it! When you finish the wizard, you’ll see the database on the principal listed as “(Principal, Synchronized).” On the mirror server, the database will be listed as “(Mirror, Synchronized / Restoring…)” Since it is in the restoring state, it is not accessible on that server.

The video shows the user testing a failover by actually stopping the SQL service on the principal server. That’s rather drastic (although it does simulate a server failure). You can manually failover a mirrored database by right-clicking the database, choosing Tasks –> Mirror… and then clicking the Failover button.

Click to enlarge

One thing I did run into in setting this up was the old orphaned login problem. I restored the database onto the principal server from a backup made on our production server. Then the backup from the principal server was restored on the mirror server. So I had orphaned logins on both servers. I fixed that by using this script. I first ran it on the principal server to fix the logins there. Then, I failed over the database to the mirror server (which I had to do because the database would be in the restoring state otherwise), and ran the script on the mirror server. I verified the logins worked on the mirror server, failed the database over again and then verified the logins worked on the original principal server.

So that was all it took to get mirroring up and running. Fairly easy and much easier than log shipping.

Things To Watch For

Time to point out a few more things to watch out for.

  • Database mirroring works at the database level. If your application uses just one database, this is fine. But if it uses multiple databases, you have to watch out for the situation where not all the databases fail over. In this case, you can set an alert for a database failover and then manually failover the other databases via a job that runs when the alert is raised. Details on setting this up can be found here.
  • Another thing to watch out for is combining mirroring with clustering. This is allowable, but you need to be aware of one fact: the default timeout for the monitor server to detect a database failure and initiate a mirroring failover is 10 seconds. It typically takes longer than 10 seconds for a cluster to failover from one node to another. Therefore, if your principal server is a cluster, you should change the mirror monitor timeout to a larger value to prevent false failovers. Read Paul Randal’s blog post on this. Keep in mind that by increasing the timeout value, you are increasing the time the database may be inaccessible before it fails over. You’ll need to balance this timeout setting with your business requirements. In my case, the cluster server is the mirror server and I’m not too concerned about this timing issue. Our mirror server (cluster) is located in a different physical location and, should a failover occur, we will have a slower network connection for the users. We will still have functionality, but it will just be slower than normal. This is a business decision we had made, but what it means is, should a failover occur, we will most likely be manually failing from the cluster back to the previous principal as soon as possible, so we are not too concerned about an automatic failover in this direction.
  • Consider your network bandwidth. You want to have a good connection between your two servers and your monitoring server to prevent false failovers. The monitor server will ping the principal and mirror server one each second. If you are running in High Availability mode, you’ll also want a good network connection between the principal and the mirror server. Because mirroring is based on synchronizing the transaction log between the principal and mirror server, if that cannot be done, the transaction log on the principal cannot be truncated and will continue to grow. The Knowledge Base article I linked to in the second paragraph gives some figures and details which Perfmon counters to monitor. SQL will compress the transaction log as it sends it to the mirror server, so that will help improve throughput. I had previously found a site that did some real world testing of the amount of data that could be sent over various networks before mirroring had problems, but I can’t find it again – sorry. In my situation, we are dealing with traffic volume an order of magnitude less than what the testers were dealing with, so I wasn’t too concerned with this.

My next post will talk about monitoring for mirroring events and why you should do that, as well as what you need to do to support mirroring on the client side of things.

Clean Event Logs

I like to keep my Windows Event Logs clean. To me, the event log should be a place to go to find errors. In fact, the only time I look at the event log is when I’m seeing errors with the system. So when programs write informational messages to the event log, I try to find some way to turn them off to prevent clutter. For instance, by default, SQL Server will write a message to the Windows Application event log every time a successful backup completes. Enabling trace flag 3226 will stop these messages.

I’ve discovered another SQL process that writes unnecessary (in my opinion) messages to the Windows application log: SSIS packages. These include maintenance plans, so even if you have enabled trace flag 3226 to stop backup messages being written to the log, if your backups are done via a maintenance plan, you will still see entries like the following:

Unfortunately, there is not currently a trace flag to disable these messages. There is, however, a registry hack.

As usual, modifying the registry can be a dangerous thing. Make a backup of your registry before modifying it. Only proceed if you know what you are doing.

That said, you can disable the SSIS message by modifying the key HKLM\SOFTWARE\Microsoft\Microsoft SQL Server\100\SSIS\LogPackageExecutionToEventLog to have a value of 0. A restart is not required for the change to take effect.

I should also note that we really aren’t losing any information here. The execution of the SSIS package will still be logged by SQL Server in the job history, if it is a scheduled task that runs. Any one-off SSIS executions will not be logged however, so if tracking those is important to you, do not make this change.