# Availability Monitoring – Part 3

This is the final almost final part of my series of articles about creating a process to monitor your SQL Server for downtime. Part 1 can be found here. Part 2 can be found here.

### Reporting

Now that we’ve got a process in place to gather data, we need to create a way to report off of that data. We designed our data recording process to be as lightweight as possible at the expense of adding possible complexity on the reporting side. Now is when we have to pay that bill.

Let me reprint the image representing our sample data:

This is a visual representation of the data in the following table:

 LogDate UptimeMinutes 2013-08-21 12:00:00 100 2013-08-21 14:15:00 115 2013-08-22 08:15:00 1070 2013-09-03 08:45:00 17300

Again, for the sake of our discussion, let’s assume the current time is 9/3/13 8:46 AM and the server is still running.

The blue bars represent periods where our SQL Server is up and running. The times between are times the server is down. We want to query the data and find out, for a given date range, what was the availability percentage of the server.  Calculating this can be broken down into 4 possible scenarios:

1. Both the start and end times of the reporting window fall within a period of the server being up
2. The report start time falls within a server downtime period and the report end time falls in a server uptime period
3. The report start time falls within a server uptime period and the report end time falls in a server downtime period
4. Both the start and end times of the reporting window fall within a server downtime period.

Why do we need to break things out this way? For a couple reasons. First, as we demonstrated last time, in order to calculate how accurate our monitoring data is we need to know how many up/down periods occurred within the reporting interval. Second, if the report interval falls completely within one type of period (either up or down), the calculations are greatly simplified.

I’ll walk through the calculation process for each scenarios before presenting the code.

#### Scenario 1

In this scenario, both the start and end times of the reporting interval fall within a period where the server was up. The generic situation looks like this:

The red circle represents the start of the reporting period and the red rectangle isn the end of the reporting period. There could be more than 1 uptime period in the reporting interval, although I only show one here. The total minutes the system has been available between these two points can be found by adding:

• The total uptime of all complete uptime periods completely within the report window (8/21 between 12:20 and 14:14, or 115 minutes, in this case) and
• The uptime from the start of the reporting interval to the end of the uptime period that falls in (from the red circle to 8/21 12:00, in this case) and
• The uptime from the start of the uptime period the report end time is in to the report end time (8/21 14:25 to the red rectangle, in this case)

These calculations require us to determine which time period the report endpoints fall in. In doing this, we can determine if we are in following simplified case:

Here, the total uptime is simply the difference between the report start and end dates.

Our calculations for our margin of error, also require us to count the total number of uptime periods the report covers.

#### Scenario 2

In this scenario, the start time of the reporting interval falls during a server downtime period and the end time falls within a period where the server was up. The generic situation looks like this:

In this case, the total uptime can be found by adding:

• The total uptime of all uptime periods completely within the reporting interval and
• The uptime from the start of the uptime interval the report end time falls in to that endtime.

#### Scenario 3

In this scenario, the start time of the reporting interval falls during a server uptime period and the end time falls within a period where the server was down. The generic situation looks like this:

In this case, the total uptime can be found by adding:

• The total uptime of all uptime periods completely within the reporting interval and
• The uptime from the start of the report interval to the end of the uptime period the report start time fall in.

#### Scenario 4

In this final scenario, both the start and end times of the reporting interval fall within a period where the server was down. The generic situation looks like this:

In this case, the total uptime can be found by adding:

• The total uptime of all uptime periods completely within the reporting interval

Like the first scenario, there is a simplified case here, which looks like this:

Obviously, in this case, the total uptime will be zero.

That is all fairly straightforward. The tricky part comes where we try to translate those statements into T-SQL code. An extra degree of difficultly is added because our log table does not store the start time of each period. It only stores the end time and the length of the period.

Here is the code to calculate our availability percentage. The output of this procedure will be four columns containing the following text and information:

Uptime Minutes: x

Margin Of Error (Minutes): x

Uptime Percentage for server <servername>: xx.xx%

Uptime Percentage (including margin of error for <servername>: xx.xx%

The comments in the code should be helpful in explaining what is going on. The code takes two input parameters – the start and end times of the desired reporting interval. If these are not supplied, the code reports over the entire timeframe represented in the table.

The variable @MarginOfErrorMultiplier should be set to 2 unless you also have the procedure run on SQL Server startup, in which case it should be set to 1. I talked about this in the Margin Of Error section in part 2.  Also related to the margin of error calculation is what I refer to as a “margin of error modifier” in the code. Recall that our margin of error comes into play every time the server goes up or down. So whenever our report interval crosses an up/down or down/up transition, we can encounter some error. I’ve defined “one margin of error unit” to be one complete cycle – up/down and then down /up (or vice-versa). Therefore, for scenarios 2 and 3 where the report interval includes incomplete cycles, we have to add in a half of a margin of error unit. And for the special cases of scenarios 1 and 4, there are no up/down or down/up transitions included, so there is no error at all.

You’ll need to modify the code to set the @ProcedureRunTimeIntervalInMinutes variable to be however often you run the logging procedure. Here, it is set to 5 minutes.

```CREATE PROCEDURE [dbo].[up_CalculateServerUpTime]
(
@ReportStartDateTime SMALLDATETIME = NULL
,@ReportEndDateTime SMALLDATETIME = NULL
)
AS

/* shaunjstuart.com */

DECLARE @ReportStartDateTimeInUptimeWindow BIT
DECLARE @ReportEndDateTimeInUptimeWindow BIT
DECLARE @NextWindowStartDateTimeReportStart SMALLDATETIME
DECLARE @NextWindowStartDateTimeReportEnd SMALLDATETIME
DECLARE @NextWindowUptimeMinutes INT
DECLARE @TotalReportPeriodUptimeMinutes INT
DECLARE @UptimePercentage DECIMAL(10, 3)
DECLARE @UptimePercentageWithError DECIMAL(10, 3)
DECLARE @MarginOfErrorMultiplier TINYINT = 2
/* This should be 1 if this procedure is set to
run on server startup, 2 otherwise */
DECLARE @IntervalCount DECIMAL(3, 1)
/*	Number of up/down intervals in reporting period. Used for
margin of error calculation */

DECLARE @ProcedureRunTimeIntervalInMinutes SMALLINT
/*	Set this to be how often this procedure
is run. Used in margin of error calculation */

/* If no parameters are passed in, assume the oldest and most recent values for use */

IF @ReportStartDateTime IS NULL
SELECT    @ReportStartDateTime = DATEADD(mi, -UptimeMinutes, LogDate)
FROM      UpTimeTracking
WHERE     logDate = (SELECT   MIN(logDate)
FROM     UptimeTracking
)
IF @ReportEndDateTime IS NULL
SELECT    @ReportEndDateTime = MAX(LogDate)
FROM      UpTimeTracking

/* Validate inputs and Initialize Variables */

IF @ReportStartDateTime >= @ReportEndDateTime
BEGIN
SELECT  'Start time must be before end time'
RETURN
END

IF @ReportStartDateTime < (SELECT    DATEADD(mi, -UptimeMinutes,
LogDate)
FROM      UpTimeTracking
WHERE     LogDate = (SELECT MIN(LogDate)
FROM   UpTimeTracking
)
)
BEGIN
SELECT  'Start time is before logging started.'
RETURN
END

IF @ReportEndDateTime > (SELECT  MAX(LogDate)
FROM    UpTimeTracking
)
BEGIN
SELECT  'Requested report end time is beyond the last logging period. No data is available.'
RETURN
END

SET @IntervalCount = 0.0
SET @ProcedureRunTimeIntervalInMinutes = 5

IF @ProcedureRunTimeIntervalInMinutes IS NULL
BEGIN
SELECT  'Must define how often logging occurs.'
RETURN
END

IF @ProcedureRunTimeIntervalInMinutes <= 0
BEGIN
SELECT  '@ProcedureRunTimeIntervalInMinutes must be greater than zero.'
RETURN
END

/* Check where ReportStartDateTime falls */

SELECT TOP 1
@NextWindowStartDateTimeReportStart = LogDate
,@NextWindowUptimeMinutes = UptimeMinutes
FROM     UpTimeTracking
WHERE    LogDate >= @ReportStartDateTime
ORDER BY LogDate ASC

@NextWindowStartDateTimeReportStart) <= @ReportStartDateTime
SET @ReportStartDateTimeInUptimeWindow = 1
ELSE
SET @ReportStartDateTimeInUptimeWindow = 0

/* Check where ReportEndDateTime falls */

SELECT TOP 1
@NextWindowStartDateTimeReportEnd = LogDate
,@NextWindowUptimeMinutes = UptimeMinutes
FROM     UpTimeTracking
WHERE    LogDate >= @ReportEndDateTime
ORDER BY LogDate ASC

@NextWindowStartDateTimeReportEnd) <= @ReportEndDateTime
SET @ReportEndDateTimeInUptimeWindow = 1
ELSE
SET @ReportEndDateTimeInUptimeWindow = 0

/* Now perform uptime calculations based on where the report
boundaries lie:
ReportStartDatetime during up period and ReportEndDateTime during up period
ReportStartDatetime during up period and ReportEndDateTime during down period
ReportStartDatetime during down period and ReportEndDateTime during up period
ReportStartDatetime during down period and ReportEndDateTime during down period
*/

/* ReportStartDatetime during up period and ReportEndDateTime during up period */

IF (@ReportStartDateTimeInUptimeWindow = 1
AND @ReportEndDateTimeInUptimeWindow = 1)
BEGIN

/*  SELECT 'Section: Up Up'  */
/* First, get total uptime of all periods completely within the report time frame */

SELECT  @TotalReportPeriodUptimeMinutes = ISNULL(SUM(UptimeMinutes),
0)
FROM    UpTimeTracking
WHERE   LogDate < @ReportEndDateTime
AND DATEADD(mi, -UptimeMinutes, LogDate) > @ReportStartDateTime

/* SELECT @TotalReportPeriodUptimeMinutes  */

/* Now add in minutes from the starting time period the report start date falls in */

SELECT TOP 1
@TotalReportPeriodUptimeMinutes = @TotalReportPeriodUptimeMinutes
+ DATEDIFF(mi, @ReportStartDateTime, LogDate)
FROM    UpTimeTracking
WHERE   LogDate >= @ReportStartDateTime
ORDER BY LogDate ASC

/*   SELECT @TotalReportPeriodUptimeMinutes   */

/* Now add in minutes from the end time period the report end date falls in */

SELECT TOP 1
@TotalReportPeriodUptimeMinutes = @TotalReportPeriodUptimeMinutes
+ (UptimeMinutes - DATEDIFF(mi, @ReportEndDateTime,
LogDate))
FROM    UpTimeTracking
WHERE   LogDate >= @ReportEndDateTime
ORDER BY LogDate ASC

/* SELECT @TotalReportPeriodUptimeMinutes   */

/* Calculate the margin of error. This is the number of up/down intervals in our reporting
period plus some adjustments based on if the period start or ends in an up or down
time segment. Here, we know both are in an uptime period, so we add one period as the
modifier. */

SELECT  @IntervalCount = ISNULL(COUNT(1), 0)
FROM    UpTimeTracking
WHERE   LogDate <= @ReportEndDateTime
AND DATEADD(mi, -UptimeMinutes, LogDate) >= @ReportStartDateTime
AND LogDate > (SELECT   MIN(LogDate)
FROM     UpTimeTracking
)

SELECT  @IntervalCount = @IntervalCount + 1.0	/* This is our modifier */

/* Now some special handling for the case where both report start and end dates are in the
same log period. This is a simple case where the uptime is just the difference
between the report dates. */

IF @NextWindowStartDateTimeReportStart = @NextWindowStartDateTimeReportEnd
BEGIN
SELECT @TotalReportPeriodUptimeMinutes = DATEDIFF(mi,
@ReportStartDateTime,
@ReportEndDateTime)
/* SELECT    'Both are in same log period period' */

/* This situation also means the margin of error is calculated differently. Because
we are completely within one log period, there is no margin of error to worry about. */

SELECT @IntervalCount = 0.0

END

END

/* ReportStartDatetime during up period and ReportEndDateTime during down period */

IF (@ReportStartDateTimeInUptimeWindow = 1
AND @ReportEndDateTimeInUptimeWindow = 0)
BEGIN
/* SELECT 'Section: Up Down' */

/* First, get total uptime of all periods completely within the report time frame */

SELECT  @TotalReportPeriodUptimeMinutes = ISNULL(SUM(UptimeMinutes),
0)
FROM    UpTimeTracking
WHERE   LogDate BETWEEN @ReportStartDateTime
AND     @ReportEndDateTime
AND LogDate > (SELECT   MIN(LogDate)
FROM     UpTimeTracking
)
/* Need the AND clause to handle case where ReportStartDate is in the first
log period in the table. Otherwise, it gets counted twice */

/* Now add in minutes from the starting time period the report start date falls in */

SELECT TOP 1
@TotalReportPeriodUptimeMinutes = @TotalReportPeriodUptimeMinutes
+ (DATEDIFF(mi, @ReportStartDateTime, LogDate))
FROM    UpTimeTracking
WHERE   LogDate >= @ReportStartDateTime
ORDER BY LogDate ASC

/* Now add in minutes from the end time period the report end date falls in */

/* Oops! We already know the report end date does not fall in an up period, so there
is nothing to add here. Move along. */

/* Calculate the margin of error. This is the number of up/down intervals in our reporting
period plus some adjustments based on if the period start or ends in an up or down
time segment. Here, we know one is in an uptime period and one isn't, so we add one-half
of one period as the modifier. */

SELECT  @IntervalCount = ISNULL(COUNT(1), 0)
FROM    UpTimeTracking
WHERE   LogDate BETWEEN @ReportStartDateTime
AND     @ReportEndDateTime
AND LogDate > (SELECT   MIN(LogDate)
FROM     UpTimeTracking
)
SELECT  @IntervalCount = @IntervalCount + 0.5	/* This is our modifier */

END

/* ReportStartDatetime during down period and ReportEndDateTime during up period */

IF (@ReportStartDateTimeInUptimeWindow = 0
AND @ReportEndDateTimeInUptimeWindow = 1)
BEGIN
/* SELECT 'Section: Down Up'*/

/* First, get total uptime of all periods completely within the report time frame */

SELECT  @TotalReportPeriodUptimeMinutes = ISNULL(SUM(UptimeMinutes),
0)
FROM    UpTimeTracking
WHERE   LogDate BETWEEN @ReportStartDateTime
AND     @ReportEndDateTime
/* Now add in minutes from the starting time period the report start date falls in */

/* Oops! We already know the report start date does not fall in an up period, so there
is nothing to add here. Move along. */

/* Now add in minutes from the end time period the report end date falls in */

SELECT TOP 1
@TotalReportPeriodUptimeMinutes = @TotalReportPeriodUptimeMinutes
+ (UptimeMinutes - DATEDIFF(mi, @ReportEndDateTime,
LogDate))
FROM    UpTimeTracking
WHERE   LogDate >= @ReportEndDateTime
ORDER BY LogDate ASC

/* Calculate the margin of error. This is the number of up/down intervals in our reporting
period plus some adjustments based on if the period start or ends in an up or down
time segment. Here, we know one is in an uptime period and one isn't, so we add one-half
of one period as the modifier. */

SELECT  @IntervalCount = ISNULL(COUNT(1), 0)
FROM    UpTimeTracking
WHERE   LogDate BETWEEN @ReportStartDateTime
AND     @ReportEndDateTime
AND LogDate > (SELECT   MIN(LogDate)
FROM     UpTimeTracking
)

SELECT  @IntervalCount = @IntervalCount + 0.5	/* This is our modifier */

END

/* ReportStartDatetime during down period and ReportEndDateTime during down period */

IF (@ReportStartDateTimeInUptimeWindow = 0
AND @ReportEndDateTimeInUptimeWindow = 0)
BEGIN
/* SELECT 'Section: Down Down' */

/* First, get total uptime of all periods completely within the report time frame */

SELECT  @TotalReportPeriodUptimeMinutes = ISNULL(SUM(UptimeMinutes),
0)
FROM    UpTimeTracking
WHERE   LogDate BETWEEN @ReportStartDateTime
AND     @ReportEndDateTime

/* Now add in minutes from the starting time period the report start date falls in */

/* Oops! We already know the report start date does not fall in an up period, so there
is nothing to add here. Move along. */

/* Now add in minutes from the end time period the report end date falls in */

/* Oops! We already know the report end date does not fall in an up period, so there
is nothing to add here. Move along. */

/* Calculate the margin of error. This is the number of up/down intervals in our reporting
period plus some adjustments based on if the period start or ends in an up or down
time segment. Here, we know neither are in an uptime period, so we don't add any modifier. */

SELECT  @IntervalCount = ISNULL(COUNT(1), 0)
FROM    UpTimeTracking
WHERE   LogDate BETWEEN @ReportStartDateTime
AND     @ReportEndDateTime
AND LogDate > (SELECT   MIN(LogDate)
FROM     UpTimeTracking
)

SELECT  @IntervalCount = @IntervalCount + 0.0	/* This is our modifier */

END

/* Calculations done. Now output results. Outputting as a single result set for easier
consumption for reports. */

/*Uptime percentage over reporting period = total uptime during reporting period / total time of that period */

SET @UptimePercentage = 100
* (CAST(@TotalReportPeriodUptimeMinutes AS DECIMAL(20, 3))
/ CAST(DATEDIFF(mi, @ReportStartDateTime, @ReportEndDateTime) AS DECIMAL(20,
3)))

SET @UptimePercentageWithError = 100
* (CAST((@TotalReportPeriodUptimeMinutes - (@IntervalCount
* @MarginOfErrorMultiplier
* @ProcedureRunTimeIntervalInMinutes)) AS DECIMAL(20,
3))
/ CAST(DATEDIFF(mi, @ReportStartDateTime, @ReportEndDateTime) AS DECIMAL(20,
3)))

SELECT   'Uptime Minutes: '
+ CAST(@TotalReportPeriodUptimeMinutes AS VARCHAR(20)) AS UpTimeMinutes
,'Margin Of Error (Minutes): '
+ CAST((@IntervalCount * @MarginOfErrorMultiplier
* @ProcedureRunTimeIntervalInMinutes) AS VARCHAR(20)) AS MarginOfErrorMinutes
,'Uptime Percentage for server ' + @@servername + ': '
+ CAST(@UptimePercentage AS VARCHAR(16)) AS UpTimePercentage
,'Uptime Percentage (including margin of error) for server '
+ @@servername + ': '
+ CAST(@UptimePercentageWithError AS VARCHAR(16)) AS UpTimePercentageWithError

```

Note that, in most companies, down time due to planned maintenance does not count against your up time availability percentage. This process does not account for down time due to planned maintenance.

I had planned to end the series here, but I think one more post is in order. This is a pretty complicated reporting method, so I wanted to make sure it worked for all scenarios I could think of. My final post on this will go over my testing methodology.

## One thought on “Availability Monitoring – Part 3”

This site uses Akismet to reduce spam. Learn how your comment data is processed.