Availability Monitoring – Part 4

For the final part of this series on Uptime Monitoring, I’d like to go over my testing methodology. The reporting process is somewhat complicated, so I want to be confident the figures my procedure outputs are correct. Here, again, is the sample data I am working with:

LogDate	UptimeMinutes
2013-08-21 12:00:00	100
2013-08-21 14:15:00	115
2013-08-22 08:15:00	1070
2013-09-03 08:45:00	17300

Again, for the sake of our discussion, let’s assume the current time is 9/3/13 8:46 AM and the server is still running.

The blue bars represent periods where our SQL Server is up and running. The gaps between them are periods when the server is down.
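As a cross-check on those bars, here is a minimal Python sketch (not the stored procedure from this series) that rebuilds the uptime periods from the sample rows. It assumes each LogDate is the last logged time of an uptime period and UptimeMinutes is the elapsed uptime at that point, so the period starts at LogDate minus UptimeMinutes. The names `LOG_ROWS` and `uptime_intervals` are made up for this illustration.

```python
from datetime import datetime, timedelta

# Sample log rows from the table above: (LogDate, UptimeMinutes).
LOG_ROWS = [
    ("2013-08-21 12:00:00", 100),
    ("2013-08-21 14:15:00", 115),
    ("2013-08-22 08:15:00", 1070),
    ("2013-09-03 08:45:00", 17300),
]

def uptime_intervals(rows):
    """Return (start, end) pairs, assuming start = LogDate - UptimeMinutes."""
    intervals = []
    for log_date, minutes in rows:
        end = datetime.strptime(log_date, "%Y-%m-%d %H:%M:%S")
        intervals.append((end - timedelta(minutes=minutes), end))
    return intervals

for start, end in uptime_intervals(LOG_ROWS):
    print(f"up {start:%m/%d %H:%M} -> {end:%m/%d %H:%M}")
```

Under that assumption, the four periods begin at 8/21 10:20, 8/21 12:20, 8/21 14:25 and 8/22 8:25, and everything between one period's end and the next period's start counts as downtime.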

To fully verify this procedure, I want to run it using various reporting start and end times in order to test the different scenarios I talked about in Part 3, then compare its output to my manually calculated results. Here is the data I used for testing and my manual calculations:

Report Start	Report End	Scenario	Uptime (min)	Total Report Time (min)	Availability (%)	Margin Of Error (min)
8/21 10:25	8/21 10:35	1 (special case)	10	10	100	0
8/21 10:25	8/21 12:15	3	95	110	86.36	5
8/21 12:15	8/21 12:30	2	10	15	66.67	5
8/21 12:15	8/22 8:20	4	1185	1205	98.34	20
8/21 14:20	8/21 14:22	4 (special case)	0	2	0	0
8/22 8:30	8/22 8:45	1 (special case)	15	15	100	0
8/21 12:30	8/21 13:00	1 (special case)	30	30	100	0
8/21 12:30	8/22 8:30	1	1180	1200	98.33	20
8/21 10:25	8/22 8:30	1	1285	1325	96.98	30
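The manual calculations can also be reproduced independently of the procedure. The sketch below is an assumption-laden illustration, not the code from this series: it clips each uptime period to the report window and sums the overlap. The `UP` list is derived from the sample data by taking LogDate minus UptimeMinutes as each period's start, and the helper names `availability` and `at` are hypothetical. It does not attempt the Margin Of Error column.

```python
from datetime import datetime, timedelta

# Uptime periods implied by the sample data (start = LogDate - UptimeMinutes).
UP = [
    (datetime(2013, 8, 21, 10, 20), datetime(2013, 8, 21, 12, 0)),
    (datetime(2013, 8, 21, 12, 20), datetime(2013, 8, 21, 14, 15)),
    (datetime(2013, 8, 21, 14, 25), datetime(2013, 8, 22, 8, 15)),
    (datetime(2013, 8, 22, 8, 25), datetime(2013, 9, 3, 8, 45)),
]

def availability(report_start, report_end, up=UP):
    """Return (uptime_min, total_min, percent) for one report window."""
    uptime = timedelta(0)
    for start, end in up:
        # Clip the uptime period to the report window; no overlap counts as zero.
        overlap = min(end, report_end) - max(start, report_start)
        uptime += max(overlap, timedelta(0))
    total = report_end - report_start
    one_minute = timedelta(minutes=1)
    return uptime // one_minute, total // one_minute, round(100 * (uptime / total), 2)

def at(day, hour, minute):
    """Shorthand for a timestamp in August 2013."""
    return datetime(2013, 8, day, hour, minute)

# The nine report windows from the table above.
WINDOWS = [
    (at(21, 10, 25), at(21, 10, 35)),
    (at(21, 10, 25), at(21, 12, 15)),
    (at(21, 12, 15), at(21, 12, 30)),
    (at(21, 12, 15), at(22, 8, 20)),
    (at(21, 14, 20), at(21, 14, 22)),
    (at(22, 8, 30), at(22, 8, 45)),
    (at(21, 12, 30), at(21, 13, 0)),
    (at(21, 12, 30), at(22, 8, 30)),
    (at(21, 10, 25), at(22, 8, 30)),
]

for window_start, window_end in WINDOWS:
    print(window_start, window_end, availability(window_start, window_end))
```

Run against the nine windows, this should reproduce the Uptime, Total Report Time and Availability columns above, which gives a second, independent check on the hand calculations.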

The procedure gives the same values for these reporting periods, so I’m pretty confident the code is good. You’ll notice there are a couple of cases where I test a scenario multiple times. This was done so that specific up/down time intervals (the first, the last, and so on) were included in the report. I also ran a test using no report time parameters so the output would cover all logged time, but those numbers will vary depending on when you run it (because the final period keeps increasing as time goes on), so I left that off the chart.

Final Thoughts

So what did I learn from all this? First, it’s not a trivial task to get a server to monitor itself for downtime. Second, I really enjoyed this challenge. Coming up with a data logging procedure was a fun exercise and the calculation portion required some serious thought. I’m sure there are other logging methods that could achieve the same result (for instance, you could probably log only the start and end times of each uptime period, instead of the end time and elapsed minutes), but I can’t see one as having a clear advantage over the other. If anyone has other ideas or thoughts, please leave a comment.

Shaun J Stuart
