For the final part of this series on Uptime Monitoring, I’d like to go over my testing methodology. The reporting process is somewhat complicated, so I want to be confident that the figures my procedure outputs are correct. Here again is the sample data I am working with:
LogDate | UptimeMinutes |
2013-08-21 12:00:00 | 100 |
2013-08-21 14:15:00 | 115 |
2013-08-22 08:15:00 | 1070 |
2013-09-03 08:45:00 | 17300 |
Again, for the sake of our discussion, let’s assume the current time is 9/3/13 8:46 AM and the server is still running.
The blue bars represent periods where our SQL Server is up and running. The times between are times the server is down.
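To make those up periods concrete, here is a minimal Python sketch that rebuilds them from the sample rows. It is only an illustration, not the procedure from this series, and it assumes each row's UptimeMinutes counts back from its LogDate to the moment the server came up. The names `LOG` and `up_intervals` are made up for this example.

```python
from datetime import datetime, timedelta

# Sample log rows from the table above: (LogDate, UptimeMinutes).
LOG = [
    ("2013-08-21 12:00:00", 100),
    ("2013-08-21 14:15:00", 115),
    ("2013-08-22 08:15:00", 1070),
    ("2013-09-03 08:45:00", 17300),
]


def up_intervals(log):
    """Turn each (LogDate, UptimeMinutes) row into an (up_start, up_end) pair.

    Assumption: UptimeMinutes is the elapsed uptime as of LogDate, so the
    start of the period is LogDate minus that many minutes.
    """
    intervals = []
    for log_date, minutes in log:
        up_end = datetime.strptime(log_date, "%Y-%m-%d %H:%M:%S")
        intervals.append((up_end - timedelta(minutes=minutes), up_end))
    return intervals


for up_start, up_end in up_intervals(LOG):
    print(up_start, "->", up_end)
# 2013-08-21 10:20:00 -> 2013-08-21 12:00:00
# 2013-08-21 12:20:00 -> 2013-08-21 14:15:00
# 2013-08-21 14:25:00 -> 2013-08-22 08:15:00
# 2013-08-22 08:25:00 -> 2013-09-03 08:45:00
```

The gaps between one period's end and the next period's start are the down times. The last period ends at its most recent LogDate here, because the sketch does not model a server that is still running.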
To fully verify this procedure, I want to run it with various reporting start and end times in order to test the different scenarios I talked about in Part 3, then compare its output to my manually calculated results. Here is the data I used for testing, along with my manual calculations:
Report Start | Report End | Scenario | Uptime (min) | Total Report Time (min) | Availability Percentage | Margin Of Error (min) |
8/21 10:25 | 8/21 10:35 | 1 (special case) | 10 | 10 | 100 | 0 |
8/21 10:25 | 8/21 12:15 | 3 | 95 | 110 | 86.36 | 5 |
8/21 12:15 | 8/21 12:30 | 2 | 10 | 15 | 66.67 | 5 |
8/21 12:15 | 8/22 8:20 | 4 | 1185 | 1205 | 98.34 | 20 |
8/21 14:20 | 8/21 14:22 | 4 (special case) | 0 | 2 | 0 | 0 |
8/22 8:30 | 8/22 8:45 | 1 (special case) | 15 | 15 | 100 | 0 |
8/21 12:30 | 8/21 13:00 | 1 (special case) | 30 | 30 | 100 | 0 |
8/21 12:30 | 8/22 8:30 | 1 | 1180 | 1200 | 98.33 | 20 |
8/21 10:25 | 8/22 8:30 | 1 | 1285 | 1325 | 96.98 | 30 |
The procedure gives the same values for these reporting periods, so I’m pretty confident the code is good. You’ll notice there are a couple of cases where I test a scenario multiple times. This was done so that specific up/down time intervals, such as the first or the last, would be included in the report. I also ran a test using no report time parameters, so the output would cover all logged times, but those numbers will vary depending on when you run it (because the final period keeps increasing as time goes on), so I left that off the chart.
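The manual figures in the table can also be cross-checked outside the database. The following is a rough Python sketch, not the reporting procedure itself: it clips each up period to the report window and adds up the overlap. It assumes UptimeMinutes counts back from LogDate, it does not attempt the margin-of-error column, and `availability` is a hypothetical helper name.

```python
from datetime import datetime, timedelta

# Sample log rows: (LogDate, UptimeMinutes).
LOG = [
    ("2013-08-21 12:00:00", 100),
    ("2013-08-21 14:15:00", 115),
    ("2013-08-22 08:15:00", 1070),
    ("2013-09-03 08:45:00", 17300),
]


def availability(log, report_start, report_end):
    """Return (uptime_min, total_min, percent) for one reporting window."""
    uptime = timedelta(0)
    for log_date, minutes in log:
        up_end = datetime.strptime(log_date, "%Y-%m-%d %H:%M:%S")
        up_start = up_end - timedelta(minutes=minutes)
        # Clip the up period to the report window; ignore empty overlaps.
        overlap = min(up_end, report_end) - max(up_start, report_start)
        if overlap > timedelta(0):
            uptime += overlap
    up_min = int(uptime.total_seconds() // 60)
    total_min = int((report_end - report_start).total_seconds() // 60)
    return up_min, total_min, round(100 * up_min / total_min, 2)


# A few of the report windows from the table above.
WINDOWS = [
    ("2013-08-21 10:25", "2013-08-21 12:15"),
    ("2013-08-21 12:15", "2013-08-22 08:20"),
    ("2013-08-21 14:20", "2013-08-21 14:22"),
    ("2013-08-21 10:25", "2013-08-22 08:30"),
]

for start_text, end_text in WINDOWS:
    start = datetime.strptime(start_text, "%Y-%m-%d %H:%M")
    end = datetime.strptime(end_text, "%Y-%m-%d %H:%M")
    print(start_text, end_text, availability(LOG, start, end))
# 2013-08-21 10:25 2013-08-21 12:15 (95, 110, 86.36)
# 2013-08-21 12:15 2013-08-22 08:20 (1185, 1205, 98.34)
# 2013-08-21 14:20 2013-08-21 14:22 (0, 2, 0.0)
# 2013-08-21 10:25 2013-08-22 08:30 (1285, 1325, 96.98)
```

Because the sketch treats the final up period as ending at its last LogDate, it only matches the table for windows that close before that point. A run with no report time parameters would need the current time as well.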
Final Thoughts
So what did I learn from all this? First, it’s not a trivial task to get a server to monitor itself for downtime. Second, I really enjoyed this challenge. Coming up with a data logging procedure was a fun exercise, and the calculation portion required some serious thought. I’m sure there are other logging methods that could achieve the same result (for instance, you could probably log only the start and end times of the logging period, instead of the end time and elapsed minutes), but I can’t see one as having a clear advantage over the other. If anyone has other ideas or thoughts, please leave a comment.