I ran into an interesting case today. We’ve got a pair of SQL Servers running in an Availability Group, set up so that either server can act as the primary. We use SentryOne to monitor our SQL Servers, and after a recent planned failover, I noticed that when we run on Server 1, CPU utilization seems to be lower than when we run on Server 2. It wasn’t drastically lower, and the difference wasn’t big enough for our users to complain about, but when I looked at the graphs, I could definitely see it.
This bothered me. The two servers have identical hardware, and SQL Server is configured identically on both. I ran through all the checks I could think of: power management was set to High Performance on both, NIC drivers were the same version, both were using the same SAN and local drives for storage, memory was the same, and anti-virus settings were identical. I just couldn’t explain why Server 1 always ran at roughly 10% lower CPU utilization.
Normally, I like seeing low CPU utilization; I like my SQL Servers to have some headroom to handle an increase in load. We had run on Server 1 for a long time in the past, and looking at its CPU utilization, I thought we had lots of room to handle an increased load. But when we were running on Server 2, I wasn’t so sure.
One morning, I decided to dig deeper. I had already confirmed that Windows power management was set to the High Performance plan on both servers, but this time I examined what the individual settings within that plan actually were. It turns out that was where the difference was.
I discovered that Server 2 was set to a maximum processor state of 100%, but Server 1 was set to 75%. Aha! That was the reason Server 1 always had lower CPU utilization: Windows was capping processor use at 75%!
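If you want to check this on your own servers, here is a minimal sketch (Python on Windows, shelling out to the built-in powercfg utility) that reads the maximum processor state of the active plan. SCHEME_CURRENT, SUB_PROCESSOR, and PROCTHROTTLEMAX are powercfg’s own aliases for the active scheme, the processor power subgroup, and the maximum processor state setting.

```python
import subprocess

def max_processor_state() -> str:
    """Return powercfg's report of the maximum processor state
    for the currently active power plan."""
    result = subprocess.run(
        ["powercfg", "/query", "SCHEME_CURRENT",
         "SUB_PROCESSOR", "PROCTHROTTLEMAX"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

if __name__ == "__main__":
    # powercfg reports the value in hex: 0x00000064 is 100%,
    # and 0x0000004b is the 75% cap I found on Server 1.
    print(max_processor_state())
```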
Digging deeper still, I found several other settings that were different: minimum processor state, the turn-off-hard-disk timeout, PCI Express link state power management, and one or two others.
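Rather than hunting for differences one setting at a time, you could dump every setting in the active plan on each server and diff the two dumps. Here is a small sketch of that idea; the file names are hypothetical.

```python
import difflib
import subprocess

def dump_power_plan(path: str) -> None:
    """Save every setting in the active power plan to a text file."""
    result = subprocess.run(
        ["powercfg", "/query"], capture_output=True, text=True, check=True
    )
    with open(path, "w") as f:
        f.write(result.stdout)

def diff_power_plans(file_a: str, file_b: str) -> None:
    """Print a unified diff of two saved powercfg dumps."""
    with open(file_a) as a, open(file_b) as b:
        print("".join(difflib.unified_diff(
            a.readlines(), b.readlines(), fromfile=file_a, tofile=file_b
        )))

if __name__ == "__main__":
    # Run dump_power_plan() on each server, copy the files to one
    # place, then diff them to spot any setting that differs.
    diff_power_plans("server1_plan.txt", "server2_plan.txt")
```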
The moral: if your CPU utilization looks low, make sure it isn’t low because some power setting is quietly limiting your processor!
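And if you do find a cap like the one I found, here is a quick sketch of one way to raise the maximum processor state back to 100% with powercfg and re-apply the plan so the change takes effect. Treat it as an illustration; check what your own plan is set to first.

```python
import subprocess

def set_max_processor_state(percent: int) -> None:
    """Set the plugged-in (AC) maximum processor state, then
    re-apply the active plan so the new value takes effect."""
    subprocess.run(
        ["powercfg", "/setacvalueindex", "SCHEME_CURRENT",
         "SUB_PROCESSOR", "PROCTHROTTLEMAX", str(percent)],
        check=True,
    )
    subprocess.run(["powercfg", "/setactive", "SCHEME_CURRENT"], check=True)

if __name__ == "__main__":
    set_max_processor_state(100)  # remove a cap like the 75% one on Server 1
```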
Shaun: Interesting article and root cause analysis. Just when you thought that everything was the same between two systems!
Besides finding the root cause of the CPU usage differences (observed in SentryOne chart form), did you find out HOW these differences occurred? Were the two systems built by different persons with different preferences (that were not documented / scripted in a build document)? Perhaps one system used default power management profile settings, and the other system had custom power management profile settings.
I presume that you may have discovered the “reasons” (possibly unstated) behind the “how” and “why” of the two systems having different power management profile settings.
I am most interested in hearing about the “how” and “why” part of your findings on the discovered differences.
I never found out why the differences were there. The machines were built before I joined the company, and because of turnover in the IT department, the person who built them was no longer there. I have seen the build documentation IT uses to configure SQL Server machines, but it only specifies that power management be set to High Performance; it contains no instructions for modifying the settings within the High Performance plan. So it’s still a mystery!