In my job, I don’t have many servers with more than 8 processor cores, so I typically don’t set the server-wide MaxDOP setting and instead let it stay at the default of zero, which tells SQL Server to use however many processors it thinks a query should use.
However, I do have a handful of machines that have 16 cores, so I decided to take a look at those and see if tweaking the MaxDOP setting might improve performance.
What is MaxDOP?
By default, SQL Server analyzes each query and decides whether it thinks the query might complete faster if it splits the work into smaller pieces and sends each piece to a different CPU. Once all the CPUs have finished their pieces, SQL Server combines the results and returns the complete result set to the user or calling process. This behavior is called parallelism, and in theory it can provide faster query execution on multi-processor systems. In practice, well… there are always cases where the default options aren’t the best choice.
As a general rule, you want SQL Server to use lots of cores for queries in data warehousing situations. Time is not as critical a factor there, and the queries are usually complex, joining several tables and looking at large amounts of data, so it makes sense for SQL Server to try to use parallelism. But if you are operating in an OLTP scenario, response time is the most important factor and you want SQL Server to complete each query as quickly as possible. Additionally, in OLTP systems, the usage patterns typically involve queries returning relatively few rows from a handful of tables, as opposed to the complex joins in data warehouses, so the overhead of splitting the work across processors often isn’t worth it.
But why wouldn’t you want SQL Server to use lots of processors in every case? After all, if one processor is good, two or more must be better, right? Not really, and there are several reasons why.
First, SQL Server decides whether or not to parallelize a query based on the statistics it has for the tables involved. In a data warehouse, the data probably isn’t changing much, except perhaps for new data being added, so the data distribution statistics SQL Server has are likely pretty accurate. But in an online transaction processing system, data is constantly being updated, inserted, and deleted. SQL Server only updates statistics automatically after a significant portion of a table’s rows have changed (and if you’ve enabled the asynchronous update option, queries don’t even wait for that update to finish), so the query optimizer might be working with stale statistics and come up with a non-optimal query plan, which can lead to the next problem.
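If you’re curious how stale your statistics actually are, you can check when each statistics object on a table was last updated. Here’s a quick sketch; dbo.Orders is just a placeholder name, so substitute one of your own tables:

-- Show when each statistics object on a table was last updated.
-- dbo.Orders is a placeholder; substitute your own table.
SELECT s.name AS statistics_name,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID('dbo.Orders');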
If the statistics are out of date, the parallel plan SQL Server comes up with might not evenly divide the work among the available processors. This will result in one processor getting much more work to do than the others. Because SQL Server can’t return a result set until all processors have returned their results, you can end up with several processors sitting around doing nothing while they wait for the other processor to finish its work. And you don’t want to have idle processors on your busy production systems, right? Luckily, SQL Server tracks this waiting with the CXPACKET wait type so we can see when and how often this is occurring.
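If you want a quick peek at how much CXPACKET waiting your server has accumulated, sys.dm_os_wait_stats holds the running totals. I’ll dig into interpreting these numbers next week; this is just to show where the data lives:

-- Cumulative CXPACKET waits since the last SQL Server restart
-- (or since the wait statistics were last cleared).
SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms,
       signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = 'CXPACKET';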
Finally, what happens if you have a 16 processor system and SQL Server decides a query needs all of those processors to complete but your statistics were slightly out of date and you end up with the situation just described – one processor has much more work to do than the others? Fifteen of those processors are going to go idle waiting for the one processor to finish. How many processors does that leave for the other queries on the server waiting to be run? Zero. Your whole system grinds to a halt waiting for this one processor to finish its work.
Enter the MaxDOP setting. MaxDOP stands for maximum degree of parallelism. It is a server-wide setting that tells SQL Server the maximum number of processors a single query can use. By default, it is zero, which means SQL Server can use as many processors as it has available. If you’ve only got a handful of processor cores in your server, you typically don’t need to worry about changing the default value. But if you’ve got a beefy server with many cores, you may want to change it.
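If you just want to see what your server is currently set to, sp_configure will show you. MaxDOP is an advanced option, so the first two lines turn on ‘show advanced options’ if it isn’t already enabled. I’ll cover actually changing the value next week:

-- View the current server-wide MaxDOP value.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism';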
Microsoft’s recommendation can be found here. In a nutshell, they say if you have 8 or fewer processor cores, set the value to somewhere between zero and the total number of cores. If you have more than 8 cores, set MaxDOP to 8. (There is an additional caveat for NUMA-enabled servers, which I’m not going to get into here, but do read the MS article if this applies to you. There are also some gotchas with processor affinity to watch out for.)
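If you’re not sure how many logical processors or NUMA nodes SQL Server actually sees, a couple of DMVs will tell you. This is just for inspection, not something you need to change:

-- How many logical CPUs does SQL Server see?
SELECT cpu_count, hyperthread_ratio
FROM sys.dm_os_sys_info;

-- One row per node; multiple rows (besides the dedicated
-- admin connection node) mean NUMA is in play.
SELECT node_id, node_state_desc, online_scheduler_count
FROM sys.dm_os_nodes;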
As with anything SQL Server-related, this advice comes with the usual “It Depends” caveat: these are general guidelines and your situation might require different settings. (And Microsoft has a different set of best practices for SharePoint servers. For example, here they specifically say MaxDOP on SharePoint SQL Servers should be set to 1.) I will also note that these recommendations are for SQL Server 2005 and later. Parallelism in SQL Server 2000 worked a bit differently: in that version, a query could use one processor or all of the processors, but nothing in between. So, as always, test thoroughly before making changes to a production environment.
I would also like to point out that you can override the server’s MaxDOP setting for an individual query by using the MAXDOP query hint.
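For example, something like this (the table and column names are made up) forces a single query to run on one processor, regardless of the server-wide setting:

-- dbo.Orders is a placeholder table. The OPTION clause caps
-- this one query at a single processor; the server-wide
-- MaxDOP setting is untouched.
SELECT OrderID, OrderDate, CustomerID
FROM dbo.Orders
WHERE OrderDate >= '20120101'
OPTION (MAXDOP 1);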
Next week I will discuss monitoring the CXPACKET wait type to see if you are affected by a bad MaxDOP setting, how to actually change the MaxDOP setting, and give the results I obtained by changing the setting on one of my production servers.
I wonder if we can clarify the wording of this…
“Fifteen of those processors are going to go idle waiting for the one processor to finish. How many processors does that leave for the other queries on the server waiting to be run? Zero. Your whole system grinds to a halt waiting for this one processor to finish its work.”
Are you sure about that? You don’t mean to imply that 15 cores are just going to stop doing anything and wait for the last portion to be completed, do you? Isn’t what will happen that the processors will park the work they’ve done and start working on other tasks while waiting for the other core to finish?
Technically, you are correct. The CPUs that are waiting will switch threads to perform some other task for a bit, but they need to constantly switch back to the parallelized query to ask the other processors, “Are you done yet?” This can lead to thrashing, where processors constantly switch from one thread to another. They can end up spending more time switching between threads than actually performing work, so effectively, the processor isn’t doing anything. I didn’t want to delve too deeply into the concepts of threads and how SQLOS allocates threads to processors, as I try to keep this blog geared towards DBAs with less experience. However, if you want more information about parallelism, Adam Machanic has a great three-part series at http://sqlchicken.com/sql-university/ where he goes into this topic in much more detail than I have here. Thanks for the comment!
And as an aside, although I didn’t talk about it here, NUMA can also cause more waits if your query gets parallelized such that it is spread across more CPUs than are in a single NUMA node. So if you’ve got both NUMA-related delays and processor thread-switching delays, you’re wasting a lot of time on housekeeping tasks. You can eliminate both with appropriate settings for MAXDOP.
Thanks for the follow-up. I can appreciate trying to keep it high level and the conversation geared towards DBAs without the same experience level. I just wanted to clarify that my understanding wasn’t completely off base… you worried me when you said that the system would grind to a halt while waiting for one thread to finish a parallel query. That’s actually what the CXPACKET wait type is measuring: the time that the other threads spend waiting for all the parallel threads to complete.