Shaun J Stuart

Just another SQL Server weblog


Last week, I wrote about what MaxDOP is and why you may not want to leave it at the default value of zero on your multi-processor systems. This week I'm going to talk about how to change it and give some results from my server where I made a change.

MaxDOP is an advanced configuration option, so if your server is not already configured to do so, you'll need to set it to display advanced configuration options:

EXEC dbo.sp_configure 'show advanced options', 1;
GO
RECONFIGURE;
GO

Now if you just issue the sp_configure command without any parameters, you can see what your current MaxDOP setting is:
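If you'd rather not scan the full list, passing just the option name returns that single setting:

EXEC dbo.sp_configure 'max degree of parallelism';
GO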

You can see I've already changed my MaxDOP value from the default of zero to a value of 8. (The maximum of 64 shown is simply the largest value SQL Server will accept, not the maximum possible value for this particular system.)

If you are more of a GUI person, you can see this info in SSMS by right clicking the server in Object Explorer and choosing Properties, then the Advanced tab:

Changing the setting is relatively easy and does not require a server restart to take effect. If you are using the GUI, you can enter a new value directly in the screen shown above. For T-SQL, use the following:

EXEC dbo.sp_configure 'max degree of parallelism', 8;
GO
RECONFIGURE;
GO

Substitute whatever value you want for 8 in the above command, of course.
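If you want to confirm that RECONFIGURE actually applied the new value, sys.configurations (SQL 2005 and later) shows both the configured value and the value the running instance is using:

SELECT  name
       ,value          -- configured value
       ,value_in_use   -- value currently in effect
FROM    sys.configurations
WHERE   name = 'max degree of parallelism';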

Now that you know how to change the value, how do you know if you should change the value?

Introducing the CXPACKET wait type

SQL Server can spend time waiting on all sorts of things - data to be read from disk, data to be written to disk, locks to be acquired or released, or, of particular interest to us, the results of parallelized queries to get recombined. This latter wait is tracked via a wait type counter in SQL Server called CXPACKET. This counter is available in SQL Server 2000 through the DBCC sqlperf(waitstats) command, but in SQL 2005 and later, you can also see it with the sys.dm_os_wait_stats dynamic management view.

The dynamic management view will return a value, in milliseconds, of how long SQL has spent waiting to combine the results for parallelized queries since the last time the server was restarted or the last time the wait stats were cleared. Glenn Berry has, as part of his great diagnostic queries, a query that will list the top wait states on your servers. His query is:

-- Isolate top waits for server instance since last restart or statistics clear
WITH Waits AS
     (SELECT wait_type
            ,wait_time_ms / 1000. AS wait_time_s
            ,100. * wait_time_ms / SUM(wait_time_ms) OVER () AS pct
            ,ROW_NUMBER() OVER (ORDER BY wait_time_ms DESC) AS rn
      FROM   sys.dm_os_wait_stats
      WHERE  wait_type NOT IN (
             'CLR_SEMAPHORE', 'LAZYWRITER_SLEEP', 'RESOURCE_QUEUE',
             'SLEEP_TASK', 'SLEEP_SYSTEMTASK', 'SQLTRACE_BUFFER_FLUSH',
             'WAITFOR', 'LOGMGR_QUEUE', 'CHECKPOINT_QUEUE',
             'REQUEST_FOR_DEADLOCK_SEARCH', 'XE_TIMER_EVENT',
             'BROKER_TO_FLUSH', 'BROKER_TASK_STOP', 'CLR_MANUAL_EVENT',
             'CLR_AUTO_EVENT', 'DISPATCHER_QUEUE_SEMAPHORE',
             'FT_IFTS_SCHEDULER_IDLE_WAIT', 'XE_DISPATCHER_WAIT',
             'XE_DISPATCHER_JOIN', 'SQLTRACE_INCREMENTAL_FLUSH_SLEEP',
             'ONDEMAND_TASK_QUEUE', 'BROKER_EVENTHANDLER',
             'SLEEP_BPOOL_FLUSH')
     )
SELECT  W1.wait_type
       ,CAST(W1.wait_time_s AS DECIMAL(12, 2)) AS wait_time_s
       ,CAST(W1.pct AS DECIMAL(12, 2)) AS pct
       ,CAST(SUM(W2.pct) AS DECIMAL(12, 2)) AS running_pct
FROM    Waits AS W1
        INNER JOIN Waits AS W2 ON W2.rn <= W1.rn
GROUP BY W1.rn
       ,W1.wait_type
       ,W1.wait_time_s
       ,W1.pct
HAVING  SUM(W2.pct) - W1.pct < 99 -- percentage threshold
OPTION  (RECOMPILE);

There are a couple of nice things this query does. First, it converts the milliseconds value the DMV returns to seconds. Second, it orders the waits from largest to smallest and also provides a running total percentage.

So that's how you see what your CXPACKET wait is. But how do you know if you've got a problem? Microsoft suggests that if CXPACKET accounts for more than 5% of your total wait time, you may have a parallelism bottleneck. I will again point out that the "It Depends" caveat applies here, as it does with all general SQL Server configuration advice.
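If you just want a quick back-of-the-envelope number before digging into the full query above, something like this works. Note that it doesn't filter out the benign system waits Glenn's query excludes, so the percentage will come out a bit lower than his:

SELECT  CAST(100.0 * SUM(CASE WHEN wait_type = 'CXPACKET'
                              THEN wait_time_ms ELSE 0 END)
             / SUM(wait_time_ms) AS DECIMAL(5, 2)) AS cxpacket_pct
FROM    sys.dm_os_wait_stats;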

Tracking Your Wait Times

If you decide you need to change your MaxDOP setting, you're going to want a baseline of your pre-change system operation to compare against to determine if you've actually improved performance. One way to do that is to manually reset the wait stats, let the system run for a set period of time, make the MaxDOP setting change, reset the wait stats again, then let the system run for the same length of time again. Ideally, your two test periods should include times of similar usage.

At this point, I feel the need to point out that you should always test system configuration changes on a test system before making them on a production system.

You can reset your wait stats using the following command:

DBCC SQLPERF('sys.dm_os_wait_stats', CLEAR)

For my test, I created two jobs. The first job cleared the wait stats and ran at 8 AM on Monday morning. The second job executed Glenn Berry's query above on the following Friday at 9 PM and wrote the results to a text file. I set up the schedules for the jobs to run twice. (I could have written the data to a table, but I was in a hurry setting this up, so I just wrote to a text file. If you do this, make sure you choose the Append option so you don't overwrite your baseline data with the second execution.) After the first week, I had my baseline data. When I came in on the second Monday, I made sure I changed the MaxDOP setting prior to 8 AM. (No restart is required when you change this setting.) When the following Friday rolled around, I had my second set of performance data, gathered after the change.
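If you'd rather go the table route, a minimal sketch might look like this. The table and column names here are my own invention, not part of any standard routine:

CREATE TABLE dbo.WaitStatsSnapshot
       (
        SnapshotDate DATETIME NOT NULL DEFAULT (GETDATE())
       ,wait_type NVARCHAR(60) NOT NULL
       ,wait_time_s DECIMAL(14, 2) NOT NULL
       ,pct DECIMAL(5, 2) NOT NULL
       );
GO

/* Job step: capture the raw numbers; percentages can always be recomputed later */
INSERT  dbo.WaitStatsSnapshot (wait_type, wait_time_s, pct)
SELECT  wait_type
       ,wait_time_ms / 1000.
       ,100. * wait_time_ms / SUM(wait_time_ms) OVER ()
FROM    sys.dm_os_wait_stats;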

My Results

I should point out, the server I am testing is a 16 core system in a production environment. It is NUMA-enabled and has 8 processor cores per NUMA node. Based on Microsoft's recommendation, I determined I should set the MaxDOP value to 8. My baseline numbers:

wait_type                wait_time_s     pct   running_pct
OLEDB                      370135.54   31.68         31.68
CXPACKET                   295106.42   25.26         56.94
LCK_M_IS                   187728.33   16.07         73.01
WRITELOG                    98589.81    8.44         81.45
BACKUPIO                    43881.44    3.76         85.21
LATCH_EX                    38794.36    3.32         88.53
MSQL_XP                     28966.19    2.48         91.01
SOS_SCHEDULER_YIELD         24027.27    2.06         93.07
PAGEIOLATCH_SH              17086.65    1.46         94.53
ASYNC_NETWORK_IO            16296.86    1.39         95.92
BROKER_RECEIVE_WAITFOR      16032.44    1.37         97.30
BACKUPBUFFER                 7999.12    0.68         97.98
IO_COMPLETION                5165.44    0.44         98.42
SLEEP_BPOOL_FLUSH            4474.87    0.38         98.81
PAGEIOLATCH_EX               2892.76    0.25         99.05

After I made the change, these were my results:

wait_type                wait_time_s     pct   running_pct
OLEDB                      383416.75   35.85         35.85
LCK_M_IS                   209511.89   19.59         55.44
CXPACKET                   134124.75   12.54         67.98
WRITELOG                   102343.21    9.57         77.55
ASYNC_NETWORK_IO            71632.59    6.70         84.24
BACKUPIO                    48025.45    4.49         88.73
MSQL_XP                     29184.14    2.73         91.46
SOS_SCHEDULER_YIELD         24138.60    2.26         93.72
BROKER_RECEIVE_WAITFOR      17342.30    1.62         95.34
PAGEIOLATCH_SH              14518.61    1.36         96.70
BACKUPBUFFER                 7990.42    0.75         97.44
LATCH_EX                     7474.26    0.70         98.14
SLEEP_BPOOL_FLUSH            4436.36    0.41         98.56
IO_COMPLETION                3937.39    0.37         98.93
PAGEIOLATCH_EX               2875.57    0.27         99.20

So you can see, the time SQL Server spent waiting for parallelism to complete went down by 160,981.67 seconds - 44.7 hours! I just gave this server one man-week of additional computing time! It spent an additional 6 hours of that time waiting to acquire intent shared locks and an additional 3.7 hours waiting on OLE DB calls, but I'm completely OK with this because those wait times are indicative of a working system. An increase in OLEDB means the server is spending more time waiting for data to return from a call to a linked server, from a bulk insert task, or from some other process that uses the SQL Server Native Client OLE DB provider. Which, in turn, means SQL is processing more work than it used to.

Now, I'm not going to say that OLEDB wait stat value is a good one. It now represents 35% of the time SQL Server spends waiting and if I can trim that down, I can squeeze some more performance out of this server. But that will require some deeper digging to find out what the OLE DB calls being made are and what, if anything, I can do about them.

If I take a step back to look at the overall picture and add up the total time spent waiting on the top 15 waits (which are the waits Glenn's query returns), I can see my total time spent waiting dropped from 321 hours to 294 hours!

I'm also not going to claim this is a scientifically accurate test. For one, I had no control over the load the server experienced during these weeks, so the two workloads may have been significantly different. However, by looking at an entire week, I can more or less average out any one-time anomalies a large one-off query might introduce. I also know the nominal workload of my servers and I can make a reasonable assumption that the workload was more or less the same. Further, as a working DBA, I'm more concerned about my actual performance improvement than a theoretical improvement. So all in all, for a first step in performance improvement, I'm very happy with this result!

Bonus: When analyzing performance data, I like to find alternative ways to validate my conclusions. My conclusion above was that this server was now spending more time processing requests. So if I use my PerfMon monitoring routine to look at the number of batch requests per second this server is handling, I should see an increase from the first week to the second week.

[Graph: Batch Requests/sec over the two-week test period]

Sure enough, if you look at the average low point over the two weeks, you'll see the lows were under 200 for the first week and over 200 for the second. Likewise, the highs are higher, going from typically under 600 to peaking over 600 multiple times.


In my job, I don't have many servers with more than 8 processor cores, so I typically don't set the server-wide MaxDOP setting and instead let it stay at the default of zero, which tells SQL Server to use however many processors it thinks a query should use.

However, I do have a handful of machines that have 16 cores, so I decided to take a look at those and see if tweaking the MaxDOP setting might improve performance.

What is MaxDOP?

By default, SQL Server will analyze a query and decide whether it thinks the query might be completed faster if it splits the query into smaller pieces and sends each piece to a different CPU to complete. Once all CPUs have completed their work, SQL will combine the results and return the complete result set to the user or calling process. This behavior is called parallelism and in theory, it can provide faster query execution on multi-processor systems. In practice, well... there are always cases where default options aren't the best choice.

As a general rule, you want SQL Server to use lots of cores for queries in data warehousing situations - time is not as critical a factor there and the queries in these scenarios are usually complex, joining several tables and looking at large amounts of data. It makes sense for SQL to try to use parallelism here. But if you are operating in an OLTP scenario, speed is the most important factor and you're going to want SQL to complete the queries as fast as possible. Additionally, in OLTP systems, the usage patterns typically involve queries returning relatively few rows from a handful of tables, as opposed to the complex joins in data warehouses.

But why wouldn't you want SQL to use lots of processors in each case? After all, if one processor is good, two or more must be better, right? Not really and there are several reasons for this.

First, SQL Server bases its decision on whether or not to parallelize a query on the statistics it has for the tables involved. In a data warehouse, the data probably isn't changing too much, except possibly for new data being added, so the data distribution statistics SQL has are likely pretty accurate. But in an on-line transaction processing system, data is constantly being updated, inserted, and deleted. Statistics are only refreshed after enough data has changed (and that refresh can even be deferred to run asynchronously), so the query optimizer might not have up-to-date statistics and may come up with a non-optimal query plan, which can lead to the next problem.

If the statistics are out of date, the parallel plan SQL Server comes up with might not evenly divide the work among the available processors. This will result in one processor getting much more work to do than the others. Because SQL Server can't return a result set until all processors have returned their results, you can end up with several processors sitting around doing nothing while they wait for the other processor to finish its work. And you don't want to have idle processors on your busy production systems, right? Luckily, SQL Server tracks this waiting with the CXPACKET wait type so we can see when and how often this is occurring.

[Image caption: It would be nice if our processors came with a display like this.]

Finally, what happens if you have a 16 processor system and SQL Server decides a query needs all of those processors to complete but your statistics were slightly out of date and you end up with the situation just described - one processor has much more work to do than the others? Fifteen of those processors are going to go idle waiting for the one processor to finish. How many processors does that leave for the other queries on the server waiting to be run? Zero. Your whole system grinds to a halt waiting for this one processor to finish its work.

Enter the MaxDOP setting. MaxDOP stands for Maximum degree of parallelism. This is a global server setting that tells SQL Server the maximum number of processors a query can use. By default, this is zero, which means SQL Server can use as many as it wants, up to as many processors as it has available to it. If you've got a handful of processor cores in your server, you typically don't need to worry about changing the default value. But if you've got a beefy server with many cores, you may want to change this.

Microsoft's recommendation can be found here. In a nutshell, they say if you have 8 or fewer processor cores, set the value to between zero and the total number of processors. If you have more than 8 cores, set MaxDOP to 8. (There is an additional caveat for NUMA-enabled servers, which I'm not going to talk about, but do read the MS article if this applies to you. There are also some gotchas with processor affinity to watch out for.)
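If you're not sure how many logical processors or NUMA nodes a particular server presents to SQL Server, a couple of DMV queries will tell you. This is just a quick sketch; sys.dm_os_nodes requires SQL 2008 or later:

-- Logical processors visible to SQL Server
SELECT  cpu_count
       ,hyperthread_ratio
FROM    sys.dm_os_sys_info;

-- Online NUMA nodes (the dedicated admin connection node is excluded)
SELECT  COUNT(*) AS numa_nodes
FROM    sys.dm_os_nodes
WHERE   node_state_desc = 'ONLINE';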

As with anything SQL Server-related, this advice always comes with the "It Depends" caveat: these are general guidelines and your situation might require different settings. (And Microsoft also has a different set of best practices for SharePoint servers. For example, here they specifically say MaxDOP on SharePoint SQL Servers should be set to 1.) I will also note that these recommendations are for SQL Server 2005 and greater. Parallelism in SQL Server 2000 worked a bit differently. In that version, a query could use 1 processor or all the processors, but nothing in between. So, as always, test thoroughly before making changes to a production environment.

I would also like to point out that you can override the server's MaxDOP setting on an individual query by using the MAXDOP query hint.
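For example, the following would force a single query to run serially regardless of the server-wide setting (the table names here are made up purely for illustration):

SELECT  c.CustomerID
       ,SUM(o.OrderTotal) AS OrderTotal
FROM    dbo.Orders AS o
        JOIN dbo.Customers AS c ON c.CustomerID = o.CustomerID
GROUP BY c.CustomerID
OPTION  (MAXDOP 1);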

Next week I will discuss monitoring the CXPACKET wait type to see if you are affected by a bad MaxDOP setting, how to actually change the MaxDOP setting, and give the results I obtained by changing the setting on one of my production servers.


I read a post recently at Brent Ozar PLF titled Who's Going To Hire You? The gist of the post was that your co-workers are basically interviewing you constantly, so you should always make an effort to do your best work. I have to say, I completely agree with this and, like Brent, this fact led me from one job to another. In truth, it actually, led me to change careers completely! Here's my story.

I graduated college with a degree in electrical engineering. I was living in Southern California at the time and found work as a junior engineer at a company in the telecommunications industry. I was responsible for writing engineering change orders, sourcing parts, and laying out and routing the circuit boards we developed. I really enjoyed that last task. It involved a combination of engineering, geometric problem solving, and design. I worked for the company for about three years. I was sharing an apartment with a childhood friend and I slowly began to sour on the whole Southern California dream. This was a serious change for me. I was a California native, born and raised in Orange County, just a few miles from Disneyland (close enough to see the fireworks each night in the summer). But I began to realize it would be years before I could afford a house. Traffic was everywhere, virtually 24 hours a day. The weather was great, but frequently, there would be days choked with smog. The Rodney King riots had recently happened. I started thinking about moving elsewhere.

It just so happened that I had a vacation planned to Connecticut, so I managed to set up a single interview while I was out there. This was also for an electrical engineering position responsible for PCB layout and design. I got an offer and decided to move. (In retrospect, moving to Connecticut, one of the states with the highest gas taxes and one known for being a retreat for wealthy people, probably wasn't the wisest move if I was looking for a cheap place to live.) So this California boy packed up and moved across the country. And... well...

It sure is different living on the East coast. For one, the sun rises out of the ocean and doesn't set into it. For some reason, that really threw me off. Back then, the whole concept of healthy eating was almost unheard of on the East coast. Pizza? You better order it with extra pepperoni and sausage. Forget veggie. And if you suggested pineapple and ham, you'd better run for your life. It took me a couple of months, but I was finally able to pin down one other thing that bothered me: I felt claustrophobic. I was used to the relatively flat terrain of Southern California with its few scrawny palm trees. I could usually see for miles (smog permitting). In Connecticut, there were hills covered with trees. It was pretty to look at, but you often couldn't see more than a mile into the distance. It wasn't all bad though. I loved being able to take a train into New York City, for instance.

And then there was winter. Snow. WTF? OK, I had experienced snow before and knew before I moved that living in Connecticut would mean I would become intimately familiar with it. I had gone skiing several times in California and liked it. But months of it? All the time? I liked the fact that I could bring ice cream home from the store in the trunk of my car without worrying about it melting, but honestly, after two months, I was ready for the snow to be gone. I remember it being April and watching the snow falling. I thought "Isn't April spring? Why is it still snowing?"

(All in all though, I think I did pretty well in the snow, considering my origins. I didn't have any car accidents and only lost control of my car once, for a very short time (only as long as it took for the car to spin through one complete revolution). Although I did make some stupid California-boy mistakes. The most memorable: I was at a gas station filling my tank and decided I needed to clean my windows. I took the squeegee from the bucket of water (which wasn't frozen for some reason) at the base of the gas pump island and ran it across my front window - where the water instantly turned to ice. This was one of those "I hope no one saw that" moments. And I never once slipped and fell on ice - almost. It was my very last day in Connecticut and I was standing outside waiting for the cab to take me to the airport. I slipped and fell. I can't begin to tell you how pissed off I was that I had made it through basically the whole winter, only to fall in my last few hours in the state.)

It was during this time in Connecticut that I got a call from a guy I used to work with in California. He had been the president and head of the engineering department of the first company I worked for and was now building an engineering department for another company in Arizona. He asked if I wanted to join him as a software engineer, working with databases.

Now keep in mind, up to this point, my entire career had been hardware-based. The only experience I had with software was maybe a single class in college and my own tinkerings on my home computers, starting with my old Apple II+ in high school, and later 286s, 386s, and 486s in college. I told him sure, but reminded him I had no experience with databases or software. He said that was OK. He knew me, knew my work, and felt I could do the job.

So after one year in Connecticut, I moved most of the way back across the country to Arizona and started my involvement with SQL Server as a database developer. This was when SQL Server 7.0 was just released in beta and that's what I learned on. I was lucky to have one of the software engineers at the company as a mentor; he helped me learn not only the T-SQL language, but good programming methodology. Since that time, I've worked at three other companies, moving from a database developer to a database administrator. I've discovered I love SQL Server. I love managing close to 100 servers. I'm quite happy with my new career.

And I never would have discovered any of this, if that one person didn't have faith in my abilities, faith I'd earned by doing my best, day in and day out. Even though I was moving into a completely different field, he still felt confident enough in my abilities to take a chance on me.

Yes, your coworkers are interviewing you right now. Every day. You never know who will give you that career-changing call or when it will come. But you can increase its chances of coming by taking pride in your work and doing the best job you can.


Like any good DBA, I've got a checklist I go through when I get a new SQL Server. I've got standard jobs I install, standard configurations for the model database to set, steps to take to add the new server into my automated monitoring jobs, etc. The more experience I get, it seems the longer the list gets. Anything I can do to reduce the number of steps that have to be manually performed is welcomed.

If your company uses Windows Group Policy to manage computers, you have a powerful tool to help configure your SQL servers. If you do use Group Policies, odds are your SQL Servers are all in one group and any new servers that come online should be added to this group by your Windows admins. The benefit here is that settings in the Group Policy will automatically get applied to all the servers in that group - including any new ones. This means if you set your policies up correctly, you have fewer things to manually set.

I have asked my Windows admin to configure four items in our SQL Server Group Policy:

  • Power Settings. By default, Windows installs with a Power Setting of Balanced. Our policy changes this to High Performance.
  • Add our SQL Server service account to the Performance Monitor Users local security group. This allows me to include the server in my Perfmon monitoring routine.
  • Grant our SQL Server service account the Lock Pages In Memory right. This is something of a controversial topic, but, in my shop, it makes sense for us to set this as a default.
  • Grant our SQL Server service account the Perform Volume Maintenance Tasks right. This allows SQL Server to use Instant File Initialization.

By including these settings in Windows Group Policy, I can ensure that they are always set on all my SQL Servers. Set it and forget it!
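As a quick sanity check after the policy applies, you can at least confirm Lock Pages In Memory is actually in effect from within SQL Server (SQL 2008 and later; a non-zero value means locked pages are being used):

SELECT  locked_page_allocations_kb
FROM    sys.dm_os_process_memory;

Verifying Instant File Initialization is a bit more involved (the usual approach is watching the error log with trace flags 3004 and 3605 during a file growth), so I won't cover that here.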


So there you are, sitting at your desk, idly clicking in SSMS, admiring your little kingdom of SQL Servers. Everything looks good. Jobs are running correctly. No performance problems to be seen. Everything is humming along smoothly. It's tempting to relax a little and perhaps take some time to peruse a few blogs or check out some lolcat pictures.

If you've ever seen a horror movie, you know it's times like these when a guy wearing a hockey mask appears behind your chair and thrusts a three foot long machete through your chair and your chest. This is why DBAs are paranoid. (Well, one of the reasons.)

[Image caption: This DBA never saw it coming.]

Just when you think everything is running smoothly, your phone will ring and someone will tell you they accidentally dropped a database. Or shut down a server. Or committed some other catastrophic action you thought it would be impossible for a user to do. Never underestimate users. They are a wily bunch and quite possibly the root of all evil.

As Guardian Of The Data, a DBA's job is to allow users to access the data they need but stop them from destroying that data or making it inaccessible to others. In short, your job is to stop them from hurting themselves and others. Because if one of them does do something horrific like dropping the database that contained all the orders your company filled for the last 3 years, guess who is going to get blamed? That's right - DBA Guy.

So it's up to you to protect the data. You do that with access permissions. Users should only have the permissions they need in the database server to perform their jobs. Nothing more.

Unfortunately, many times users may attempt to do something that they don't have permissions to do. They might not know what permissions they need, so they ask for full admin access, because everyone knows that having full access makes all permission issues go away. Sometimes a DBA will give it to them, perhaps with the intention of revoking the permission as soon as the user's task is completed. But even DBAs get busy and forget and you can end up with lots of logins that have admin rights on your servers. Or your servers use Windows groups for access and the network admins are rather non-discriminatory about who they add to the SQL Admins group. One way or another, there are likely way more people who have full access to your SQL Servers than you think there are.

You need to get a handle on this. Remember how I said it was DBA Guy that will get the blame if something goes wrong? The worse the disaster, the more true that is. So you need what Brent Ozar has called a "People Who Can Get The DBA Fired" report. It is a list of all the accounts with admin rights on your SQL Servers. Ideally, it should be created via an automated process. As I mentioned, network admins can add people to Windows groups at any time. If your SQL Admins group looks good today, there's no guarantee that Bobby Tables from Marketing won't find a way to talk himself into getting access to that group tomorrow.

I created a little procedure that will gather a list of every login that has admin rights on your SQL Servers. It takes a list of servers from a table called ServersToCheckDiskStats, creates a linked server entry for each server, gathers data about the logins with admin rights, and then deletes the linked server entry. (Yes, I know the table isn't really appropriately named, but I already had this list set up from my drive space monitoring routine, so I used it here as well. See that post for table definitions and details of how it works.)

As I have explained before, I like to have a single script for a task that will work against any version of SQL Server that we have in use. Handling this means my scripts are bigger, take more time to develop, and may be a bit more complicated. But the payoff is that I have a single script that can run against any server in my environment and won't break if a particular server gets upgraded to a newer SQL version. The script below can be run against SQL 2000, 2005, 2008, and 2008 R2 servers and will use the appropriate system tables for each version. (The server version is specified in the ServersToCheckDiskStats table and, for simplicity, it treats 2008 R2 the same as 2008.)
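(If you don't want to maintain that version column by hand, SERVERPROPERTY can tell you what you're connected to; the leading digits of the version map to the release - 8.0 is 2000, 9.0 is 2005, 10.0 is 2008, and 10.50 is 2008 R2.)

SELECT  SERVERPROPERTY('ProductVersion') AS ProductVersion;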

The routine requires a SQL login and password to create the linked server connections. This login should be created on each server you wish to monitor. The routine uses two tables - SQLAdminAccess and ADAdminGroupMembers. DDL code for the tables is below. The first table will store the logins that have admin access to a particular server. The second contains members of any groups (Windows or local machine) that have admin access. (This table is populated by a PowerShell script I will discuss later.)

CREATE TABLE [dbo].[SQLAdminAccess]
       (
        [ServerName] [varchar](128) NOT NULL
       ,[LoginName] [nvarchar](128) NOT NULL
       ,[AccessLevel] [varchar](15) NOT NULL
       ,[Type] [char](1) NOT NULL
       ,[TypeDesc] [nvarchar](60) NOT NULL
       ,[IsDisabled] [int] NOT NULL
       ,[State] [char](1) NULL
       ,[Created] [varchar](10) NULL
       ,[Updated] [varchar](10) NULL
       ,[StatDate] [date] NULL
       ,CONSTRAINT [PK_SQLAdminAccess] PRIMARY KEY CLUSTERED
            ([ServerName] ASC, [LoginName] ASC, [AccessLevel] ASC)
            WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
                  IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,
                  ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
       )
ON     [PRIMARY]

GO

ALTER TABLE [dbo].[SQLAdminAccess] ADD  CONSTRAINT [DF_SQLAdminAccess_StatDate]  DEFAULT (GETDATE()) FOR [StatDate]
GO

CREATE TABLE [dbo].[ADAdminGroupMembers]
       (
        [ServerName] [nvarchar](128) NULL
       ,[ADGroupName] [nvarchar](128) NOT NULL
       ,[ADAccountName] [nvarchar](128) NOT NULL
       ,[StatDate] [date] NULL
       )
ON     [PRIMARY]

GO

CREATE NONCLUSTERED INDEX [IX_ADAdminGroups] ON [dbo].[ADAdminGroupMembers]
	(
	[ADGroupName] ASC,
	[ADAccountName] ASC
	)
	WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF,
	SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF,
	DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON,
	ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
GO

ALTER TABLE [dbo].[ADAdminGroupMembers] ADD  CONSTRAINT [DF_ADAdminGroupMembers_StatDate]  DEFAULT (GETDATE()) FOR [StatDate]
GO

Now that the tables are defined, let's look at the procedure that does all the work:

CREATE PROCEDURE [dbo].[GatherSQLAdminAccessData]
AS
-- =============================================
-- Author: Shaun Stuart, shaunjstuart.com
-- Create date: May 22, 2012
-- Description: Procedure to collect info about SQL account with sysadmin rights
-- =============================================
--

      DECLARE @ServerName VARCHAR(128)
      DECLARE @ServerDisplayName VARCHAR(128)
      DECLARE @ServerNamePK INT
      DECLARE @PersistLink BIT
      DECLARE @SQLCmd VARCHAR(3000)
      DECLARE @LinkedServerLogin VARCHAR(50)
      DECLARE @LinkedServerLoginpwd VARCHAR(50)
      DECLARE @SQLServerVersion CHAR(4)
      DECLARE @LoginName NVARCHAR(128)

      SET @LinkedServerLogin = '<enter your login account here>'
      SET @LinkedServerLoginpwd = '<enter your password here>'

/* Get a list of the servers to gather data from and their SQL version */

      DECLARE ServersCursor CURSOR
      FOR
              SELECT    ServerName
                       ,ServerDisplayName
                       ,PK
                       ,PersistLink
                       ,SQLServerVersion
              FROM      ServersToCheckDiskStats
              WHERE     IncludeInCheck = 1

/* Delete old data first */

      DELETE    FROM dbo.SQLAdminAccess
      DELETE    FROM dbo.ADAdminGroupMembers
      OPEN    ServersCursor
      FETCH NEXT FROM ServersCursor INTO @ServerName, @ServerDisplayName,
            @ServerNamePK, @PersistLink, @SQLServerVersion

      WHILE @@FETCH_STATUS = 0
            BEGIN

/* create linked server to get info. Check PersistLink value to see if linked server should remain or be deleted */

                  IF EXISTS ( SELECT    *
                              FROM      master.sys.servers
                              WHERE     @ServerName = name )
                     AND @PersistLink = 0
                     AND @@SERVERNAME <> @ServerName
                     BEGIN
                           EXEC sp_dropserver
                            @ServerName
                           ,droplogins
                     END

                  IF @PersistLink = 0
                     AND @@SERVERNAME <> @ServerName   /*if link is persisted, linked server is already present, no need to add */
                     BEGIN
                           EXEC sp_addlinkedserver
                            @server = @ServerName
                     END

                  EXEC sp_addlinkedsrvlogin
                    @ServerName
                   ,'false'
                   ,NULL
                   ,@LinkedServerLogin
                   ,@LinkedServerLoginPwd

/* pull all SQL logins that have sysadmin rights using the SQL version-appropriate system table */

                  SELECT	@SQLCmd = CASE @SQLServerVersion
                    WHEN '2008' THEN
                    'INSERT  INTO dbo.SQLAdminAccess
						(ServerName
						,LoginName
						,AccessLevel
						,Type
						,TypeDesc
						,IsDisabled
						,State
						,Created
						,Updated)
					SELECT  ' + '''' + @ServerDisplayName + '''' + '
						,sp.name AS [loginname]
						,sp2.name AS AccessLevel
						,sp.type
						,[type_desc] = CASE sp.type
							WHEN ''U'' THEN ''WINDOWS_LOGIN''
							WHEN ''S'' THEN ''SQL_LOGIN''
							WHEN ''G'' THEN ''WINDOWS_GROUP''
						END
						,sp.is_disabled
						,sper.state
						,CONVERT(VARCHAR(10), sp.create_date, 101) AS [created]
						,CONVERT(VARCHAR(10), sp.modify_date, 101) AS [update]
					FROM    [' + @ServerName + '].master.sys.server_principals AS sp
					JOIN [' + @ServerName + '].master.sys.server_role_members AS srm ON sp.principal_id = srm.member_principal_id
					JOIN [' + @ServerName + '].master.sys.server_principals AS sp2 ON srm.role_principal_id = sp2.principal_id
					JOIN [' + @ServerName + '].master.sys.server_permissions AS sper ON sper.grantee_principal_id = sp.principal_id
					WHERE sper.TYPE = ''cosq''
                         AND sp2.name IN (''sysadmin'',
										  ''serveradmin'',
										  ''securityadmin'')'
					WHEN '2005'	THEN /* same as 2008 */
					'INSERT  INTO dbo.SQLAdminAccess
						(ServerName
						,LoginName
						,AccessLevel
						,Type
						,TypeDesc
						,IsDisabled
						,State
						,Created
						,Updated)
					SELECT  ' + '''' + @ServerDisplayName + '''' + '
						,sp.name AS [loginname]
						,sp2.name AS AccessLevel
						,sp.type
						,[type_desc] = CASE sp.type
							WHEN ''U'' THEN ''WINDOWS_LOGIN''
							WHEN ''S'' THEN ''SQL_LOGIN''
							WHEN ''G'' THEN ''WINDOWS_GROUP''
						END
						,sp.is_disabled
						,sper.state
						,CONVERT(VARCHAR(10), sp.create_date, 101) AS [created]
						,CONVERT(VARCHAR(10), sp.modify_date, 101) AS [update]
					FROM    [' + @ServerName + '].master.sys.server_principals AS sp
					JOIN [' + @ServerName + '].master.sys.server_role_members AS srm ON sp.principal_id = srm.member_principal_id
					JOIN [' + @ServerName + '].master.sys.server_principals AS sp2 ON srm.role_principal_id = sp2.principal_id
                                             AND sp2.name IN (''sysadmin'',
                                                              ''serveradmin'',
                                                              ''securityadmin'')
					JOIN [' + @ServerName + '].master.sys.server_permissions AS sper ON sper.grantee_principal_id = sp.principal_id
					WHERE sper.TYPE = ''cosq''
						     AND sp2.name IN (''sysadmin'',
									''serveradmin'',
									''securityadmin'')'
					WHEN '2000' THEN
                    'INSERT  INTO dbo.SQLAdminAccess
						(ServerName
						,LoginName
						,AccessLevel
						,Type
						,TypeDesc
						,IsDisabled
						,State
						,Created
						,Updated)
					SELECT  ' + '''' + @ServerDisplayName + '''' + '
						,loginname
						,''sysadmin'' AS AccessLevel	/* Cheating here. Only looking for this access level */
						,''S'' AS type					/* not serveradmin and securityadmin for 2000 servers */
						,''SQL_LOGIN'' AS type_desc
						,CASE hasaccess
							WHEN 1 THEN 0
							ELSE 1
						END AS is_disabled
						,''G'' AS State /* SQL logins cannot be denied access in SQL 2000 */
						,CONVERT(VARCHAR(10), createdate, 101) AS [created]
						,CONVERT(VARCHAR(10), updatedate, 101) AS [update]
					FROM    [' + @ServerName + '].master.dbo.syslogins
					WHERE   sysadmin = 1
							AND syslogins.isntname = 0
							AND syslogins.isntgroup = 0
							AND syslogins.isntuser = 0
					UNION ALL
					SELECT  ' + '''' + @ServerDisplayName + '''' + '
							,loginname
							,''sysadmin'' AS AccessLevel	/* Cheating here. Only looking for this access level */
							,''U'' AS type					/* not serveradmin and securityadmin for 2000 servers */
							,''WINDOWS_LOGIN'' AS type_desc
							,CASE hasaccess
								WHEN 1 THEN 0
								ELSE 1
							END AS is_disabled
							,CASE denylogin
								WHEN 1 THEN ''D''
								ELSE ''G''
							END AS State
							,CONVERT(VARCHAR(10), createdate, 101) AS [created]
							,CONVERT(VARCHAR(10), updatedate, 101) AS [update]
					FROM [' + @ServerName + '].master.dbo.syslogins
					WHERE   sysadmin = 1
							AND syslogins.isntname = 1
							AND syslogins.isntgroup = 0
							AND syslogins.isntuser = 1
					UNION ALL
					SELECT  ' + '''' + @ServerDisplayName + '''' + '
							,loginname
							,''sysadmin'' AS AccessLevel	/* Cheating here. Only looking for this access level */
							,''G'' AS type					/* not serveradmin and securityadmin for 2000 servers */
							,''WINDOWS_GROUP'' AS type_desc
							,CASE hasaccess
								WHEN 1 THEN 0
								ELSE 1
							END AS is_disabled
							,CASE denylogin
								WHEN 1 THEN ''D''
								ELSE ''G''
							END AS State
							,CONVERT(VARCHAR(10), createdate, 101) AS [created]
							,CONVERT(VARCHAR(10), updatedate, 101) AS [update]
					FROM [' + @ServerName + '].master.dbo.syslogins
					WHERE   sysadmin = 1
						AND syslogins.isntname = 1
						AND	syslogins.isntgroup = 1
						AND syslogins.isntuser = 0'
					ELSE        /* not a SQL version this procedure handles - log an error row */
                         'INSERT    dbo.SQLAdminAccess
                        (ServerName,
                         LoginName,
                         AccessLevel,
                         Type,
                         TypeDesc,
                         IsDisabled)
						SELECT  ' + '''' + @ServerDisplayName + '''' + ','
                                             + ''''
                                             + 'undefined SQL version in table ServersToCheckDiskStats'
                                             + '''' + ' AS LoginName ,' + ''''
                                             + 'ERROR' + '''' + ',' + ''''
                                             + '*' + '''' + ',' + ''''
                                             + 'ERROR' + '''' + ', 0'
					END
 PRINT @sqlcmd /* for debugging use */
					EXEC (@SQLCmd)

/* Done. Drop linked server if required */

                  IF @persistLink = 0
                     AND @@SERVERNAME <> @ServerName
                     BEGIN
                           EXEC sp_dropserver
                            @ServerName
                           ,droplogins
                     END

                  FETCH NEXT FROM ServersCursor INTO @ServerName,
                        @ServerDisplayName, @ServerNamePK, @PersistLink,
                        @SQLServerVersion

            END

      CLOSE ServersCursor
      DEALLOCATE ServersCursor

Let me say a few words about this. You'll notice the code for 2008 and 2005 servers checks not only for members of the sysadmin role, but also for members of the serveradmin and securityadmin roles. This is because members of those roles can also wreak havoc on your servers. Regarding securityadmin, Microsoft even states that members of the securityadmin role should be treated as equivalent to members of the sysadmin role. The same caveats apply to SQL 2000 servers, but due to the way the system tables are set up in that version, I only check for members of the sysadmin role on those servers. In my situation, we have few SQL 2000 servers and those we do have will be going away soon, so this isn't a big concern for me. If you want to modify this routine to also check the serveradmin and securityadmin roles on 2000 servers, you'll need to add a bunch more UNION statements and check whether the syslogins.serveradmin and syslogins.securityadmin fields are equal to 1.
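Incidentally, if you just want a quick interactive look at a single SQL 2005 or 2008 instance without the linked server machinery, the core of the check boils down to something like this:

SELECT  sp2.name AS AccessLevel
       ,sp.name AS LoginName
       ,sp.type_desc
       ,sp.is_disabled
FROM    sys.server_role_members AS srm
        JOIN sys.server_principals AS sp ON sp.principal_id = srm.member_principal_id
        JOIN sys.server_principals AS sp2 ON sp2.principal_id = srm.role_principal_id
WHERE   sp2.name IN ('sysadmin', 'serveradmin', 'securityadmin')
ORDER BY sp2.name
       ,sp.name;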

The code also identifies accounts that have been disabled and/or denied access. I made this distinction because, in my environment, I came across some Windows logins that had sysadmin rights, but the accounts had been removed from Active Directory. When that happens, SQL cannot disable the logins (it checks AD and returns an error that the account does not exist), but you can deny them access. Why do I care? Because before I delete an account, I want to be 100% sure it is not being used. To do that, I disable or deny access to the account and let it sit for a period of time before deleting it. If any processes fail, I know the account was in use and I can easily re-enable it while I determine the correct access level it needs. Now, I know if a Windows login doesn't exist in Active Directory, it's a fairly safe bet that it's no longer being used to connect to SQL Server. But DBAs are paranoid, remember?
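The disable-or-deny step itself is just a couple of statements (the login names below are obviously placeholders):

-- Disable a SQL login so nothing can connect with it...
ALTER LOGIN [SomeOldAppLogin] DISABLE;

-- ...or deny access to a Windows login that no longer exists in AD
DENY CONNECT SQL TO [MYDOMAIN\DepartedUser];

If something does break, ALTER LOGIN ... ENABLE or GRANT CONNECT SQL puts things back the way they were.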

You'll notice the tables have a column that defaults to the current date. This is so my report can display the date this information was gathered.

The final piece of the puzzle is to determine the members of any Windows or local server groups that have admin rights. In order to do that, you could use the xp_logininfo stored procedure. I opted not to go this route. Actually, I tried it initially, but was having difficulties running it against linked servers. Instead, I decided to use PowerShell to obtain this information, which gave me a chance to learn a bit more about PowerShell. I wrote two scripts - one to get members of Active Directory groups and one to get members of local machine groups. (In particular, SQL 2005 creates local machine groups that may have admin rights.) Again, I've modified these scripts from my previous drive space monitoring routine, so the variable names might be a bit misleading.
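For reference, the xp_logininfo call looks like this when run locally (the group name is just an example):

EXEC xp_logininfo @acctname = 'MYDOMAIN\SQL Admins', @option = 'members';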

Save this script as SQLAdminGroupMembers.ps1.

# ---------------------------------------------------------------------------
### Author:
### Shaun Stuart
### www.shaunjstuart.com
# ---------------------------------------------------------------------------

$DiskStatsServer = '<enter your server name here>'
$DiskStatsDB = '<enter the database name here>'
$SelectQuery = @"
                SELECT  DISTINCT RIGHT(LoginName, (LEN(loginname) - CHARINDEX('\', loginname))) AS LoginName
                         ,LEFT(loginname, (CHARINDEX('\', loginname) - 1)) AS DomainName
                FROM    [DSFCUReportingDB].[dbo].[SQLAdminAccess]
                WHERE   DSFCUReportingDB.dbo.SQLAdminAccess.LoginName LIKE '\%'
                        AND TypeDesc = 'WINDOWS_GROUP'
"@

#######################
## MAIN             ##
#######################

$SqlConnection = New-Object System.Data.SqlClient.SqlConnection
$SqlConnection.ConnectionString = "Server=$DiskStatsServer;Database=$DiskStatsDB;Integrated Security=True"

# Can't issue the update command via the same connection as the SELECT because of the DataReader loop, so we need a new connection

$SqlUpdConnection = New-Object System.Data.SqlClient.SqlConnection
$SqlUpdConnection.ConnectionString = "Server=$DiskStatsServer;Database=$DiskStatsDB;Integrated Security=True"

$SqlConnection.open()
$SQLUpdConnection.open()

$SqlCmd = New-Object System.Data.SqlClient.SqlCommand
$SqlCmd.CommandText = $SelectQuery
$SqlCmd.Connection = $SqlConnection

$SqlUpdCmd = New-Object System.Data.SqlClient.SqlCommand
$SqlUpdCmd.Connection = $SqlUpdConnection

$SqlAdapter = New-Object System.Data.SqlClient.SqlDataAdapter
$SqlAdapter.SelectCommand = $SqlCmd

$ds = $SqlCmd.ExecuteReader()
while ($ds.Read()) {
	$LoginName = $ds.GetValue(0)
    echo "LoginName $LoginName"
    $a = '' # Clear out the array. Otherwise, groups that don't exist will return members of previous group
    $a = get-adgroupmember $LoginName
    foreach ($objitem in $a) {
        #'The Windows account {0} is in group {1}' -f $objitem.samaccountname, $LoginName

        	$UpdateQuery = @"
                	INSERT ADAdminGroupMembers
                        (ServerName
                         ,ADGroupName
                         ,ADAccountName
                         ,StatDate)
                	SELECT ''
                        ,'$LoginName'
                	    ,'$($objitem.SamAccountName)'
                        , '$(Get-Date)'
"@
            #echo $UpdateQuery
	$SqlUpdCmd.CommandText = $UpdateQuery
	$SqlUpdCmd.ExecuteNonQuery()

        }

}
$ds.Close()
$SqlConnection.Close()
$SqlUpdConnection.Close()

Save this script as LocalSQLAdminGroupMembers.ps1.

# ---------------------------------------------------------------------------
### Author:
### Shaun Stuart
### www.shaunjstuart.com
# ---------------------------------------------------------------------------

$DiskStatsServer = '<enter your server name here>'
$DiskStatsDB = '<enter your database name here>'
$SelectQuery = @"
            SELECT DISTINCT
                    CASE WHEN Servername LIKE '%\%' /* remove instance name */
                         THEN LEFT(LEFT(DSFCUReportingDB.dbo.SQLAdminAccess.ServerName,
                                        REPLACE(CHARINDEX(',', ServerName) - 1, -1, 129)),
                                   CHARINDEX('\', servername)-1)
                         ELSE LEFT(DSFCUReportingDB.dbo.SQLAdminAccess.ServerName,
                                   REPLACE(CHARINDEX(',', ServerName) - 1, -1, 129))
                    END AS ServerName
                   ,RIGHT(LoginName, (LEN(loginname) - CHARINDEX('\', loginname))) AS LoginName
            FROM    [DSFCUReportingDB].[dbo].[SQLAdminAccess]
            WHERE   DSFCUReportingDB.dbo.SQLAdminAccess.TypeDesc = 'windows_group'
                    AND DSFCUReportingDB.dbo.SQLAdminAccess.LoginName NOT LIKE '%'
                    AND DSFCUReportingDB.dbo.SQLAdminAccess.LoginName NOT LIKE 'NT Ser%'

"@

# Function Get-LocalGroupMembers from http://gallery.technet.microsoft.com/scriptcenter/List-local-group-members-762b48c5

function Get-LocalGroupMembers
{
    param(
        [parameter(Mandatory=$true,ValueFromPipeline=$true,ValueFromPipelineByPropertyName=$true)]
        [Alias("Name")]
        [string]$ComputerName,
        [string]$GroupName = "Administrators"
    )

    begin {}

    process
    {
        # If the account name of the computer object was passed in, it will
        # end with a $. Get rid of it so it doesn't screw up the WMI query.
        $ComputerName = $ComputerName.Replace("`$", '')

        # Initialize an array to hold the results of our query.
        $arr = @()

        $wmi = Get-WmiObject `
                        -ComputerName $ComputerName  `
                        -Query "select * from win32_groupuser where GroupComponent=`"Win32_Group.Domain='$ComputerName'`,Name='$GroupName'`""
        # Parse out the username from each result and append it to the array.
        if ($wmi -ne $null)
        {
            foreach ($item in $wmi)
            {
                $arr += ($item.PartComponent.Substring($item.PartComponent.IndexOf(',') + 1).Replace('Name=', '').Replace("`"", ''))
            }
        }

        $hash = @{ComputerName=$ComputerName;Members=$arr}
        return $hash
    }

    end{}
}

#######################
## MAIN             ##
#######################

$SqlConnection = New-Object System.Data.SqlClient.SqlConnection
$SqlConnection.ConnectionString = "Server=$DiskStatsServer;Database=$DiskStatsDB;Integrated Security=True"

# Can't issue the update command via the same connection as the SELECT because of the DataReader loop, so we need a new connection
# (This may not be true for SQL 2005+. I think SQL 2005+ uses MARS, which allows this.)

$SqlUpdConnection = New-Object System.Data.SqlClient.SqlConnection
$SqlUpdConnection.ConnectionString = "Server=$DiskStatsServer;Database=$DiskStatsDB;Integrated Security=True"

$SqlConnection.open()
$SQLUpdConnection.open()

$SqlCmd = New-Object System.Data.SqlClient.SqlCommand
$SqlCmd.CommandText = $SelectQuery
$SqlCmd.Connection = $SqlConnection

$SqlUpdCmd = New-Object System.Data.SqlClient.SqlCommand
$SqlUpdCmd.Connection = $SqlUpdConnection

$SqlAdapter = New-Object System.Data.SqlClient.SqlDataAdapter
$SqlAdapter.SelectCommand = $SqlCmd

$ds = $SqlCmd.ExecuteReader()
while ($ds.Read()) {
	$ServerName = $ds.GetValue(0)
    $LoginName = $ds.GetValue(1)
    echo "ServerName $ServerName ---- LoginName $LoginName"
    $a = '' # Clear out the array. Otherwise, groups that don't exist will return members of previous group
    $a = (Get-LocalGroupMembers -ComputerName $ServerName -GroupName $LoginName).Members
    echo $a

    foreach ($objitem in $a) {

        	$UpdateQuery = @"
                	INSERT ADAdminGroupMembers
                        (ServerName
                         ,ADGroupName
                         ,ADAccountName
                         ,StatDate)
                	SELECT '$ServerName'
                        ,'$LoginName'
                        ,'$objitem'
                        ,'$(Get-Date)'
"@
#            echo $UpdateQuery
	$SqlUpdCmd.CommandText = $UpdateQuery
	$SqlUpdCmd.ExecuteNonQuery()

        }

}
$ds.Close()
$SqlConnection.Close()
$SqlUpdConnection.Close()

To pull everything together, create a SQL job that runs as often as you want to collect this information. Step 1 in the job executes the stored procedure. Step 2 executes the first PowerShell script. Step 3 executes the second PowerShell script. I will note I have my job run the PowerShell scripts via CmdExec (powershell.exe &'<full path to script>\SQLAdminGroupMembers.ps1' - the ampersand is needed). I was not able to get the scripts to run using the PowerShell job step type.
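If you script your jobs out rather than clicking through the GUI, the CmdExec step looks something like this (the job name and path are placeholders):

EXEC msdb.dbo.sp_add_jobstep
     @job_name = N'Gather SQL Admin Access Data'
    ,@step_name = N'Run SQLAdminGroupMembers.ps1'
    ,@subsystem = N'CmdExec'
    ,@command = N'powershell.exe &''<full path to script>\SQLAdminGroupMembers.ps1''';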

Permissions Needed

The SQL login you use to create your linked servers will need to exist on each server you want to monitor. The login needs to be a member of the public server role and needs the VIEW ANY DEFINITION permission. For SQL 2000 servers, it needs access to the master database as a member of the public database role.
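Setting that login up on a 2005 or later server amounts to something like this (the name and password are placeholders; on SQL 2000 you only create the login, since VIEW ANY DEFINITION doesn't exist in that version):

CREATE LOGIN [SQLAdminMonitor] WITH PASSWORD = N'<strong password here>';
GRANT VIEW ANY DEFINITION TO [SQLAdminMonitor];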

In order for the PowerShell scripts to gather group members, the account executing the script (which will be the account SQL Server Agent is using if you run this as a scheduled job), needs local admin rights on each machine it connects to (in order to get local group members). (See below for a possible way around this if you don't want this security risk.)

Additionally, the Active Directory PowerShell module needs to be loaded on the server that executes the scripts. (The module is available on Windows Server 2008 R2 and higher; on a member server you may need to add the Remote Server Administration Tools feature to get it.)

What if you have machines in multiple domains? As written, these PowerShell scripts assume a single domain. However, you can modify them to pass credentials to another domain and execute as a different user in that domain. In order to do this, the other domain needs to have at least one domain controller running Windows Server 2008 R2. Then, you can use the New-PSDrive cmdlet. Details can be found here and here. You can also use this method to supply alternate credentials for your current domain if you don't want to give local admin rights on your servers to the SQL Server Agent service account. Unfortunately, I do not have an environment where I can test this, so this exercise is left for the reader.

There you have it. An easy, automate-able method to get a list of people who can get you fired! It's a simple matter to create a report in SSRS that pulls data from your two tables. (A final code piece is shown below. This is the data set definition for my SSRS report.) When I first ran this in my environment, I ended up with a report that was 22 pages long! I'm working on paring that down now. Check out your systems. I think you might also be surprised.

SELECT  [ServerName]
       ,[LoginName]
       ,[AccessLevel]
       ,[TypeDesc]
       ,[Created]
       ,[Updated]
       ,[StatDate]
       ,CASE WHEN dbo.SQLAdminAccess.IsDisabled = 1
                  AND STATE = 'D' THEN 'Disabled and denied access'
             WHEN dbo.SQLAdminAccess.IsDisabled = 0
                  AND STATE = 'D' THEN 'Denied access'
             WHEN dbo.SQLAdminAccess.IsDisabled = 1
                  AND STATE <> 'D' THEN 'Disabled'
             ELSE ''
        END AS [Status]
FROM    [SQLAdminAccess]
ORDER BY ServerName
       ,LoginName