If you are reading this, chances are you like SQL Server. You may even love SQL Server. You might be addicted to SQL Server. I’m here to enable your addiction. Now I can’t say mainlining SQL is good for you, but I can say baselining your SQL servers is.
There are several good blog posts on the benefits of generating baseline performance data for your SQL servers, so I won’t go into that here. And there is a great post by Feodor Georgiev here on how to collect performance data using the typeperf Windows utility. I’m going to expand a bit on his post because I decided that was the way I was going to create my baselines. Feodor’s post links to a couple other articles about using PerfMon data in the first paragraph, one being Brent Ozar’s post on how to do this and use Excel to massage the data and create reports. With all due respect to Brent, bleh. Excel? Really? You know, since SQL 2005, SQL Server has shipped with this handy little piece called SQL Server Reporting Services. Why not use that? Well, probably because Brent’s method of collecting PerfMon data involved saving the data to a csv file. Feodor’s method, which is in turn based on Jonathan Allen’s method, uses typeperf, instead of PerfMon itself, to collect data. Like Brent, however, Jonathan’s method still output data to a csv file. Feodor’s brilliant contribution was to realize that csv files are très 1990 and that typeperf can write its data directly into SQL Server. Ah! Now we’re getting somewhere!! Couple that with SSRS and we’ve got the makings of a real monitoring tool!
Feodor did all of the hard work and included tons of screenshots in his post, so I recommend reading that to get started. I’m going to write about the issues I encountered when I tried to implement his plan.
- Typeperf is like a stubborn child. Just like a bratty kid who says “I want ice cream” over and over at dinner time, typeperf seems to have one basic error message: “Error: no valid counters.” This message refers to the file of PerfMon counters you feed it, but it can mean either that the file contains no valid counters (perhaps you mistyped a counter name) or, as was my case, that typeperf can’t even find the file (it doesn’t exist, or you mistyped its name). Typically, you would expect a program to return a “file not found” error in that case, but not typeperf!
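For instance, point typeperf at a counters file that doesn’t exist (the path below is made up) and you get the same unhelpful message you’d get for a bad counter name:

```
C:\> typeperf -cf "D:\Jobs\CounterList.txt" -sc 1
Error: No valid counters.
```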
- The ODBC driver matters. Feodor’s post used the SQL Server driver for his ODBC connection. I wanted to use the SQL Server Native Client 10 driver. Why? No real reason, other than the fact that it was newer. I couldn’t get typeperf to work with that driver, so I had to go back to the standard SQL Server driver.
- Security matters. If you are planning on automating this process by running it as a SQL job (and you probably are), you need to make sure the account your SQL Agent runs under belongs to the Performance Monitor Users security group on all the servers you want to monitor. This will allow it access to the PerfMon counters.
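Assuming your SQL Agent runs as, say, MYDOMAIN\sqlagent (a made-up account name), you can add it to that group from an elevated command prompt on each server you want to monitor:

```
net localgroup "Performance Monitor Users" MYDOMAIN\sqlagent /add
```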
- Security still matters. I had some issues with not being able to see all the PerfMon counters. Specifically, sometimes I was not able to see the SQL counters. I could see them when running PerfMon as myself, but not when running PerfMon as my SQL Agent service account. Adding the SQL Agent account to the local Administrators security group seemed to solve this. I will admit right now that this is a potential security issue; I have not fully investigated it, and there may be some other solution.
- Sharpen your DOS batch file programming skills. Feodor’s process calls typeperf from a batch file. This requires some mad DOS batch programming skilz. I forgot all my batch file commands 20 years ago, but luckily Google has plenty of examples. I’ll post my batch file and some of the hoops I had to jump through in just a minute.
- Named-instance SQL Servers have different object name values. If you just have the default instance installed on your server, your counter object names will be of the form “SQLServer:<object name>”. But named instances will have object names of the form “MSSQL$<instance name>:<object name>”. This makes reporting a bit trickier.
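For example, the same counter would appear in your counters file in these two forms (the instance name SQL2 here is just an example):

```
\SQLServer:Buffer Manager\Buffer cache hit ratio
\MSSQL$SQL2:Buffer Manager\Buffer cache hit ratio
```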
- Typeperf automatically creates tables in the database. And changes them too. When you first run typeperf using the SQL file type so that it writes to a SQL database, it will automagically create three tables. Those are described nicely in Feodor’s post. However, if you try to index the tables, you might be in for a surprise. In the CounterDetails table, I wanted to index the MachineName, ObjectName, and CounterName fields. However, each of these columns is created as a varchar(1024) datatype. The maximum index key length is 900 bytes. If you try to make an index with one or more of these fields, SQL Server will warn you that the key could exceed 900 bytes and that inserts or updates with large values could fail.
OK. I know that none of my machine names, counter object names, or counter names are that long, so I went ahead and changed those fields to smaller sizes and made my indexes. But – typeperf will change those fields back to varchar(1024)! Again, it’s quite the stubborn child. Now, as long as you are sure your indexed fields aren’t going to exceed 900 bytes, you’re ok ignoring this warning. But just be aware of this behavior.
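As a sketch, here is roughly what I mean (the column sizes and index name here are my own choices, not requirements, and you should check the nullability of your own columns before altering them):

```sql
-- Shrink the varchar(1024) columns so a combined key stays under 900 bytes
ALTER TABLE CounterDetails ALTER COLUMN MachineName varchar(128);
ALTER TABLE CounterDetails ALTER COLUMN ObjectName  varchar(128);
ALTER TABLE CounterDetails ALTER COLUMN CounterName varchar(128);

CREATE NONCLUSTERED INDEX IX_CounterDetails_Names
    ON CounterDetails (MachineName, ObjectName, CounterName);
```

Just remember that the next typeperf run may widen those columns right back out.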
Now, on to the DOS batch file goodness.
If you notice, when writing to a SQL server, typeperf wants you to use “!<textstring>” after the DSN name. This is a tag that gets saved to the table so that you can label your collection set. When using the SQL file type, typeperf requires this string. (Again with the stubbornness.)
Now let’s think a bit about what we are trying to do here. I want to create an automated process that collects perfmon data periodically. I’m going to be calling this from a batch file, and I want to collect 4 data samples, 15 seconds apart, every 30 minutes. I’m going to report on how this data changes over time, so I’m going to need some way to group each collection set. The typeperf tables do include a date and time, but it changes with each sample collected, making it hard to group on. Ideally, I’d like to use this textstring for that purpose – use it to make my own datetime stamp which does not change across the 4 samples in one collection, but does change with each execution every 30 minutes. Which means I need some way to change it each time the batch file executes. That means using some variables and programming. And, as an added twist, DOS batch files use the exclamation point as a keyword for delayed environment variable expansion, so we need to escape it in our code.
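To see why the tag is worth the trouble: each run’s !tag ends up in the DisplayString column of the DisplayToID table that typeperf creates, so a rough sketch of grouping one collection set looks like this (just a sketch – more on the real reporting queries later):

```sql
-- Each execution's !tag lands in DisplayToID.DisplayString;
-- join through the GUID to group all samples from one run.
SELECT d.DisplayString, AVG(c.CounterValue) AS AvgValue
FROM CounterData c
JOIN DisplayToID d ON d.GUID = c.GUID
GROUP BY d.DisplayString;
```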
In order to create our own datetime stamp, we need to get the current date and time and parse that into variables. In batch files, that’s done using the FOR /f “tokens=…” command. You remember how to do that, right? Me neither. Google saves the day. Below is my batch file. I’ll explain it, line by line.
```
@echo off
echo ------------------- Starting -------------------
echo %date% %time%
rem @echo off
setlocal enableextensions
setlocal enabledelayedexpansion
for /f "tokens=1-5 delims=/ " %%s in ("%date%") do (set CurrDate=%%t%%u%%v)
for /f "tokens=1-5 delims=:. " %%d in ("%time%") do (set CurrTime=%%d%%e)
if %CurrTime% LSS 1000 SET CurrTime=0%CurrTime%

@echo on
TYPEPERF -f SQL -cf "D:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\JOBS\CountersList.txt" -si 15 -sc 4 -o SQL:SQLServerDS^^!%CurrDate%_%CurrTime%
@echo off
echo.
echo --- End ---
echo.
```
I capture the output of this job to a log file, and lines 1 through 4 are used just to make that logfile easier to read. Lines 5 and 6 are DOS commands needed to work the variable parsing magic.
Line 7 takes the output of the DATE command and splits it into 4 pieces using “/” and ” ” (space) as delimiters. (This command, and the next one, are dependent on your local system settings for the format.) On my system, the DATE command returns the date in the format “Fri 10/07/2011”. Getting back to the command, the four pieces are parsed into the variables %%s through %%v. I grab %%t (the month), %%u (the day), and %%v (the year).
Line 8 does the same thing as line 7, except for the TIME command. Again, this will vary based on your local system settings. In my case, the time is formatted as HH:MM:SS.mmm. A couple of things to note here. First, if the hour is less than 10, the command returns a space plus a one-digit hour. If you only use the colon as a delimiter, your hour variable will include a leading space. When all the variables are built into the final typeperf command line, this will result in the ! tag being something like !10072011 850. Anything after the space will be ignored by typeperf, so your tag will end up being just 10072011, and your reporting will be off because all data sets collected while the hour is only one digit will be grouped together. To overcome this, we need to also specify a space as a delimiter. Second, I want my hour number to always be two digits, so I need to add a leading zero where needed. That is what line 9 does. (LSS is “less than” in batch file programming.)
After all that setup is done, line 12 is where the actual typeperf command is built and executed. Notice we have to escape the exclamation point by preceding it with two carets (^^).
The typeperf flags are well covered in Feodor’s post, but I’ll run through them quickly here. The -f flag specifies the file type to use. A value of “SQL” indicates a SQL ODBC datasource. -cf indicates the text file of counters to collect. The format of this file is also well covered at Feodor’s blog. Be aware of the difference in counter names between SQL default and named instances that I mentioned earlier. The -si flag tells typeperf the time between collection cycles (in seconds) and the -sc flag tells it how many samples to collect. The -o flag is the output file. In the case of the SQL filetype, this is the name of the ODBC datasource to use to connect to the SQL server. In the above example, my datasource is named SQLServerDS.
Lines 13 through 16 are for my log file to make things easier to read. One thing to note. If you actually run typeperf manually in a command window, you’ll see it displays a spinning text cursor while running. When you run it as a batch file and capture the output to a text file, this shows up as long sequences of “[|][/][-][\]”. So to keep your logfile from growing like crazy, be sure to have each job execution overwrite the existing file instead of appending to it.
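In the SQL Agent job step, that means redirecting with a single > (overwrite) rather than >> (append). The paths below are hypothetical:

```
D:\Jobs\CollectPerfData.bat > D:\Jobs\CollectPerfData.log 2>&1
```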
So that’s the nitty gritty of collecting the data. I simply scheduled a SQL Server job that executes this batch file every 30 minutes. I’m collecting 17 counters from about 40 different servers and it takes about 5 minutes to run.
Next week, I’ll go over the SQL statements I use to pull data from the tables (again, based on Feodor’s code). I’ll also show you the nifty reports I made with SSRS that are guaranteed to impress your manager!