A couple years ago, I was interviewing for a new job and one of the interviewers I met with asked me how I would go about tuning a query to improve performance. One of the options I mentioned was to run the query through the Database Engine Tuning Advisor, although, I added, its results should be taken with a grain of salt and not blindly implemented. This seemed to take the interviewer by surprise and he asked me to elaborate. I did so, and I will do so again here in more detail.
The Database Engine Tuning Advisor is a great tool that Microsoft started including with SQL Server 2005. It can
take a query and look at the tables used in the query and recommend steps to improve performance. It can recommend adding clustered and non-clustered indexes, table partitions, statistics, and indexed views, to name a few. It will also report the performance improvement it thinks its changes will achieve. It’s a great tool to have in your DBA toolbox.
However, as with most cases of automated analysis, a human should carefully review the recommendations and not blindly implement them. Why? There are a couple of reasons.
First, the DETA works in a vacuum. All it knows about your server is the query you gave it and the current database structure – indexes, statistics, etc. It has no idea of the server workload. Perhaps the query you are optimizing is one used once a month for reporting on an OLTP system that is handling thousands of updates and inserts a minute. Any added indexes that DETA recommends will slow down access to the table because now SQL has to update additional indexes for each DML operation. This could adversely affect performance. (Note: you can give DETA a SQL Profiler trace file as an input and it will consider that entire workload as it evaluates options, but in this case I am talking about optimizing a single query. You cannot do this and provide a workload file at the same time, unless the query is part of that workload.)
Second, I’ve seen DETA make some recommendations that are redundant. Well, technically, they are not redundant as the recommendations are not identical. But they are close enough so that only one of the two would be needed. For example, yesterday, I was using DETA and it recommended two indexes on a table. One index was just on Column A. The second recommended index was on Column A with Column B as an included column in the index. Technically, those aren’t redundant, but in practice, the second index will provide all the functionality of the first.
Lastly, DETA can only provide recommendations that fit the parameters it has been given. Here is a good example. I’ve got a table in a vendor database that contains approximately 450 million rows. There is no clustered index. (I know, I know. Ridiculous.) The table takes 45 GB of space. There are three non-clustered indexes on that table. As a test, I had DETA tune a simple query that did a SELECT * FROM Table WHERE ColumnF = <some value>. ColumnF was not included in any of the existing indexes. Because this table is so big, I do not want DETA to recommend creating a clustered index. Creating that would require reordering all the data on the disk and would likely take a very long time. So I checked the option to recommend nonclustered indexes only.
What was the result? DETA recommended a nonclustered index on ColumnF with all the other fields in the table listed in the INCLUDE clause of the index. (Basically, a clustered index in all aspects except name.) This would result in a 99% performance improvement! Sounds great, right? But DETA also reports this index will require another 45 GB of disk space.In effect, DETA just told me to double the size of my table. Plus there is the before mentioned impact of DML commands having to update an additional index.
Clearly, it would be unwise to blindly accept this recommendation. A better option would be to leave off the included columns. SQL will need to do a lookup to get the other fields, but when compared to the additional storage requirements and index upkeep during DML operations, this is a small price to pay.
These examples show why it is important for an experienced DBA to carefully review the recommendations of the Database Engine Tuning Advisor and not just blindly implement them. As with any advice, the DBA is free to do with it as he or she pleases, including ignore it or just take selected bits.
That’s my advice. Do with it as you please. :-)
In your example of the vendor database, you say you did a query of “SELECT * FROM Table WHERE ColumnF = ” and the DTA recommended an index based on columnF and covering the rest of the columns. That’s actually pretty sound for the query you used. What you received as a recommendation is called a Covering Index, as it covers all of the columns in the query.
I’ve never heard of an index that covers all the columns in a table referred to as a Clustered Index. A Clustered index is an index that organizes a table based on a column or composite of columns. A table with no clustered index is referred to as a Heap.
Great article though, the DTA definitely is a little over-eager to recommend included columns to try to cover every little bit of query.
What I meant by that was a clustered index IS the table. It’s how the rows in the table are stored on disk. So, in effect, the clustered index can be viewed as an index on the clustering columns with all the other columns in the table as included columns. If you make a non-clustered index on some column(s) that INCLUDEs all the remaining columns, in one sense, you’ve made a clustered index because it is also an index that contains all the data. (The difference is that the rows might not be ordered on disk the way the non-clustered index key is.)
Sorry, but I don’t see the commonality. If you make a non-clustered index on every column on a table, you’ve effectively doubled your table. A clustered index is indeed “the table”, but it still only indexes the column or columns you have created the clustered index on. If you have a table with columns A – F, and you created a clustered index on column A, and then query for a specific value in column B, that clustered index is not going to help as much as having an index on column B will help.
Forget about how indexes are used by queries. My point was that a clustered index includes all the data in the table. A non-clustered index that includes all the columns also contains all the data in the table. So yes, you’ve doubled the table. But since the table IS the clustered index, you could also say you’ve made another clustered index (i.e. another table). It appears this was a poor analogy.