This post contains details about the behavior of one particular program we use here at my work, but the general idea should be applicable anywhere.
The IT department at my work, of which I am a part, uses a tool called Orion by Solarwinds Software for network performance monitoring. The software uses SQL Server as its database back end. Therefore, I am in charge of maintaining that database. As part of my regular duties, I monitor the growth of our database files and noticed that the database for this program was growing. At its current rate, it would exceed our available space in about three months. So I contacted some of my coworkers and we found some settings in the program where we could limit how much historical data is retained. We settled on a 90 day retention length. Problem solved.
Except I noticed that the database continued to grow, even after deleting old data. Hmm. I looked at the properties of the database and saw that roughly 33% of the database space was unused. Yet the thing grew and grew. I had had some issues with this database before. Previous experience had led me to discover that this particular piece of software creates and deletes many tables on a regular basis. I found this out because one of my index maintenance jobs was failing when it got to this database. It turns out, I was creating a list of tables whose indexes needed to be rebuilt and, by the time routine got around to rebuilding the indexes on a subset of these tables, the tables had been deleted, thus causing my job to fail. I changed my routine to handle this, but the issue had flagged this database in my mind as one that required special handling.
I prefer to use database maintenance tools provided by the software vendor before I try to create some maintenance routines of my own to use on their database. My thought is that they presumably know their database better than I do and they would therefore have a better understanding of what should and shouldn’t be done. As I found out, this is not always the case.
The Developer Doesn’t Always Know Best
I read the Orion Administrator Guide and there, on page 271, they give steps to compact and reindex the database using their tools. Cool! So I followed their instructions and used their tool, which provided a nice display that showed what table the program was currently working on. As I watched, it cycled through the tables, rebuilding the indexes. I launched Activity Monitor in SSMS and saw the utility was just issuing DBCC REINDEX commands against the tables in the database. After 30 to 45 minutes, the display changed to “Compacting Database.” Checking Activity Monitor, I saw the utility had just issued a DBCC SHRINKDATABASE command.
Uh-oh. Red flags went up. Why? Because compacting the database will create almost 100% index fragmentation. So that 30 to 45 minutes of work the program spent rebuilding the indexes was just thrown out the window when it started the database shrink. From an operational standpoint, I wasn’t too concerned because I knew my regular index maintenance jobs would catch this and whip those indexes back into shape, but it did make me realize that, although software vendors might know their databases, they don’t necessarily know how SQL Server works. Given that I’ve seen firsthand how little software developers know about SQL Server, I’m not sure why I am surprised. I guess I figured a piece of software that was released and sold must have undergone some sort of quality control testing. (OK, stop laughing.)
I let the shrink continue. But wait! I hear you say. You’re shrinking a database? Why? That’s verboten! Yes, it is. Shrinking is Evil with a capital E and should never be done – except under certain specific circumstances. And a good DBA knows what those circumstances are and when to break the rules. As Gail Shaw, Microsoft MVP, states: “There are cases where shrinking data files does make sense. When a process created lots of tables for processing then dropped them again…” I know from my previous experience that this is exactly what is happening in this database. In fact, Orion creates a new table every 15 minutes, then deletes it again approximately an hour and fifteen minutes later. On my system, at any given time there are 36 of these transitory tables in existence, each about 2 MB in size. (If you are following along with your Orion installation at home, the tables I’m talking about are named NetFlowDetail_XX_XXXXXXX, where the Xs are numbers.) When the shrink conpleted, I checked the database and the file size had dropped down to about 40 GB and the 21 GB of free space had been returned to the operating system. There was about 450 MB of free space left in the database.
My assumption above was that the creation and deletion of all those tables was what was causing the database to grow. But I did some more digging and that turned out to not be the case. How can I be sure? Because this database was set up with four filegroups. My disk space monitoring routine tracks the sizes of each filegroup in a database. Although the standard report I use consolidates those totals into one number for a database, I do have the totals for individual filegroups in my base table. Previously, I created another report that graphs the size of each individual file used by a database. Here’s what the graph of this one looks like:
The yellow line for drive T: is the transaction log size. The database is in simple recovery mode and I’ve been spending time trying to right-size it so I don’t end up with too many VLFs, so that explains the many little drops and growths. The drop you see in the other lines around December 21 is when the shrink command was issued. You can see the shrink really only affected two files, represented by the light blue line and the grey line. The light blue line is the primary filegroup and the grey line is filegroup FG3. The other filegroups continued their growth.
What I discovered was that the tables that were being created and deleted so often are part of filegroup FG2. That filegroup did not decrease in size during the shrink, so they could not be responsable for all the free space that was in the database. I decided to focus my efforts on the two filegroups that showed a size decrease after the shrink. Of those two, I ended up only looking at FG3. The primary filegroup had lots of tables in it while FG3 only had about 3, thus reducing the items I had to investigate.
Unfortunately, I haven’t come up with any explanations so far. FG3 continues to grow, but this is expected as the tables in that filegroup are three tables holding historical data that get written to daily. I expect growth. I’ve switched tactics now to logging the total database size and free space size on an hourly basis and charting that. I’m trying to see if the database free space grows over time. Here is a chart so far:
The yellow line is the total database size divided by 100. The blue line is the free space in the database. The red line is the percentage of free space in the database and probably isn’t all that useful. I should note that autoshrink is disabled and no automated shrink jobs are running. Autogrowth is enabled. There is a maintenance job that runs about 1 AM that removes data that is older than 90 days. This is done via the program’s management tool though, and I’m not 100% sure exactly what commands it is issuing. (For instance, why did the total database size drop around midnight on 12/30? If the tool is issuing a SHRINK DATABASE, I would expect to see a drop each night. )
At this point, I don’t have enough data to see if there is any sort of trend. I’m worried about the database growing over time again but ending up with 1/3 of the database being free space. I don’t want to schedule a SHRINKDATABASE command to occur on a regular basis if I don’t have to. I don’t really like writing a blog post about this without having any answers, but I’m throwing this out there to see if anyone has any ideas what might be causing this. I have checked (once only) to see if the ghost cleanup task might be falling behind in deleting records, but that doesn’t seem to be the case. Paul Randall’s script doesn’t show any ghost processes running.
I will state that this database was brought over from a SQL 2000 server to a 2008 server and I know SQL 2000 had issues with properly reporting space used. When I migrated the database, I did run DBCC UPDATEUSAGE, but perhaps that just updates the usage data internally and doesn’t actually change the file size reported to the operating system until a SHRINKDATABASE is done. Unless I can come up with another explanation, this is the assumption I will work with going forward. Still, it seems strange that it would be off by such a large amount. Or perhaps this was caused by some other quirk of how SQL 2000 operated. Hopefully, my hourly tracking will show if there is something going on that causes the database and the free space in it to grow over time.
If you have other ideas, please leave a comment.