Normalization Gone Amuck

Database normalization is typically a good thing. It helps reduce the number of errors in a database by storing important information in only one place. It maintains referential integrity and helps prevent what are known as insertion and deletion anomalies. It allows the database design to accommodate future expansions of data storage requirements and makes it easy to retrieve data through a language such as SQL. There are various levels, or forms, of normalization. The most common in average database usage is the third normal form. Although DBAs may not be familiar with the terminology of normalization, I think most can recognize it when they see it. They are also probably able to design a normalized database (well, most senior DBAs anyway) and to write queries against one. But once in a while, you’ll run up against a programmer who either doesn’t understand the concept of normalization or who knows a little bit about it and takes it to extremes. I ran across one of these today.

Well, I didn’t actually meet the guy (or gal) in person. But I did come across their database design. Check out this portion of the design:

This is NOT good design

Clearly, the designer had some inkling of normalization. Putting the states and countries into their own tables is a proper use of normalization. If the tblAddress table contains the addresses of 2 million people living in California, there’s no need to store “California” and “United States Of America” 2 million times. Put that data in another table once and store a pointer to it (in the form of a foreign key) in the address table. This also prevents addresses from having state values of “CA”, “Calif”, “Californa”, and the like.
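Done properly, that portion of the schema would look something like this. The column names follow the diagram’s naming conventions, but the exact definitions here are my own guesses, not the designer’s actual schema:

/* A sketch of the normalized design. Definitions are my guesses, not the
   actual schema from the diagram. */
CREATE TABLE tblCountry (
	iCountryID	int IDENTITY(1,1) PRIMARY KEY,
	vcCountry	varchar(100) NOT NULL
)

CREATE TABLE tblState (
	iStateID	int IDENTITY(1,1) PRIMARY KEY,
	vcState		varchar(100) NOT NULL,
	iCountryID	int NOT NULL REFERENCES tblCountry (iCountryID)
)

CREATE TABLE tblAddress (
	iAddressID	int IDENTITY(1,1) PRIMARY KEY,
	vcStreetNO	varchar(20) NOT NULL,
	vcStreet	varchar(100) NOT NULL,
	iStateID	int NOT NULL REFERENCES tblState (iStateID)
)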

But this designer took normalization one step too far. He (or she) put the street name into its own table. Now, I will give the designer props for separating out the number portion of the address (vcStreetNO) from the street name (vcStreet), but really, this is not an appropriate use of normalization. And yes, there are probably lots of “Main Streets” in California, so technically, you might save a few bytes here and there if you looked for duplicates. But how will this table be maintained? If someone needs to enter a new address, you can reasonably present them with a list of all the states and countries from those tables to choose from. But are you going to list all the streets in California for them to select from? No way. So you are most likely going to let the user type in some text for the street address and then insert that data into the tblStreet table. And I’d be willing to bet you’re not going to check to see if that street already exists in the table. Even if you did, when you consider the possibilities of street names including “North”, “East”, “N.”, etc. (not to mention user typos), you’ll really reduce the number of potential matches you might find. This defeats one of the purposes of normalizing data – to minimize storing duplicate data.

This is just a bad design. It serves as an example of why DBAs should be at least somewhat involved in the database design process.

Disabling Mirroring During Maintenance

I wrote a while ago about setting up alerts for database mirroring events – when mirroring is suspended and resumed. It’s been a month or so since I’ve implemented those alerts and I’ve discovered I need to make some minor adjustments to my maintenance plans to accommodate them.

My standard maintenance plans are set to reorganize fragmented indexes nightly and to rebuild fragmented indexes once a week. These are both fully logged operations, so when these steps are performed on the mirrored databases, all that activity is also sent to the mirror server for replay on those copies of the databases. My maintenance plans are also all set to run at the same time on all my servers. This means that when the mirror server is performing the index work on its databases, it also suddenly starts getting all the index maintenance commands from the principal server for the mirrored databases and has to execute those transactions as well. My mirror server was getting hammered. The SAN could not keep up with the load and I was seeing errors left and right. The Windows application log was logging errors about disk access taking longer than 15 seconds and my mirroring sessions eventually got so far behind, they suspended themselves. Clearly, something had to change.
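For reference, whether you use the built-in maintenance plan tasks or roll your own T-SQL, this index maintenance boils down to ALTER INDEX ... REORGANIZE and ALTER INDEX ... REBUILD statements driven by a fragmentation check. A rough sketch of that check is below; the 10% and 30% thresholds are common rules of thumb, not the settings from my plans:

/* Sketch of a fragmentation check. The thresholds are common rules of
   thumb, not the settings from my maintenance plans. */
SELECT	OBJECT_NAME(ips.object_id) AS TableName,
	i.name AS IndexName,
	ips.avg_fragmentation_in_percent,
	CASE	WHEN ips.avg_fragmentation_in_percent >= 30 THEN 'ALTER INDEX ... REBUILD'
		ELSE 'ALTER INDEX ... REORGANIZE'
	END AS MaintenanceAction
FROM	sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN	sys.indexes AS i
	ON i.object_id = ips.object_id AND i.index_id = ips.index_id
WHERE	ips.avg_fragmentation_in_percent > 10
	AND ips.index_id > 0	/* ignore heaps */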

My plan was to disable mirroring during the maintenance jobs, letting the transactions pile up at the principal. Once the index maintenance was complete, I re-enabled mirroring. By this time, the maintenance jobs on the mirror server had also completed and it was able to handle the backlog of transactions that the principal started sending it as mirroring was synchronized. The only problem – alerts. Because I had alerts set up on both the principal and mirror for each of the five databases I was mirroring, both for when mirroring was suspended and re-enabled, I was getting bombarded with 20 alerts each night.

I needed to make my maintenance plans a little more sophisticated. I changed them to first disable all alerts relating to mirroring on both the principal and the mirror, and then suspend the mirroring sessions. Then the standard maintenance steps took place, followed by re-enabling mirroring and the mirroring alerts. I also took the extra step of setting up some On Failure precedence constraints so that if the tasks to disable the alerts failed, I would get an email telling me they failed and I could ignore the barrage of alerts that would be coming. (I included a caveat as a reminder to expect additional alerts about mirroring coming back online shortly thereafter. If I didn’t get those, something went wrong and would require my attention.) Similarly, if the alerts were successfully disabled but NOT successfully re-enabled, I sent out an email telling me to make sure they got enabled again.

Here is a shot of part of my modified maintenance plan:

Notice the precedence constraints. In some tasks, I used On Completion constraints instead of On Success. If the alerts could not be disabled, I still want the maintenance job to continue. Likewise, if the mirroring sessions could not be resumed, I still want the mirroring alerts enabled – to remind me mirroring is still down. There are a couple of steps that aren’t shown, but they are just more of the same.

The Code Behind The Tasks

That’s a nice picture, but it doesn’t really show you the code that actually does the work. Here is the code to suspend mirroring:

ALTER DATABASE <database name> SET PARTNER SUSPEND

I have five databases mirrored, so my task has this command five times, once for each database. The task to re-enable mirroring is:

ALTER DATABASE <database name> SET PARTNER RESUME

Remember, the suspend command can be issued on either the principal or the mirror, but the resume command can only be issued on the principal. In my case, this is all running on the principal.
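If you want the resume task to be a little defensive about that, you can check the database’s current role and state in sys.database_mirroring before issuing the command. My actual task just issues the RESUME directly; this is only a sketch, and the database name is a placeholder:

/* Only resume if this server is currently the principal for the database
   and the mirroring session is suspended. Database name is a placeholder. */
IF EXISTS (SELECT 1
		FROM sys.database_mirroring
		WHERE database_id = DB_ID('YourMirroredDB')
			AND mirroring_role_desc = 'PRINCIPAL'
			AND mirroring_state_desc = 'SUSPENDED')
BEGIN
	ALTER DATABASE YourMirroredDB SET PARTNER RESUME
END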

To disable all my mirroring alerts, I use the sp_update_alert command. My alerts relating to mirroring are all named starting with “Mirroring”, so I made a cursor to select those alerts and disabled them:

USE msdb

DECLARE	@AlertName varchar(100)
-- Cursor over all the alerts whose names start with "Mirroring"
DECLARE 	MirroringAlerts CURSOR FOR
	SELECT name FROM sysalerts WHERE name like 'mirroring%'

OPEN MirroringAlerts
FETCH NEXT FROM MirroringAlerts INTO @AlertName

WHILE @@fetch_status = 0
BEGIN
	EXECUTE sp_update_alert @name=@AlertName, @Enabled=0
	FETCH NEXT FROM MirroringAlerts INTO @AlertName
END

CLOSE MirroringAlerts
DEALLOCATE MirroringAlerts

The code to enable them again is the same, except the sp_update_alert call sets @Enabled=1.

Your email notification tasks can say whatever you want, but mine say:

The mirroring alerts on <servername> were not able to be disabled as part of the maintenance job. The maintenance job pauses mirroring temporarily, which will cause these alerts to fire. If you subsequently receive alerts that mirroring has started synchronizing, you can ignore these “mirroring paused” alerts.

and

The mirroring alerts on <servername> were not able to be re-enabled as part of the maintenance job. The maintenance job disables the alerts and pauses mirroring temporarily, which will cause these alerts to fire. The alerts were not able to be re-enabled and you should manually enable them as soon as possible.
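However you build the notification steps, if you implement them as Execute T-SQL Statement tasks, a Database Mail call along these lines will send the message. This is just a sketch; the mail profile name and recipient address are placeholders, not values from my environment:

/* Sketch of the "alerts could not be disabled" notification step.
   The mail profile name and recipient address are placeholders. */
DECLARE @msg varchar(max)

SET @msg = 'The mirroring alerts on ' + @@SERVERNAME
	+ ' were not able to be disabled as part of the maintenance job. '
	+ 'The maintenance job pauses mirroring temporarily, which will cause these alerts to fire. '
	+ 'If you subsequently receive alerts that mirroring has started synchronizing, '
	+ 'you can ignore these "mirroring paused" alerts.'

EXEC msdb.dbo.sp_send_dbmail
	@profile_name = 'Default Mail Profile',	/* placeholder profile */
	@recipients = 'dba@yourcompany.com',	/* placeholder address */
	@subject = 'Mirroring alerts not disabled during maintenance',
	@body = @msg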

Now I’ve got some sophisticated maintenance plans that handle both mirroring and mirroring alerts!

Watch Out For That Denali!

No, not that one. This one!

In case you missed it, last week at the PASS Summit, Microsoft announced the release of the first community technology preview of the next version of SQL Server, code named Denali. It is available for download from Microsoft here. Unfortunately, as the sole DBA at my company, I don’t have the time or spare machines to try out alphas and betas of new software, but I have read about it and some of what I’ve read gets me excited for this version.

Microsoft has put the Denali Books Online up on the web, but, as would be expected at this stage of the game, documentation is sparse and not all functionality is present in the product. But here are a couple of the changes I think will be the most useful for me.

  • Contained databases. Finally, we are approaching truly portable databases. Right now, when you move a database from one server to another, be it via BACKUP / RESTORE or ATTACH / DETACH, there is a bunch of related stuff that doesn’t get transferred – jobs, roles, linked servers, and, of course, logins – which leads to the orphaned user issue that has plagued DBAs for years. A contained database is one that has no dependencies on the instance of SQL Server it is located on. You should just be able to take that database to any other SQL Server and everything will move with it. Aaron Bertrand states in his blog post about this that Denali will only support partially contained databases and that support for fully contained databases will come in a future version of SQL Server. Disappointing, but at least it’s a step in the right direction!
  • HADR. Not the super collider. The super availability-er. HADR stands for High Availability and Disaster Recovery and is a new feature of Denali. The Books Online entry is here. This sounds like database mirroring taken to the next level – and with a whole new set of lingo. Instead of the principal and mirror servers that database mirroring uses, HADR uses availability groups, primary replicas, and secondary replicas. I can best explain this using the terminology of mirroring: an availability group is a set of databases that are (to use the SQL 2008 terminology) mirrored and can fail over as a group. (Nice!) Primary replicas are the “principal” databases in the availability group. Secondary replicas are the mirrored databases on the mirror servers. OK, the plural “servers” is a bit of a wish on my part, but the BOL entry states “This CTP supports only a single, asynchronous secondary replica,” which implies multiple secondaries will be supported eventually (although maybe not in the final Denali release). And finally, the other cool thing about HADR is that the secondary replicas will be available for read-only access. Double nice! As you know, with database mirroring, the databases on the mirror server are in a restoring state and not available for use by end users.
  • Lastly, DTS is history. Yup. No more support for DTS. Start migrating your DTS packages to SSIS now.

Those are just my first impressions based on the limited reading I have done. It’s been a busy week for me, so I haven’t had much time to delve deeper into this. But what I have seen so far excites me!

Data Export Tip

I often get asked to provide some data from SQL Server to someone in another department. These are typically one-off requests and the people asking for the data usually don’t have database experience, so I put the data into an Excel spreadsheet and email it to them. Occasionally, that person will come back to me weeks or months later and either ask for an updated copy of the same data or want me to update those records in the database in some way.

These requests make me nervous because I can rarely remember the exact SELECT statement I used to get them the data in the first place. And if I then generate the data again and there are missing or additional records, they usually spot it, and then we have to launch an investigation into why the data differs from before, which data they really want, and so on. Lately, I’ve started doing something that helps me avoid that. Now, when I send someone data in a spreadsheet, I add a worksheet to the file and copy and paste the SELECT statement I used into it. Most of the time, the user never notices. And when they come back to me later, I simply ask them if they still have the original spreadsheet I sent them. They almost always do, so it’s easy to get my original SELECT statement back and guarantee that I’m pulling data based on exactly the same criteria as before.

Checking For Internal Log File Fragmentation

One of the items on my checklist of things a DBA should check for when starting at a new company is internal log file fragmentation – log files that have too many virtual log files (VLFs) in them. (See Step 8 of this post for more info on virtual log files.) This is something I do as a background task. Every couple of months and whenever I have some spare time, I’ll check the log files on my servers for excessive amounts of VLFs. Unfortunately, spare time is becoming a rare commodity lately and I did not want this task to keep getting pushed back or neglected. I needed a way to automate this.

I came up with the script shown below. I created the stored procedure in the master database of all my 2005 and 2008 servers. Sorry, it won’t work on 2000 servers – it uses Database Mail and the sys.databases catalog view. You could rewrite it to use SQL Mail and sysdatabases instead, but I haven’t bothered to do this because we are on track to migrate our existing SQL 2000 servers to 2008 in a month or two anyway. I’ve scheduled a job to execute this procedure once a week. It will count how many virtual log files are in the transaction log for each database on the server and will send an email if the count is over a certain threshold. This script will not automate the cleaning up of VLF fragmentation, but it will do the checks for me and let me know which databases need attention.

USE master
GO

CREATE PROCEDURE sp_VLFCheck

AS

/*

This procedure checks the transaction log of each database on the server and
emails a report if any contain more than a specified number of virtual log files
(as defined in the @MaxVLFs variable). Excessive VLFs can result in poor performance.

- Shaun J. Stuart

*/

DECLARE @SQLCmd varchar(40)
DECLARE	@DBName varchar(100)
DECLARE @DBID int
DECLARE	@MaxVLFs smallint
DECLARE	@VLFCount int
DECLARE	@EmailSubject varchar(255)
DECLARE	@EmailRecipients varchar(max)
DECLARE	@EmailBody varchar(max)

SET	@EmailSubject = 'Excessive VLFs found on ' + @@servername
SET	@EmailRecipients = '<enter email address here>'
SET	@EmailBody = 'Transaction log files with excessive VLFs have been found. '
SET @MaxVLFs = 50		/* threshold for number of VLFs */

DECLARE DBNameCursor CURSOR FOR
	SELECT	database_id
	FROM	sys.databases
	WHERE	state = 0			/* online databases only */

DECLARE @LogInfo TABLE (
	FileId tinyint,
	FileSize bigint,
	StartOffset bigint,
	FSeqNo int,
	Status tinyint,
	Parity tinyint,
	CreateLSN numeric(25,0)
)

IF  OBJECT_ID('tempdb..##ManyVLFsFound') IS NOT NULL
	DROP TABLE ##ManyVLFsFound

CREATE TABLE	##ManyVLFsFound (
	dbname varchar(100),
	NumOfVLFs int
)

OPEN DBNameCursor
FETCH NEXT FROM DBNameCursor INTO @DBID
WHILE @@fetch_status = 0
BEGIN

	SET @SQLCmd = 'DBCC LOGINFO (' + convert(varchar(5),@DBID) + ') WITH NO_INFOMSGS'
	INSERT INTO @LogInfo EXEC (@SQLCmd)

	SELECT @VLFCount = COUNT(*) FROM @LogInfo

	IF @VLFCount > @MaxVLFs
	BEGIN
		SELECT	@DBName = name FROM sys.databases WHERE database_id = @DBID

		INSERT INTO ##ManyVLFsFound
			VALUES	(@DBName,
					@VLFCount)
	END

	DELETE FROM @LogInfo
	FETCH NEXT FROM DBNameCursor INTO @DBID

END

CLOSE	DBNameCursor
DEALLOCATE	DBNameCursor

IF (SELECT COUNT(*) FROM ##ManyVLFsFound) > 0
BEGIN
	EXEC msdb.dbo.sp_send_dbmail	@recipients = @EmailRecipients,
						@subject = @EmailSubject,
						@body = @EmailBody,
						@query = 'SELECT NumOfVLFs, DBName FROM ##ManyVLFsFound'

END

DROP TABLE	##ManyVLFsFound

Enter the notification email address in the @EmailRecipients variable near the top of the procedure. The @EmailSubject and @EmailBody variables contain the subject line and message text you want to include in your email. The email will also contain the results of a query that will list the database name and the number of VLFs that were found. The VLF count threshold is set in the @MaxVLFs variable. I’ve chosen a value of 50. Typically, you want to have no more than 20 or 30 VLFs for best performance. I set my threshold a bit higher, based on what I’ve seen on my servers.

This script assumes you have a public default mail profile set up in Database Mail (or a private one that is used by whatever account executes this stored procedure). It also assumes one log file per database. (This is all you should have. There is no performance increase to be gained by having multiple log files – source.)
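Scheduling the weekly check is just an ordinary SQL Server Agent job. Here’s a minimal sketch of one way to create it; the job name, schedule, and start time are made-up examples, not my actual job:

/* Sketch of a weekly SQL Agent job to run the check. The job name,
   schedule, and start time are examples only. */
USE msdb

EXEC dbo.sp_add_job @job_name = 'Weekly VLF Check'

EXEC dbo.sp_add_jobstep @job_name = 'Weekly VLF Check',
	@step_name = 'Run sp_VLFCheck',
	@subsystem = 'TSQL',
	@database_name = 'master',
	@command = 'EXEC dbo.sp_VLFCheck'

EXEC dbo.sp_add_jobschedule @job_name = 'Weekly VLF Check',
	@name = 'Weekly - Sunday 2 AM',
	@freq_type = 8,			/* weekly */
	@freq_interval = 1,		/* Sunday */
	@freq_recurrence_factor = 1,	/* every week */
	@active_start_time = 020000	/* 2:00 AM */

EXEC dbo.sp_add_jobserver @job_name = 'Weekly VLF Check'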

The first time you run this script, you may see many databases with a large number of VLFs. This typically means your log file has autogrown many times. While you are cleaning up the VLFs, it would also be a good opportunity to check your autogrowth settings (I recommend setting it to grow by a fixed amount, not a percentage) and to size your log files appropriately so that autogrowth is kept to a minimum.
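The cleanup itself is just a matter of shrinking the log file and then growing it back to an appropriate size, which rebuilds it with a small number of reasonably sized VLFs. A minimal sketch, using a hypothetical database and logical log file name (in the full recovery model, take a log backup first so the shrink has room to work):

/* Sketch of a VLF cleanup. The database name, logical log file name, and
   sizes are placeholders. */
USE YourDatabase

/* Shrink the log as small as possible to clear out the excess VLFs... */
DBCC SHRINKFILE (YourDatabase_log, 1)

/* ...then grow it back to its intended size in one operation */
ALTER DATABASE YourDatabase
	MODIFY FILE (NAME = YourDatabase_log, SIZE = 4096MB)

/* and set autogrowth to a fixed amount rather than a percentage */
ALTER DATABASE YourDatabase
	MODIFY FILE (NAME = YourDatabase_log, FILEGROWTH = 512MB)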
