Too Many Bits

A commenter recently reminded me that, although I had mentioned that I attempted to do a migration from SQL 2000 to 2008 a while back, failed, and had to go back to the old version, I never really explained what happened. So now I will!

We had a SQL Server cluster running SQL 2000. One of the machines in the cluster failed. The other machine was old as well, so we decided to build a new cluster using new machines and running SQL Server 2008. This cluster is heavily used by our external customers and it was approaching our peak busy season. (We’re in the education field, so the August / September back to school season is our peak usage time.) We wanted to replace the cluster and move to the new version of SQL before the big rush started.

This cluster supports several applications. First and foremost, it supports our own proprietary software that is used in schools. It also houses databases for our accounting software, our development bug-tracking software, and some internal and external websites. (I don’t like the setup. The server is used by both internal and external programs and I want to segregate those onto different servers. It’s on my to-do list, but that was not within the scope of this move.) Our software development team had looked into migrating to SQL 2008 previously and, although they never did it, they didn’t think there would be any problems.

I did some research to see if I could identify other potential problems before the actual move. I downloaded the Microsoft SQL Server 2008 Upgrade Advisor and fed it some traces of typical SQL usage that I had gathered. It came back with several possible problems, which I narrowed down to the four I considered the likeliest risks:

  1. One database referenced the undocumented system table spt_values and the procedure sp_spaceused.
  2. Full text indexing used the command sp_fulltext_service ‘load_os_resources’, 1.
  3. Possible data type conflicts in one database.
  4. Trace flag 3226 was in use.

Item 1 was a concern simply because system tables definitely changed between versions 2000 and 2008. We had other SQL 2008 servers already running and I checked those. I found spt_values still existed in SQL 2008. The warning about sp_spaceused was just a note that SQL 2000 did not always store space allocation information correctly and that I should run DBCC UPDATEUSAGE after converting the database. Noted. There was still some uncertainty about the spt_values table. I wasn’t sure it contained the same data or columns as it did in SQL 2000, or was used in the same way. However, the database these warnings came from was for a BlackBerry program and there was a newer version available that supported SQL 2008. If we ran into problems, we would upgrade to the newer version.

On closer inspection, Item 2 didn’t really seem to be as big a problem as I thought it might be. By default, this feature is turned off, and this command was turning it on, which is why the Upgrade Advisor flagged it. According to this, we could run into problems if the verify_signature flag was set and our operating system binaries were not signed. That flag didn’t show up in the Advisor report, so I didn’t think it was being used and therefore wouldn’t be a problem.
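For reference, here are the two full-text settings involved. The first line is the command the Advisor flagged; the second is the verify_signature option I was checking for, shown only to illustrate what could have caused trouble (we were not using it).

-- The command the Upgrade Advisor flagged: allow full-text to load OS word breakers and filters
EXEC sp_fulltext_service 'load_os_resources', 1;

-- The related option I was worried about; turning signature verification off is only
-- needed if unsigned binaries have to be loaded (not our situation)
EXEC sp_fulltext_service 'verify_signature', 0;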

Item 3 could be an issue, but this was in a database that wasn’t really in use anyway. It was used by ScriptLogic’s Active Administrator, an Active Directory tool. Someone in our IT department had installed it to test it out and we were not currently using it. If it stopped working, it was no big deal.

Item 4 referred to a trace flag that stops the writing of backup completion messages to the Windows event log. I have this currently set on our other SQL 2008 machines, so I knew this would not cause a problem.
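If you want to see what that looks like in practice, here is a minimal sketch of turning the flag on server-wide without a restart; on our machines it is actually set as a -T3226 startup parameter so it survives restarts.

-- Suppress successful-backup log messages for all sessions until the next restart
DBCC TRACEON (3226, -1);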

One last thing I needed to do was to make sure I could transfer the logins from the old server to the new server. We had a couple SQL server logins that were in use and the passwords had been lost, or at least were not readily available, so I would not be able to recreate them. I wrote about how I transferred these logins previously. I also had to set up replication, which I also wrote about previously.

So from my research, it looked like we should be ok to move to SQL 2008, and we set a date for the move. We had most of our IT department involved. As the database guy, I was definitely involved. Our network guy and system admin guys were involved because, unfortunately, our company’s legacy software that used this database server had some IP addresses hard coded, so we needed to put in some DNS aliases to redirect to the new server once the databases had been transferred. We also had several members of our QA department lined up to put the new system through their testing and make sure things still worked.

The weekend of the migration came. We started on a Friday night. We first put our websites into a maintenance mode so no one could access the databases. I made backups of all the databases, then copied those backups to the new server, then shut the old cluster down. There was a large amount of data to copy, so it took a couple of hours to move everything to the new server. Then I restored the backups to the new server. Everything went well. I ran DBCC UPDATEUSAGE on all the databases. It found lots of errors, but fixed them all. The DNS redirects were put in place and, after waiting a bit for that info to propagate through the network, our QA team started testing. It was Saturday morning at this point.
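(For anyone doing the same thing, the space-usage correction was just a matter of running the DBCC command against each restored database, along these lines; the database name here is made up.)

-- Correct the page and row count metadata that SQL 2000 sometimes left inaccurate
DBCC UPDATEUSAGE ('OurAppDatabase');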

We ran into a couple of problems but they mostly involved the DNS redirects. We had to restart some of our web servers so that they got the new DNS information. All in all, things were going surprisingly well. While QA was testing our software, I started working on getting replication back up and running. I already wrote about how that went, so you know that took several hours of my time. But things were looking good overall. All our third party programs seemed to be functioning correctly with the new server. Things were looking up.

Then Saturday night rolled around and QA found a problem. Our company’s product allows customers to upload data from ASCII files into our database. That was failing. We were getting an error like this:

Cannot initialize the data source object of OLE DB provider “MSDASQL” for linked server “<servername>”. OLE DB provider “MSDASQL” for linked server “<servername>” returned message “[Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified”. (Microsoft SQL Server, Error: 7303)

Hmm.. Linked servers? I didn’t think we had any of those on the old cluster. It was offline now and we couldn’t bring it back up to check because we’d get IP conflicts. So I made some linked servers to the other servers I knew had any sort of interaction with this server. The error persisted. We spent several hours chasing down possible problems but nothing seemed to work. It was very late at this point, early morning actually, and we decided to call it a night and get some rest. Hopefully, a little sleep would help us find something we were missing.
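(For what it’s worth, the linked servers I threw together were nothing fancy – just plain SQL-to-SQL definitions along these lines; the server name is made up.)

-- Create a basic linked server to another SQL Server instance
EXEC sp_addlinkedserver @server = N'OTHERSQLSERVER', @srvproduct = N'SQL Server';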

Sunday morning came. We gave upper management a status update. We had all functionality working except this one upload routine. We were given a 1 PM deadline. If we couldn’t get it working by then, we had to roll back to the old server. At least rolling back would take very little time. It pretty much only involved removing the DNS redirects and bringing the old server back online. But I still didn’t have any idea what the problem was.

As is so often the case, the text of the error message was leading us down the wrong path. It turned out the problem had nothing to do with linked servers, at least, not the linked servers any DBA would think of.

We spent a couple more hours spinning our wheels and getting nowhere. I started thinking about all the things that were different between the two servers. The version of SQL Server, obviously. The hardware, but that shouldn’t be an issue. The OS was a newer version of Windows Server. Hmm. And the new cluster and new SQL were 64 bit while the old one was 32 bit. That was my only lead at this point, so I started investigating that.

And it turned out, that was where the problem was. The upload routine that was failing was called from a web page. The web server serving up that page was a 32 bit machine. I found this post with a Microsoft employee saying that 64 bit SQL Server can only use a 64 bit ODBC driver. The one installed on the web server was 32 bit. The only solution I could think of at the time was to upgrade the web server to 64 bit. That was not something we were prepared to do, so that was out. We were rapidly approaching our 1 PM deadline.

At this point, I thought we were out of options. I assumed that the upload process was driven by some code running somewhere outside the database. Then one of my colleagues said something that got me thinking – maybe a stored procedure was involved. So I started hunting around and, sure enough, I found 5 stored procedures that were used in the upload process! What’s more, those procedures all used the OPENROWSET command! And MSDN says this about OPENROWSET: “This method is an alternative to accessing tables in a linked server…” Linked server?! Bingo!! Even though we were importing from an ASCII file, OPENROWSET typically deals with linked servers, which is why the error message referenced linked servers! And it was in a stored procedure, which I could change, not program code somewhere that I couldn’t!

At this point it was about 12:15 PM. I was sure I had found the problem, but time was running out. I communicated my thoughts to the rest of the team. Could I change the stored procedures to not use OPENROWSET? I tried it on one, rewriting the procedure to use BULK INSERT instead. This was a native T-SQL command and shouldn’t have any ODBC issues. Once I made the change, I had QA test again. It failed again, but this time on a different file – meaning it got through the one stored procedure I changed and hit one of the other four that I hadn’t. We were very close to 1 PM now and I quickly changed the remaining procedures to use BULK INSERT. QA tested again and it worked! Cheers all around!!
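The rewrite itself was straightforward. A minimal sketch of the pattern I used is below; the table and file names are made up, and the real procedures obviously had more going on around the load.

-- Load the uploaded ASCII file straight into a staging table, no ODBC driver involved
BULK INSERT dbo.CustomerImport
FROM 'C:\temp\import.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');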

A short time later, we got an email from QA. Another test failed. Our spirits plummeted.

It turns out we allow our customers to import files with or without some optional fields. BULK INSERT can’t handle this, and this was the cause of the latest failure. It needs the files to be in the exact same format each time. We were stuck and there was no magic T-SQL code I could pull out of my hat to fix this one. But I did feel this was a change that our software department could make fairly quickly on their end – simply format the file to a fixed format before calling the stored procedure. It was a bit after 1 PM by this time, so we put another call in to upper management. Everything worked except this one case. We gave the reason for the failure, as well as my opinion that a fix from software engineering would be relatively easy to make. (As a former database developer, I felt I had a reasonable idea of the work involved in the change.) Should we stay on the new server with 99.9% of our functionality and give software engineering a day or two to develop a fix that would bring us to 100%, or should we roll back to the old server (which, we pointed out, was no longer part of a cluster, leaving us exposed to another hardware failure that would cripple us)? The answer came back: roll back to the old server. It was not a decision I agreed with, but one I had to work with.

We spent another fairly depressing hour moving everything back to the old server. As expected, that part went pretty easily. We were all disheartened to have come so close, only to have to turn back.

The next day, Monday, I wrote up a summary of what we found for my boss and included an executive summary that presented the key issues. The good news was that BULK INSERT is not a 2008-specific command, so our software team could develop the fix and implement on the current 2000 server at any time. Then, when we tried the migration again, we wouldn’t run into this issue and things should go smoothly. As it stands at the time of this writing, we are still waiting for software development to make the change. We are well into our busy season now, so we’ll have to wait until we are through that before we try the migration again.

All in all, I think we were remarkably successful in everything we tried to do. We had over a dozen programs that used this database server and we got 99.9% of them working on the new cluster before we had to turn back. I still can’t believe Microsoft would allow this 64-bit limitation to have gone out the door. Surely, they can’t think companies would upgrade all their machines to 64-bit at the same time? Nevertheless, this was a good example that, no matter how well you might prepare, there is usually something you’ll miss that can mess you up and it will likely come from a direction you don’t expect. If you are able to quickly and accurately diagnose the problem, you’ll stand a better chance of fixing it during your maintenance window and continue moving forward.

Update: After writing this, our software development team looked into the problem and came up with a different and simpler solution. We need to install the Microsoft Access Database Engine Redistributable driver (found here) and change the stored procedures to use this driver instead of the MSDASQL driver they were using. When you install the Access driver, you will see a new Provider in SSMS:

(If you are doing this on a cluster, be sure to install the driver on all machines in the cluster.) Once that is installed, we had to make two changes on the server. First, we needed to enable the new Provider. Right-click the provider name and choose Properties. Put a check in the Allow inprocess checkbox:
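If you’d rather script that first change than click through SSMS, I believe the undocumented sp_MSset_oledb_prop procedure sets the same provider option; treat this as a hedged alternative and verify the checkbox afterward.

-- Undocumented procedure: set the Allow inprocess option for the ACE provider
EXEC master.dbo.sp_MSset_oledb_prop N'Microsoft.ACE.OLEDB.12.0', N'AllowInProcess', 1;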

The second change was to run sp_configure and enable “Ad Hoc Distributed Queries”.
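In T-SQL, that second change looks like this (“Ad Hoc Distributed Queries” is an advanced option, so advanced options have to be shown first):

-- Show advanced options so the Ad Hoc Distributed Queries setting can be changed
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
-- Allow ad hoc OPENROWSET / OPENDATASOURCE connections
EXEC sp_configure 'Ad Hoc Distributed Queries', 1;
RECONFIGURE;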

Finally, the stored procedures that were using the OPENROWSET command needed to be changed to use this new driver. The old statement was:

SELECT *
FROM OPENROWSET('MSDASQL', 'Driver={Microsoft Text Driver (*.txt; *.csv)};DefaultDir=c:\temp\;', 'select * from import.csv')

The new statement is:

SELECT *
FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0', 'Text;Database=C:\temp\;', 'select * from import.csv')

There is one more issue you may encounter, depending on what account your SQL Server is running under and how you have permissions set up on your machine. When I tried to run the above statement, I was getting this error:

OLE DB provider “Microsoft.ACE.OLEDB.12.0” for linked server “(null)” returned message “Unspecified error”.
Msg 7303, Level 16, State 1, Line 1
Cannot initialize the data source object of OLE DB provider “Microsoft.ACE.OLEDB.12.0” for linked server “(null)”.

Believe it or not, this is a file permissions issue and it has nothing to do with the location of the file you are trying to import. What is happening is that some temp files are created during the import, and it is the location of these temp files that is causing the problem. The account executing the OPENROWSET command needs to have access to the user temp directory of the account SQL Server is running under. This post led me to this solution. The actual directory in my case (Windows Server 2008 R2) was C:\Users\<SQL Service account name>\AppData\Local\Temp. I granted full access to everyone on this directory and my query worked. Again, if you are setting this up on a cluster, be sure to set the permissions on all machines in the cluster.

Phew. Talk about going down the rabbit hole!

We have not yet tried the entire migration again, but we have installed the above driver on our new cluster and have successfully loaded a test file, so it looks like this fix will work.

Taking One For The Team (City)

The developers at my company use a product called Team City to run their build server – a server that compiles source code into an installable package for testing, release, etc. Team City uses a database backend to keep track of things. I don’t have much experience with it, but it seems pretty flexible. We were initially using a MySQL database server to host the database and we decided to migrate this to SQL Server 2008. As part of the migration process, the development team was also upgrading the version of Team City. I ran into some interesting situations while working on this project and thought I’d share them in case anyone else has to do something similar.

The migration process seemed fairly straightforward – simply transfer the tables from MySQL to SQL Server, point Team City to the new server, and let it go. As usual, things are never quite as simple as they seem.

MySQL has a fairly simple way of creating backups. A backup of a MySQL database is simply a huge text file full of SQL commands – first commands to create the tables, then tons of INSERT statements to insert the data into the tables. Not very elegant or compact, but it works. Unfortunately, the SQL syntax used by MySQL is slightly different from T-SQL – most notably in how it defines primary keys, indexes, and data types. So my first step in the migration was to look through the MySQL backup file (which was about 300 MB), find the table definitions, and recreate the tables in SQL Server.

Once I had the tables created, I created an SSIS package that connected to the MySQL database and transferred the data. In order to do this, I needed to install the MySQL Connector, available at http://dev.mysql.com/downloads/connector/net/. This allowed me to create a connection to the MySQL database in my SSIS package. My package was a simple one – I simply set up a straight data copy from the MySQL tables to the SQL Server tables. The database contained 56 tables, so I made 8 data flow tasks, each containing 7 tasks to transfer data from the source table to the destination table. I ran the package and the data transferred.

My last step was to create a login for Team City to use. That was fairly easy. Everything looked fine on my end.

Then the development team tried to run the Team City upgrade. It failed.

I’ll spare you all the troubleshooting steps and just give you the list of things I discovered.

  1. The upgrade process modified the tables. Columns were added and deleted. Indexes were modified and created. This led to two discoveries. First, when re-running the import, I could not just clear the data out of the tables and reload. I had to actually drop all the tables and re-run the script that created them. Second, index names mattered, as the upgrade called some out by name to modify them.
  2. Only some of the tables in MySQL had an explicitly defined primary key. Several had a unique, clustered index on non-nullable columns, which MySQL will use as a primary key when one isn’t explicitly defined. When I recreated the indexes in SQL Server, I initially created these as primary keys. I suspect, but am not certain, this caused a problem. The problem was likely related to the previously mentioned naming issue, but just to be safe, I changed my scripts so that the indexes were created as unique, clustered indexes.
  3. The account Team City uses to access the database needs dbo rights (see the sketch after this list).
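To make items 2 and 3 concrete, here is a hedged sketch of the kind of statements I ended up with; the index, table, column, and login names are made up, not the actual Team City schema.

-- Item 2: create the index as unique and clustered rather than as a primary key
CREATE UNIQUE CLUSTERED INDEX vcs_history_idx ON dbo.vcs_history (modification_id);

-- Item 3: give the Team City login dbo-level rights (run in the Team City database)
EXEC sp_addrolemember 'db_owner', 'teamcity';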

Item 3 was really tricky and the Team City upgrade process gave some misleading error messages. One in particular was:

Error on step: Maintaining the database schema…

Cannot perform upgrade of configuration data while database is empty. Configuration files version (222) does not match current TeamCity data version (378).

Now, when I see this, I think two things: 1) when a program tells me its database is “empty,” it means the tables in the database have no data in them and 2) if the upgrade was able to determine a version number, it has to be reading that from the database somewhere. It turns out, both of those thoughts were incorrect. I was never able to determine where Team City was getting the data version number from. I know there are some configuration files on the Team City server, but the developers swore that version info wasn’t in them, and I searched all the database tables for a version number and didn’t find one. As for item 1 – it turns out the upgrade program was not seeing any tables because it did not have dbo rights (I created the tables as dbo). Typically in cases like this, where a table can’t be found, I would have expected an error along the lines of “invalid object name <tablename>.” While that’s not a very user-friendly error message, at least it points me in the right direction for fixing it. But this nicely phrased error led me down a wrong path. I applaud the software’s attempt to create user-friendly error messages, but they also have to be useful for diagnosing the actual problem. Or maybe it’s just me and a quirk of how I interpreted “empty.” Either way, once we got these problems straightened out, the migration and upgrade went smoothly.

Update: I write these posts in advance and, a week after I wrote this one, Microsoft announced the release of their SQL Server Migration Assistant for MySQL v1.0. I have not looked into this as yet, but it probably would have saved me a lot of time and effort.

SQL Server Migration Assistant for MySQL v1.0

Pop Goes The DBA

I recently spent an entire weekend working on a database server migration project. It did not go well and we ended up rolling back to the old server and rescheduling the migration for another day. The attempt took from Friday evening to Sunday morning and, with all the troubleshooting we had to do, it was mentally exhausting. I went home Sunday night, slept for 6 hours, and headed back in Monday morning to make sure everything was still working. As I was driving in, I realized just how drained I was. It wasn’t just a lack of sleep. Mentally, I was done in. I felt like I couldn’t look at SQL Server again. I needed to recharge.

So I turned on my iPod, cranked up the volume, and queued up some pop music. If you will allow me another example of my über-geekiness, as the songs played, I had a mental image of a health bar floating over my head, near empty but quickly refilling as the music worked its magic.

Yes, this is actually mine

Pop music does that for me. For some, it’s rock music or chocolate or movies or  biking or running or something else. For me, it’s those sticky sweet songs with the catchy hooks that infect your brain and leave you humming the song for the next three days. And now, dear reader, it’s time I make a little confession. Don’t tell anyone else. This is just between you and me. I’m a Deborah née Debbie Gibson fan. Have been since her first album in 1988. It’s loaded with catchy little pop nuggets that sucked me in instantly. But it’s not only her songs I like. I’m a pushover for any catchy pop gem.

What is it about pop music that I find so attractive and so rejuvenating? Sure, the music is catchy. But the lyrics are what do it for me. The best pop songs are about love – first love or lost love or unrequited love. They capture those glorious moments that everyone in life has had at some point: the exhilaration of a new relationship, the thrill of your first kiss, the way the world seems so much brighter when you’re together. They also capture the pain: the devastation you feel after a breakup, the certain conviction that you’ll never love again, the unending, bottomless, crushing, heart-rending pain. Those visceral feelings are typically forever bound up in a time from your youth – a time when you were just starting to date and trying to figure out what love was and what it meant. The golden summers of your adolescence when you hung out with your friends in the sun, woke up late and stayed out later, had crushes and got crushed. All those feelings are there, woven into those songs and standing as a reminder that you are not alone and you are not the only one who has gone through the rollercoaster of love. You are part of a communion of humans, all searching for love, at times finding it, at times losing it.

And what those songs tell you, ultimately, is to never give up. Finding love is always worth the pain of the search. Always.

THAT is what rejuvenates me. What about you?

Shake your love.

Transferring Logins From SQL 2000 To 2008

During a recent migration from SQL 2000 to SQL 2008, I needed to move logins from the old server to the new server. We use mixed mode authentication, so we had both Windows and SQL logins that needed to be moved.

My first attempt was using SSIS. I created a package on the 2008 server and used the Transfer Logins task to move the logins to the new server. Piece of cake. I set it up, made sure to specify that the SIDs needed to be transferred as well, ran it, and it worked fine. Then I went to test the logins and ran into a problem. All the Windows logins worked, but the SQL logins did not. It turns out BOL says the Transfer Logins task will transfer SQL logins, but it changes their passwords to random strings – and doesn’t tell you what those strings are. I’m sure it does this as a security precaution, but really, this makes the task pretty much worthless, in my opinion. If I need to go and manually change the passwords of all the SQL logins I just transferred, I may as well create them from scratch in the first place. Furthermore, many companies, like mine, have programs using SQL logins whose passwords have long been forgotten. Sure, I could hunt down the programs using them and try to find the configuration files that likely have the passwords in them, but that’s a time-consuming process. And as a DBA, my area of knowledge doesn’t extend to configuring dozens of programs, each of which might store its configuration data in a different way, so I would need to pull in other resources if I wanted to go this route.

Luckily, there is another way. Microsoft has supplied some scripts to transfer logins, namely sp_hexadecimal and sp_help_revlogin. I created and ran these procedures on the 2000 server, executed the output on the 2008 server and… the logins still didn’t work. Hmm. A Google search showed lots of people were experiencing this same problem when they tried to move logins between SQL 2000 and 2008. Some claimed the two procedures mentioned before worked, but they didn’t for me.

It turns out, I just needed to read a little more carefully and think a little bit. SQL Server changed the way it hashed passwords between versions, so going straight from 2000 to 2008 won’t work. But, as they say, there is more than one way to skin a cat.* Method 2 of this Microsoft KB article details how to transfer a login from SQL 2000 to SQL 2005. And this Microsoft KB article details how to transfer a login from SQL 2005 to SQL 2008. I happened to have a SQL 2005 server hanging around, so I transferred the logins to there, then transferred them from there to the 2008 server. That worked!
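The mechanics are the same at each hop, something along these lines (sp_hexadecimal and sp_help_revlogin are the procedures supplied by the KB articles; create them on the source server first):

-- Generate a script of login-creation statements, including hashed passwords and SIDs
EXEC sp_help_revlogin;
-- Run the script this produces on the destination server
-- (2000 to 2005 first, then repeat the process from 2005 to 2008)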

If you decide to do this, there are a couple of things to watch out for. The script will set logins to have the database roles and default databases they have on their original server. Those might not exist on the intermediate server, or even the final server, so you’ll want to examine the output scripts of the procedures and remove any references to databases or roles that do not exist on the destination server. And watch out for logins that might already exist on the intermediate server. To keep things tidy and mitigate security risks, be sure to delete the logins from the intermediate server after you are done.

* At what point in history was cat-skinning so popular that more than one method was needed?

Checking In With Replication

The big replication project that I spent a couple weeks planning and testing for arrived last weekend. To recap, I needed to set replication of a rather large database with SQL 2000 as the publisher and distributor and a new SQL 2008 cluster as the subscriber. As I wrote previously, I had tested this out fairly thoroughly on a test system and was pretty sure everything would go smoothly. Of course, that rarely happens.

Can you tell I'm left-handed?

To start the project, I got out my checklist. You do use checklists, don’t you? There’s a good reason why airline pilots use them. When something is pretty complicated, the chances of forgetting a step are pretty high. So when I was testing, I wrote a step by step checklist of everything that needed to be done, in the order it needed to be done. When the time came to set this up for real, I went through my checklist without any problems. Looking good! Then I tested the replication.

Whenever I set up replication, I always make a small test table and include it in the publication. That way, I can do inserts, deletes, and updates in that table and verify replication is functioning. It’s also a good way to show others that replication is functioning. Rather than just show someone an icon that says a job is running, show them that you can make a change in a table on one server and, voilà, the change appears on the other server. I usually name my test table aaasjs – aaa so it shows up at the top of an alphabetical list and sjs, my initials, so I know this table is used by me only and I can delete or alter it at will. So I made a change in my test table on the publisher and looked to see if it was replicated to the subscriber. Nope. I checked the distribution agent on the publisher and saw it had failed with this error: Could not find the stored procedure ‘sp_MSins_aaasjs’.
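(In case you’re wondering, the test table is nothing more than something like this; it has a primary key only because transactional replication requires one on every published table.)

-- Throwaway table included in the publication purely for testing replication
CREATE TABLE dbo.aaasjs (
    id INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    note VARCHAR(50) NULL
);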

Hmm.. I had run this whole process through about 5 times during my testing and I was pretty sure the process worked ok. I even tested using a copy of the actual database being replicated, so I didn’t expect any surprises from weird data or tables. Now, I was initializing replication using a backup copy of the source database. This was because the database was simply too large to send a snapshot over the network for initialization in a reasonable amount of time. But still, I tested this multiple times and it worked. The one difference was that my testing was done with the subscriber being SQL 2008 and here I was actually using SQL 2008 R2 CU1. I didn’t think that was the issue though.

I won’t rehash all the steps in the process, as I outlined them in detail here. But just to make sure I didn’t make a mistake somewhere, I re-did the whole procedure from scratch. Same results. I decided to take a closer look at the error the distribution agent was throwing.

If you didn’t know, SQL Server implements transactional replication, in part, by using stored procedures. When you set up replication, SQL Server creates three stored procedures for each table that is being replicated. One procedure handles data inserts, one handles data deletions, and one handles updates. These procedures are named sp_MSins_tablename, sp_MSdel_tablename, and sp_MSupd_tablename, where tablename is the name of the table being replicated. These procedures are created on the subscriber in the replicated database. My error message was telling me the insert stored procedure for table aaasjs, which is my test table, was missing. I looked at the stored procedures and found the update and delete procedures for that table were present, as were the insert procedures for other tables. Something strange was going on.
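A quick way to see what was and wasn’t there was simply to query the subscriber database for the replication procedures by name, roughly like this (the table name here is my test table; swap in whatever table you’re checking):

-- List the replication insert/update/delete procedures created for a given table
SELECT name
FROM sys.procedures
WHERE name LIKE 'sp[_]MS%aaasjs';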

Because I was initializing from a database backup, part of the setup process involves running scripts that create all these procedures. I looked at the script that creates the insert procedures. (This is Step 8 in the Details section of my previous post and the step marked with an asterisk in my checklist above.) Notice that, due to a bug in SQL Server, this script is expected to generate errors when it is run. Specifically, it will generate one error for every table that has an identity column in it. The fix for this is to run a second script that creates replacement procedures for the ones that generated errors in the previous step.

I took a closer look at the errors this script generated. Most were the expected error: “cannot update identity column…” But interspersed with those, I saw a couple of errors that said “column not found: msrepl_tran_version”. Wow. That’s a uniqueidentifier column that SQL will add to tables that are being replicated when queued updating is enabled. This is where things got really weird. The tables in my publication database all had this column. Some of the tables (about 40 out of 780) in my subscriber database did not! This despite the fact that the subscriber database was created from a backup of the publisher database! I was doing this work on a Saturday and the new SQL Server wasn’t yet available to others, so I know I was the only one using it. So either some process dropped those columns or something I don’t want to think about was going on. (I did run DBCC UPDATEUSAGE on the subscriber database and it fixed some errors, but I don’t think that would cause columns to drop. More on this command later.)

I didn’t really want to try starting from scratch again. As I said this was a huge database. It took about 30 minutes to copy the compressed backup over the network from the publisher to the subscriber, plus about 30 or so to restore it. I was working on a Saturday afternoon and I didn’t want to stay any longer than I had to. Since the list of tables missing the columns was fairly short, I decided to go ahead and manually add the column to those tables. After that was done, I re-ran the scripts to generate the replication procedures, and my tests then worked.
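Adding the missing column back was a one-liner per table; a hedged sketch is below. The table name is made up, and the column definition matches what it looked like on the tables that still had it – worth double-checking against one of your own before running it.

-- Re-add the replication tracking column that queued updating expects
ALTER TABLE dbo.SomeReplicatedTable
ADD msrepl_tran_version UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID();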

My workload right now doesn’t permit me time to go back and investigate how those columns got dropped from some tables. I had time enough only to fix the errors and continue on with the process. (Unfortunately, other portions of the migration, unrelated to databases, failed and we had to roll the whole thing back and return to the old server.) But this is a great example of where knowing how SQL Server works can be a big help. If I didn’t know that SQL uses three procedures per table to implement transactional replication, I would have been in a world of hurt. Additionally, my use of checklists enabled me to easily start over and to back out my changes without fear of missing a step somewhere.