Whew. I’ve spent the last week testing and retesting how to setup transactional replication using a manually initialized subscriber database. There’s a lot of information on the internet about this, but it seems each site I found left out one important piece of information. Hopefully, this piece will bring it all together. This piece draws heavily from this Microsoft knowledge base article.
We currently have a 9 GB database that is being replicated to a server in another city. The data link between the two servers isn’t very fast. The last time I had to set up replication, I did it using the replication wizard and it took about 72 hours to send the initial snapshot. That was not a fun weekend. Well, for various reasons, we have to replace the SQL cluster the subscriber is hosted on. We’re also taking the opportunity to move from SQL 2000 to SQL 2008. (Currently, both the publisher and subscriber are at SQL 2000.) Needless to say, I don’t want to spend another 72 hours watching data trickle across the network, so I began investigating how to set up replication without using the snapshot agent. Seems simple enough – just take a backup of the database, restore it on the subscriber, and go. Right? Wrong. It turns out there are tons of behind-the-scenes things the replication wizard does that have to be taken care of manually for this to work. But I’m getting ahead of myself.
Even though we still have the slow network link between the sites, using a backup for replication initialization saves time because I am able to use a compression utility to shrink the size of the backup file before sending it across the network, instead of sending uncompressed data like the snapshot agent does. In my case, the amount of data to be transferred shrunk from 9 GB to about 950 MB. (I should also point out, using third party backup compression tools was ruled out for budgetary reasons, so this process had to be a home-grown, inexpensive method.)
This is going to be a fairly long post with many steps, so I’ll start by first talking about what needs to be done in general terms and then give the details of how I did it. Also, make note of where I mention to be on the look out. Some of the steps I do might break your system. They work on mine, but they could cause problems for you. I will try to make it very clear when I am about to do something like this.
They don’t call it a wizard for nothing. Like Oz, The Great and Terrible, the replication wizard does a lot of lever pulling behind the scenes. To configure replication using a backup as the initializing snapshot, we’ll have to pull those levers ourselves. What needs to be done? There are two big gotchas that I ran into. First, timestamp columns in the publisher database must be converted to binary (or varbinary) columns in the subscriber database. This is non-trivial because you cannot simply issue an ALTER TABLE tablename ALTER COLUMN command to do the conversion. SQL does not allow an implicit conversion between these two data types. My solution was, on the subscriber database, to add another column to the table of type varbinary, copy the timestamp data into that column, drop the timestamp column, and rename the varbinary column to the same name as the dropped timestamp column. The database I am working with has over 700 tables, each with a timestamp column, so obviously I have to have a way to make this change programmatically rather than manually. This is also one of those instances I mentioned that could break your setup. The code I use to change the column type adds a column and deletes another column. If your code and/or stored procedures reference columns by name, you’ll be ok. But if they reference columns by ordinal position, they will be broken. This is because the new binary column that replaces the timestamp column will be the last column in the table instead of wherever it was before. And of course, all columns after that deleted column will have a different position as well.
The second gotcha is, at the publisher database, any identity columns should be set to “Identity (not for replication).” Otherwise, you’ll get errors about not being able to insert values into identity columns. Again, I have code to change this programmatically. This shouldn’t cause problems for other code, but test it anyway before trying it on a production system.
There are other steps that the wizard does that we have to manually perform. Replication works, basically, by creating three stored procedures for each table that is published. One procedure runs when a row is inserted into the table, one runs when a row is deleted from the table, and one runs when a row is updated in a table. We need to manually create these procedures. Luckily, this is not hard to do and Microsoft provides a procedure to do just that. However, there is another pitfall to watch out for here. Prior to SQL 2000 SP3, the procedures Microsoft supplied did not correctly create the stored procedure used for table updates. This was fixed in service pack 3 for SQL 2000. Therefore, your servers need to be at SQL 2000 SP3 or greater for replication to work.
There is one additional thing to watch out for that may or may not apply to your situation. During testing, I discovered the database I was using had a trigger on one of the tables that was being fired when I ran my code to covert the timestamp columns to binary and the trigger was generating errors. I needed to disable this trigger before running that code, and re-enable it after.
So the general steps for setting up replication and manually initializing the snapshot are:
- Backup your publisher database (in case something goes wrong)
- Convert the Identity columns to Identity (not for replication) at the publisher
- Backup your publisher database (this will be your initial snapshot at the subscriber)
- Create the replication publication at the publisher
- Restore the backup from step 4 at the subscriber
- Create the subscription at the publisher
- Change timestamp columns to binary or varbinary at subscriber
- Create the replication stored procedures (code generated at publisher and executed on subscriber)
- Start distribution agent (twice – details below)
- Test to verify replication is working
It should go without saying, but I will say it anyway. While this process is going on, no one should be using the database. If transactions happen between the time the backup in step 4 is made and the replication is functioning, those changes will not make it to the subscriber. So do this after hours during a maintenance window. Also, these steps assume the distributor is the same server as the publisher.
Also – and this is very important – test this whole process first! I spent a week running through the entire process on test servers. My setup has a bit more complexity as I am replicating from a 2000 server to a 2008 server, which caused some minor changes in the steps above. Those changes would not have been caught had I not tested this. Remember, your job as a DBA is to keep the data safe, not to take chances with it. I did several complete run-throughs of this process before doing it for production. I also had some people test my test results to make sure they could still do what they needed to do with the resulting subscriber database. It is time well spent.
Now that you know the general steps we have to do, it’s time for the nitty gritty.
Step 1 – Backup your publisher database. Stash this backup somewhere safe just in case things go wrong.
Step 2 – Convert identity columns. On the publisher, execute the following code in your database to change identity columns to identity (not for replication):
Depending on how your server is configured, you may or may not need the beginning and ending SP_CONFIGURE statements. The first one sets your system to allow direct modifications to system tables and the last one disables that. The statement in between came from the MSSQL Tips community. Note that this code will not work on SQL 2005 or greater. Direct modification of system tables is prohibited in those versions. The article linked to shows how to do this for 2005 and greater though.
Step 3 – Backup the publisher database. This is where you create the snapshot to be used for your subscriber initialization. Needless, to say, this should be a file named differently than the file created in step 1. Once the backup is created, compress it for transport, if you need to.
Step 4 – Create the subscription on the publisher. Use the wizard to select which tables, stored procedures, and views you want published. In my case, I just need all the tables published. I am also using transactional replication, so that is the option I choose. Note that following these steps will most likely NOT work if you want merge replication.
Step 5 – Restore the database from step 4. Restore this on the subscriber server.
Step 6 – Create the subscription. We use a push subscription and I would normally set up the subscription on the publisher (which is also the distriubtor in my case). However, because my subscriber is a SQL 2008 machine and the pubsliher / distributor is a SQL 2000 machine, I cannot do this. Why? Because SQL 2000’s Enterprise Manager cannot connect to a SQL 2008 machine. So instead, I must create the subscription using SSMS on the subscriber. However, during configuration, I will tell the system to create the subscription on the distributor. Also set the following options: Run on demand only (this will prevent the distribution agent from starting until we are ready) and Do not initialize (that’s the whole point of this exercise – we’re handling the initialization).
Step 7 – Convert timestamp columns to varbinary(8) at subscriber. Here is the code for that:
Yeah, I know. It uses a cursor. Boo hiss. Shame on me. But I’m running this code once, during a maintenance window when no one else is using the system. Performance is not an issue for me in this case. The print statements are commented out and were used for testing. Uncomment those and comment the preceding EXEC statements to see the commands the code will issued without making any changes.
Step 8 – Create the replication stored procedures. On the publisher, run the following command:
This will create the insert, update, and delete stored procedures that SQL Server uses to implement replication. Before running this command, make sure you have Query Analyzer configured to allow a large number of characters per row – I use 8,192, which I believe is the max. This setting can be found at Tools –> Options –> Results tab. If you don’t do this, it is likely lines will be truncated and your stored procedures will not be created due to syntax errors. Take the output from this command and execute it at the subscriber in the subscription database.
Note: When you execute the output at the subscriber, you will get error messages. Specifically, you will get one error message for each table that has an identity column and the error will be ‘Cannot update identity column <columnname>’. Remember what I said earlier about needing 2000 SP3? This is why. We will take care of this right now.
Run the following code in the publisher database:
Another cursor!! Boo hiss!! I know.. My same excuses apply here. The errors you saw when you ran the output from the last code where caused by the update stored procedures SQL Server tried to make. This set executes a stored procedure that was included in 2000 SP3 that correctly creates the update procedures. Unfortunately, it needs to be run individually for each replicated table. With a large number of replicated tables, I needed a way to programmatically do this. I could have used the undocumented sp_msForEachTable procedure, but it’s undocumented for a reason. It could go away at any time and if I’m taking the time to blog about this, I want a method that isn’t going to break down the road. Hopefully.
So this code looks at the system replication table to get a list of tables that are being replicated and executes the new and improved Microsoft stored procedure against each one. You might wonder what those COLLATE DATABASE_DEFAULT phrases are doing in there. Well, it just so happens that my two servers use different collations. (See, test test test!!) Thanks to Pinal Dave for his post on how to do this. Note too that this will only look at tables that are being replicated. If you are replicating views, you’ll need to modify the cursor. This is an exercise left for the reader.
Unfortunately, this code returns our statements as several result sets – meaning the column header “procedure_text” is included in many places. Copy the output from this command and paste it into your favorite editor. Do a find a replace for the string “procedure_text” and replace it with nothing. Then execute the resulting code on the subscriber database.
Step 9 – Start the distribution agent (twice). When we defined the subscription, we told SQL to only run the distribution agent on demand. This prevented the agent from trying to run until we finished our database restore and all the other stuff we had to do. Now, we are ready for it. So, on the distributor (or publisher, if it is the same machine), expand the Replication Monitor folder, then expand the Agents folder. Click on Distribution Agents and right click the agent for this subscription. Choose Start Agent. Refresh the view a couple of times and you’ll see the last Action column cycle through some states, eventually ending at “No replicated transactions are available” and the agent will have stopped. Now we want to set the agent to run continuously, so right click the agent again and choose Agent Properties. This will bring up the job details screen for the agent. Click the Schedules tab and click the Edit button. Change the setting from “One time” to “Start automatically when SQL Server Agent starts.” Click OK twice to return to the replication monitor screen. Now start the agent again by right clicking on it and selecting Start Agent. It should stay running this time.
Step 10 – Verify replication is working. Make some test transactions on the publisher and make sure they get propagated to the subscriber. Because the replication process involves three different stored procedures (remember – one for inserts, one for updates, and one for deletes), I like to perform one of each type of action. In this particular case, I also tested using a table that has an identity column and a timestamp column since we had to jump through some hoops to handle those correctly. If something went wrong and replication failed for some reason, the distribution agent will have stopped. You can check the error history for the agent to see what the problem is.
Phew. Lots of work for this. The good news however, is that SQL 2008 has built in support for initializing replication using a backup!!
Now it’s time for some more caveats. I already mentioned that this method might cause problems with code that refers to columns by ordinal position. I would also like to point out that this method completely ignores schemas. If your database makes use of them, this method may not work or may need to be heavily tweaked. And finally, my code to change the timestamp columns to varbinary will likely supersize your transaction log if you have lots of data. You may want to manually take a transaction log backup after this is complete.
As I mentioned several times, testing is very important. I initially made a small test database to work out this procedure on and it worked fine. Then I used a copy of the actual data and found more problems and situations that needed to be addressed. That proved invaluable and saved lots of downtime. Better to run into problems during testing, when there is relatively little pressure, than to encounter them during the live migration when downtime has to be minimized.
And finally, the real savings in all this: time. When we would have to reinitialize a subscription to this database before, it took about 72 hours. This method takes less than 4. These steps not only helped during the migration to a new cluster, but will also help if our network connection between sites goes down for an extended period and the subscription expires, requiring re-initialization.