I ran into a situation with database mirroring today that, at first glance seemed to defy logic. I came into to work to discover some of my mirroring sessions had been paused since last night. That happened because one of my scheduled jobs, which paused mirroring, failed and the “resume mirroring” step never got executed. I’ve changed that job now to resume mirroring even if the job fails so I don’t run into this problem again. But as I resumed mirroring manually, I noticed something that seemed strange as I watched the Database Mirroring Monitor. My mirroring session was going to take about 4 hours to get caught up, which is normal. So I started it going and just glanced at the monitor now and then to make sure it was still chugging along. After an hour or so, I noticed that the field labeled Oldest Unsent Transaction was not decrementing. In fact, it was going up! Here are two screenshots of the monitor taken about 2 minutes apart:
Notice the Unsent Log amount has decreased, so transactions are clearly being sent to the mirror. Yet, the Oldest Unsent Transaction time has increased! WTF? Mirroring works by sending the transaction log records from the principal to the mirror server. These are, by definition, in chronological order, so the oldest records get sent first. Over time, as the mirroring process approaches the fully synchronized state, the Oldest Unsent Transaction age should drop. Why wasn’t I seeing this?
It wasn’t until the mirroring session was completely synched, that I realized what was going on. Here is a shot of the history that gave me my ah-ha moment:
Do you see it? In the history, at 11:19:23, the Oldest Unsent Transaction time suddenly started dropping, like I would expect. It was increasing prior to that time though. To determine what was going on, I had to put on my robot hat and started thinking like the computer. (Well, I didn’t need to, but I never miss an opportunity to put on my robot hat.)
The whole reason my job pauses mirroring in the first place was because it does some index maintenance that is very log-intensive. I am using synchronous mirroring, meaning changes have to be sent to the mirror server and committed there before they are committed on the principal. I had previously discovered that this dramatically slows down the principal, so I programmatically pause mirroring before the index maintenance and resume it afterward. (When you pause a mirroring session, it switches to asynchronous mode, so changes can be committed on the principal.) What was happening was, when I resumed mirroring, the transactions were being sent to the mirror server, but they were being sent slower than the original transactions occurred on the principal server – remember this is normal and is why I paused the mirroring session in the first place. Thus, even though the mirror server was catching up with the transactions, it was still falling behind in terms of time since the oldest unsent transaction. Once all the index maintenance transactions had been sent, the transactions for the normal daily operation starting being sent – at 11:19:23 – and the Oldest Unsent Transaction time starting dropping as expected.
Mystery solved!
Nice sleuthing work!