Issue:
Secondary database stuck in “Initializing / In Recovery” mode or “Reverting / In Recovery” mode after AlwaysOn failover.
Fix:
Wait until the secondary database has synced up with the database on the primary. This may take a while if there are pending transactions.
Recommend waiting for the secondary database to sync back up with the primary before doing anything with the server. Sometimes, the recovery can take hours if there are pending transactions it needs to roll back and the disk hardware is very slow.
Check the SQL Server database log for recovery status.
Always check open and running transactions before initiating a failover! Especially the ones that produces tons of logs such as index maintenance, big chunks of DML operations and so on.If you see such activity on availability database(s), either wait them to finish or stop them.
Well, there is no fix. If you already did failover and saw database in reverting state, there is nothing you can do other than waiting for the completion of rollback operation. Undo phase should be finished and this may take long time. In my case, it took 30 minutes to rollback all transactions and complete reverting stage . Only after than secondary replica became available and switched to “Synchronized” state. There will be 3 phases of secondary database replica state during undo process:
- Synchronization State: “NOT SYNCHRONIZING” ; Database State: ONLINE
- Synchronization State: “NOT SYNCHRONIZING” ; Database State: RECOVERING
- Synchronization State: “REVERTING” ; Database State: RECOVERING
CAUSE:
We performed a failover from Node 1 to Node 2. Failover is finished successfully and database is online now on Node 2. However, Node 1 did not switch to “Sychronized” state. Instead, it got stuck in Reverting/In Recovery state like in below figure.
According to MS documentation for dm_hadr_database_replica_states:
3 = Reverting. Indicates the phase in the undo process when a secondary database is actively getting pages from the primary database. Caution: When a database on a secondary replica is in the REVERTING state, forcing failover to the secondary replica leaves the database in a state in which it cannot be started as a primary database. Either the database will need to reconnect as a secondary database, or you will need to apply new log records from a log backup.
It basically tells that we initiated a failover from Node 1 to Node 2 while there was running transactions on Node 1 for long time and they were interrupted by failover. Naturally Node 1 have to complete rolling back those transactions before accept new pages from primary site and become available again. During this process, secondary will be unavailable to reports, read-only routing requests, backups etc.
Comments