Status:
- The Windows Failover Cluster fails to start any of its roles.
- All nodes in the Windows Failover Cluster are up.
- The same important event IDs appear on all SQL nodes: 1069, 1205, and 1254.
Probable Causes:
- The SQL node names and the AG name are not resolving, or could not be registered in DNS. Please check the IP and DNS configurations.
Suggestions:
- Please make sure there are no IP configuration or DNS issues for the AG and the SQL nodes on the DR side.
- Please make sure all the SQL nodes are at the same patch level and up to date.
- Please make sure AG replication (data synchronization) is healthy.
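The DNS suggestion can be checked from a DR-side node with a short script. This is a minimal Python sketch, not part of the original procedure: the names in NAMES are hypothetical placeholders, and it assumes the "AG name" above means the AG listener's DNS name. It only reports whether each name resolves and to which addresses; comparing those addresses with the IPs you expect on the DR subnet is left to you. On Windows, Resolve-DnsName or nslookup performs the same check one name at a time.

```python
import socket

# Hypothetical names: replace with your SQL node names and AG listener name.
NAMES = ["sqlnode1", "sqlnode2", "ag1-listener"]


def resolve_all(names):
    """Map each name to a sorted list of its IP addresses, or None if it does not resolve."""
    results = {}
    for name in names:
        try:
            infos = socket.getaddrinfo(name, None)
        except socket.gaierror:
            # Name lookup failed (no such host, or no DNS answer).
            results[name] = None
            continue
        # info[4] is the sockaddr tuple; its first element is the address string
        # for both IPv4 and IPv6 results. A set removes per-socket-type duplicates.
        results[name] = sorted({info[4][0] for info in infos})
    return results
```

For example, `for name, addrs in resolve_all(NAMES).items(): print(name, addrs or "DOES NOT RESOLVE")` prints one line per name, which makes a missing or stale DNS record easy to spot.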
- Event ID 1069:
Description: This event is logged when a clustered resource fails to come online.
Common Causes:
Resource Configuration: Incorrect configuration of the resource (e.g., network name, IP address, or disk).
Network Issues: DNS-related issues for network name resources or NIC-related events for IP address resources.
Disk Errors: Disk-related errors or warnings for physical disk resources.
Troubleshooting Steps:
Check the system event log for Event ID 1069.
Review other related errors or warnings.
Examine the cluster log for further details.
Use PowerShell to collect the cluster logs: Get-ClusterLog -Destination C:\temp
- Event ID 1205:
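Description: Logged when the Cluster service fails to bring a clustered role completely online or offline; one or more of the role's resources may be in a failed state.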
Common Causes:
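Resource Dependencies: A resource that others depend on (for example, the listener's network name or IP address) fails, so the resources that depend on it cannot come online.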
Network Issues: Network communication problems.
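To see how one failed dependency can keep a whole role from coming online, here is a small Python sketch (Python 3.9+ for graphlib) that walks a resource dependency map in online order. The resource names are hypothetical, and the sketch models only AND dependencies; multi-subnet listeners usually put an OR dependency between their IP addresses, which it does not handle. On a real cluster, Get-ClusterResourceDependency shows the actual dependency expression for a resource.

```python
from graphlib import TopologicalSorter

# Hypothetical resources in one AG role. Each key depends on every resource
# in its set (AND semantics only; OR dependencies are not modeled).
DEPENDENCIES = {
    "AG1": {"AG1_Listener"},                   # AG resource -> listener network name
    "AG1_Listener": {"IP Address 10.0.0.50"},  # network name -> IP address
    "IP Address 10.0.0.50": set(),
}


def online_order(deps):
    """Return an order in which the resources could be brought online
    (dependencies first). Raises graphlib.CycleError on circular dependencies."""
    return list(TopologicalSorter(deps).static_order())


def blocked_by(deps, failed):
    """Return the resources that cannot come online because something they
    depend on, directly or transitively, is in `failed`."""
    down = set(failed)
    # Walking in online order guarantees each resource's dependencies
    # have already been classified before the resource itself.
    for resource in online_order(deps):
        if resource not in down and deps.get(resource, set()) & down:
            down.add(resource)
    return down - set(failed)
```

With this map, a failed IP address resource leaves both the listener and the AG resource offline: `blocked_by(DEPENDENCIES, {"IP Address 10.0.0.50"})` returns `{"AG1_Listener", "AG1"}`, which matches the pattern of a DNS or IP problem surfacing as a role that never comes completely online.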
Troubleshooting Steps:
Check the cluster log for the resources that failed and the reason each one failed.
Verify resource dependencies and network connectivity.
- Event ID 1254:
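Description: Logged when a clustered role exceeds its failover threshold: it has used up the configured number of failover attempts within the failover period and is left in a failed state.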
Common Causes:
Resource Configuration: Incorrect resource configuration that makes the role fail repeatedly.
Resource Startup Time: A resource repeatedly taking too long to start and timing out.
Troubleshooting Steps:
Review the cluster log for details.
Check resource configuration and startup time.
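All three events lead back to the cluster log. Once Get-ClusterLog has written the per-node logs to C:\temp, a short Python filter can pull out only the error and warning lines, optionally narrowed to one resource name. This is a sketch that assumes the usual cluster log line layout of `<process>.<thread>::<timestamp> <LEVEL> <message>` with levels such as INFO, WARN and ERR; please verify that layout against your own log before relying on it.

```python
import re

# Assumed line layout (check against your own cluster log):
#   <process>.<thread>::<timestamp> <LEVEL> <message>
LINE_RE = re.compile(r"::(?P<ts>\S+)\s+(?P<level>INFO|WARN|ERR|DBG)\s+(?P<msg>.*)$")


def problem_lines(lines, levels=("ERR", "WARN"), keyword=None):
    """Yield (timestamp, level, message) for lines at the given levels,
    optionally only those whose message contains `keyword` (case-insensitive)."""
    for line in lines:
        m = LINE_RE.search(line)
        if not m or m.group("level") not in levels:
            continue
        if keyword and keyword.lower() not in m.group("msg").lower():
            continue
        yield m.group("ts"), m.group("level"), m.group("msg")


def scan_file(path, **kwargs):
    """Read one cluster log file and return the matching entries as a list."""
    # Replace undecodable bytes instead of failing on them.
    with open(path, encoding="utf-8", errors="replace") as f:
        return list(problem_lines(f, **kwargs))
```

For example, `scan_file(r"C:\temp\sqlnode1_cluster.log", keyword="listener")` (the file name is hypothetical) returns only the ERR and WARN entries that mention the listener, which is usually where the network name or DNS registration failure behind 1069 shows up.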
Fix:
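Force a failover on the SQL Server side. Connect SQL Server Management Studio to the secondary replica that should become the new primary (on the DR side) and run the following. As the option name says, any transactions not yet synchronized to that replica can be lost: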
ALTER AVAILABILITY GROUP xxxxxxx FORCE_FAILOVER_ALLOW_DATA_LOSS;