Last night I ran into a quirky error with HA. A few hosts weren't able to connect to the HA Master. A quick review of /var/log/fdm.log on those hosts turned up lots of the following:
But the hosts were able to vmkping the other cluster members' management IP addresses, including the HA Master's. (I admit that I checked DNS first; old habits die hard.) It was late, I was two pots of coffee into a change window, and I was looking for help. So I posted a question to VMTN: vSphere HA Agent Unreachable.
Well, wouldn't you know it, Duncan Epping replied pretty quickly with some suggestions. He asked if I had changed SSL certificates recently, which I hadn't, and then included a link to some other steps to take. I ended up resolving the issue just by disabling HA on the cluster and then re-enabling it. Go figure.
Fast forward to this morning, and what do I see from DuncanYB? A link to a new post he wrote about vSphere HA Agents in an unreachable state.
Most interesting is the log snippet he included:
[29904B90 verbose 'Cluster' opID=SWI-d0de06e1] [ClusterManagerImpl::IsBadIP] <ip of the ha master> is bad ip
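To check a copy of fdm.log for the same message, a minimal Python sketch like the one below may help. The pattern is based only on the single log line quoted above, so other variants of the message may not match, and the function name count_bad_ips is made up for illustration.

```python
import re
from collections import Counter

# Assumption: every entry of interest looks like the one line quoted in the post,
# i.e. "[ClusterManagerImpl::IsBadIP] <address> is bad ip".
BAD_IP = re.compile(r"\[ClusterManagerImpl::IsBadIP\]\s+(.+?)\s+is bad ip")


def count_bad_ips(lines):
    """Return a Counter mapping each flagged address to the number of matching lines."""
    counts = Counter()
    for line in lines:
        match = BAD_IP.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

To use it, open a copy of the log (for example with open("fdm.log", errors="replace")) and pass the file object to count_bad_ips; calling most_common() on the result lists the addresses flagged most often first.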
It's great to know that the VMware community follows the sun, too.