Friday, May 24, 2013

Self-referential Post.

Quick post while I write a longer post for netcraftsmen.net.

Last night I ran into a quirky error with vSphere HA. A few hosts weren't able to connect to the HA Master. A quick review of /var/log/fdm.log on those hosts turned up lots of the following:

[29904B90 verbose 'Cluster' opID=SWI-d0de06e1] [ClusterManagerImpl::IsBadIP] <ip of the ha master> is bad ip
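If you want to see how often each address is being flagged rather than eyeballing the log, something like the following would do it. This is only a minimal sketch in Python, not anything from VMware. It assumes the entries look like the line quoted above, with a real IP address where I've written the placeholder. The function names `count_bad_ips` and `count_bad_ips_in_file` are my own, made up for illustration.

```python
import re
from collections import Counter

# Matches the message quoted above. The captured token is whatever sits
# between the [ClusterManagerImpl::IsBadIP] tag and the words "is bad ip".
# Assumption: the address contains no spaces (true for a real IP, not for
# the "<ip of the ha master>" placeholder used in this post).
BAD_IP = re.compile(r"\[ClusterManagerImpl::IsBadIP\]\s+(\S+)\s+is bad ip")


def count_bad_ips(lines):
    """Return a Counter mapping each flagged address to its number of entries."""
    counts = Counter()
    for line in lines:
        match = BAD_IP.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def count_bad_ips_in_file(path):
    """Convenience wrapper: run count_bad_ips over a log file on disk."""
    # errors="replace" keeps a stray non-UTF-8 byte from aborting the scan.
    with open(path, errors="replace") as log:
        return count_bad_ips(log)
```

Run against a copy of the log, for example `count_bad_ips_in_file("fdm.log").most_common()`, it lists the flagged addresses with the most frequent first. If the HA Master's address tops the list on several hosts, that matches what I was seeing.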

But the hosts were able to vmkping the management IP addresses of the other cluster members, including the HA Master. (I admit that I checked DNS first; old habits die hard.) It was late, I was two pots of coffee into a change window, and I was looking for help. So I posted a question to VMTN: vSphere HA Agent Unreachable.

Well, wouldn't you know it, Duncan Epping replied pretty quickly with some suggestions. He asked if I had changed SSL certificates recently, which I hadn't, and then included a link to some other steps to take. I ended up resolving the issue just by disabling HA on the cluster and then re-enabling it. Go figure.

Fast forward to this morning, and what do I see from DuncanYB? A link to a new post he wrote about vSphere HA Agents in an unreachable state.

Most interesting is the log snippet he included:

[29904B90 verbose 'Cluster' opID=SWI-d0de06e1] [ClusterManagerImpl::IsBadIP] <ip of the ha master> is bad ip

Look familiar?

It's great to know that the VMware community follows the sun, too.