Monitoring the pieces is NOT the same as monitoring the whole.
When it comes to monitoring, most people have a good idea of which infrastructure components need to be monitored. In a vSphere environment, it's commonplace to monitor the health and performance of the virtual machines, the ESXi hosts, the upstream network switches, and the storage platform. You might use a different tool for each component (although I think that's a bad idea in general), but you're effectively monitoring each part of your vSphere infrastructure.
But there's one major gap in this approach: you're missing end-to-end monitoring.
I've been thinking about this lately because of a problem earlier in the week. Some VMs were reporting very high disk latency, with spikes between 100 and 200 ms. As usual, the storage engineers said the SAN was fine, the virt guys said ESXi was fine, and the Windows guys said the guest OS was fine. So while every piece was "fine," we had VMs in trouble.
The Forest or the Trees?
In this scenario, each component of the virtualization infrastructure was being monitored, but the infrastructure as a whole was not. The connections between the components were invisible to our monitoring. Production workloads were failing, and no one could identify the root cause. Sure, each tree seemed healthy, but the forest was on fire.
A Case for vCOps
I've liked VMware vCenter Operations Manager (vCOps) for end-to-end monitoring for a while now. It's designed to solve exactly the problem described above (along with plenty of other functionality that I won't go into here). It lets you stop micro-monitoring your infrastructure and start monitoring your workloads. This is a subtle but significant shift from traditional monitoring, and in my experience, it's still a new concept for most IT shops.
What's the Point?
The point is this: regardless of the solution you use, end-to-end monitoring of your virtualization infrastructure is no longer a want; it's a need. End-to-end monitoring can eliminate (or at least reduce) finger-pointing when performance problems creep into your systems, and it can alert you to capacity problems before they affect operations. Deploying vCOps is so easy that you'll wonder why you didn't do it a year ago. So go do it today.