Sunday, November 9, 2014

Unperceived Existence of Collected Data

Can something exist without being perceived?



No, this blog hasn't taken a turn into the maddening world of metaphysics. Not yet.

I'm talking about event and performance logging, naturally. In the infrastructure racket profession (well, I think it's a vocation, but I'll get to that in a later post), we're conditioned to set up logging for all of our systems. It's usually for the following reasons:
  1. You were told to do it.
  2. SECURITY!
  3. Security told you to do it.
So you dutifully, begrudgingly configure your remote log hosts, or you deploy your logging agents, or you do some other manner of configuration to enable logging to satisfy a requirement. And then you go about your job. Easy.

But what about that data? Where does it go? And what happens to it once it's there?

Do you use any tools to exploit that data? Or does it just consume blocks on a spinning piece of rust in your data center? Have I asked enough rhetorical questions yet?

*   *   *

The pragmatic engineer seeks to acquire knowledge of all of her or his systems, and in times of service degradation or outage, such knowledge can reduce downtime. But knowledge of a system typically requires an understanding of "normal" performance. And that understanding can only come from the analysis of collected events and performance data.

If you send your performance data, for example, to a logging system that is incapable of presenting and analyzing that data, then what's the point of logging in the first place? If you can't put that data to work, and exploit the data to make informed decisions about your infrastructure, what's the point? Why collect data if you have no intent (or capacity) to use it?

Dashboarding for Fun and Profit (but mostly for Fun)

One great way to make your data meaningful is to present it in the only way that those management-types know: dashboards. It's okay if you just rolled your eyes. The word "dashboard" was murdered by marketing in the last 10 years. And what a shame. Because we all stare at a dashboard while we're driving to and from work, and we likely don't realize how powerful it is to have all of the information we need to make decisions about how we drive right in front of us. The same should be true for your dashboards at work.

So here are a few tips for you, dear readers, to guide you in the creation of meaningful dashboards:

  1. Present the data you need, not the data you want. It's easy to start throwing every metric you have available at your dashboard. And most tools will allow you to do so. You certainly won't get an error that says, "dude, lay off the metrics." But just because you can display certain metrics, doesn't mean you should. For example, CPU and memory % utilization are dashboard stalwarts. Use them whenever you need a quick sense of health for a device. But do you really need to display your disk queue length for every system on the main dashboard? No.
  2. Less is more. Be selective not only in the types of data you present, but also in the quantity of data you present. Avoid filling every pixel with a gauge or bar chart; these aren't Victorian works, and horror vacui does not apply here. When you develop a dashboard, you're crossing into the realm of information architecture and design. Build your spaces carefully.
  3. Know your audience. You'll recall that I called out the "management-types" when talking about the intended audience for your dashboards. That was intentional. Hard-nosed engineers are often content with function over form; personally, I'll take a shell with grep, sed, and awk and I can make /var/log beg for mercy. But The Suits want form over function. So make the data work for them.
  4. Think services, not servers. When you spend your 8 hours a days managing hosts and devices, you tend to think about the infrastructure as a collection of servers, switches, storage, and software. But dashboards should focus on the services that these devices, when cooperating, provide. Again, The Suits don't care if srvw2k8r2xcmlbx01 is running at 100% CPU; they care that email for the Director's office just went down.
Don't ignore the dashboard functionality of your monitoring solution just because you're tired of hearing your account rep say "dashboard" so many times that the word loses all meaning. When used properly, and with a little bit of work on your part, a dashboard can put all of that event and performance data to work.
Mastodon