Saturday, July 27, 2013

Fix Your Org Chart, Fix Your Infrastructure

See this? Don't do this.
Spend any measurable amount of time thinking about how to solve infrastructure problems, and you'll most likely end up thinking about organizational problems. This is doubly true when you're rooted in virtualization, and you don't really consider yourself a server, network, or storage engineer.

You've probably seen many org charts like the one to the right. The boxes for each team seem innocent enough: group staff by discipline and responsibility. Job descriptions can be pretty generic for each group. Status reports flow up the chain nicely. All of that great management structure from the 1970s.

It just doesn't work today.

Virtualization adds a layer of abstraction to org charts, too. You might not be able to distinguish between a server engineer and a storage engineer anymore. Network engineers are getting smarter about virtual servers. All of a sudden, everyone is speaking in the same technical jargon, and they're even understanding one another!
Almost named it Virtualization Team. Close one.

That is, of course, unless you still insist on sticking with the server, network, and storage team approach. And maybe we're overlooking the commonality in the naming convention: Team. In the sports world, when you have more than one team playing in the same league, they're competitors. I argue it's the same for IT organizations.

Organizational architects of the world: when you put engineers in nice little boxes like the three in the diagram above, you're designing for conflict. It's like building a vSphere cluster and disabling vMotion. Don't force your engineers to stay within the bounds of their "team." Let them move around in the organization freely, based on their resource requirements (without violating availability constraints, of course).

Fixing technology is easy; fixing organizations is assuredly not. But technology can't solve organizational problems, and frankly shouldn't.

Tuesday, July 23, 2013

More to Storage than x86 Virtualization

If you're following Nutanix and their evangelists on Twitter lately, you've no doubt seen lots of tweets using the #NoSAN hashtag. If you're not familiar with their products, it's worth checking out. They combine four server blades in a single chassis, and sweeten the deal with internal shared storage (both SSD and HDD). If you're looking for a great platform for your vSphere cluster and don't want to invest in a SAN, Nutanix deserves your consideration.

In my opinion, however, Nutanix has gone a little overboard with their claim that SAN is dead, and that traditional SAN deployments have no future. It's easy to get caught up in the x86 virtualization space and think of storage (SAN or NAS) as just a resource to be consumed by vSphere. But data centers aren't just filled with vSphere hosts, no matter what VMware would have you believe. I see AIX hosts in a surprising number of environments. I see lots of physical Windows hosts, usually with unique hardware requirements (e.g., PCI cards for fax servers). I've even seen clustered OS X Servers on Apple hardware (NSF, I'm looking at you). All of these servers need access to fast and reliable storage, and that usually means SAN.

Look. I'm a VMware geek like the rest of you. I love running Converter on physical servers and sharing the benefits of virtualization with clients and coworkers (and random Internet people like you!). But as I noted previously, VMware is not the world. Unless you've realized the Software-Defined Data Center, your SAN is more than just a place to stick your datastores.

Sunday, July 21, 2013

Overallocating vCPUs

I'm always looking for a way to explain why overallocating vCPUs to a VM is a bad idea. At best, it doesn't help. I've seen some discussion this morning on Twitter about this, so I'm sharing how I explain this to people.

Let's say you're going to dinner alone. When you talk to the hostess, you tell her you'd like a table for four because you're really hungry. You can eat faster at a table for four, right? Like, four times as fast? Of course not.

It's the same with vCPUs. If your workload is based on a single-threaded application, and you give it four vCPUs, that workload is dining alone at a table for four.
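If you'd rather see the table-for-four problem as arithmetic, here's a minimal sketch using Amdahl's law. This is my own illustration, not something vSphere computes for you: the function name and the `parallel_fraction` parameter are made up for the example, and it only shows the best-case speedup. It ignores the scheduling overhead that extra vCPUs can add on a busy host.

```python
def amdahl_speedup(parallel_fraction: float, vcpus: int) -> float:
    """Best-case speedup from Amdahl's law for a workload where
    `parallel_fraction` of the work can be spread across `vcpus`."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if vcpus < 1:
        raise ValueError("vcpus must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / vcpus)


if __name__ == "__main__":
    # A single-threaded application has nothing to spread out,
    # so its parallel fraction is 0.
    for vcpus in (1, 2, 4):
        print(f"{vcpus} vCPU(s): {amdahl_speedup(0.0, vcpus):.2f}x")
```

With a parallel fraction of 0, the speedup stays at 1x no matter how many vCPUs you add. Same dinner, bigger table.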

The metaphor can be extended to explain why doing this over and over again has ripple effects on the host (or the restaurant in this case). But I'll leave that up to you to think about. After all, you're the hostess.

Monday, July 15, 2013

Monday Morning Funny: VMware Fusion Locked... What?

Now that Fedora 19 is out, I thought I'd load it up on a Fusion VM and test it out a bit. The alpha releases I checked out a few months ago didn't run on VMware products, so I was anxious to see how the final image worked. I'm happy to say that Fedora 19 loaded without any problems, so you'll see a post soon about VMTools on this release (and a discussion on open-vm-tools while I'm at it).

After I loaded the OS and booted the VM, I went to install VMTools. But I hadn't yet disconnected the installation .iso from the VM. Fusion gave me the following error message:

I had to read this one a few times before I started laughing. I'm using a MacBook Air (which has no built-in optical drive), and the media is an ISO file. Sure, I get the intent of the message. But the thought of a non-existent door being locked made me laugh.

Virtualization changes not only the technology we use, but the language we use to talk about the technology. How often do you say, "eject the disc" when you're referring to an ISO file that you mounted to a VM?

Thursday, July 11, 2013

SolarWinds Certified Professional

Since I've been working with SolarWinds software lately, I thought I'd take a crack at earning the SolarWinds Certified Professional certification. Happy to say that I passed with flying colors last night!

The test itself was a good gauge of how much you know about network monitoring in general, with a healthy dose of SolarWinds Orion NPM administration thrown in. I've been using SolarWinds software since my days as a network administrator at Avectra (good grief, that was like 13 years ago!). I introduced SolarWinds Orion NPM and NTA to the National Science Foundation years ago, and I hope it's still being used to monitor their growing infrastructure. Now I'm using it for a new project I'm working on.

And on that note, I'm off to see what my Top 10s are for today.

Wednesday, July 10, 2013

Power Saving Modes in vSphere and Cisco UCS

If you've ever had a slow Friday and spent time poking around in vCenter Server or UCS Manager, you've probably come across some promising eco-friendly features like Distributed Power Management (DPM) and N+1 PSU redundancy. If you haven't, here's a summary of these technologies.

VMware's DPM - DPM is a feature available to vSphere clusters that determines if the cluster's workload can be satisfied using a subset of cluster members. If so, the VMs are vMotioned to free up one or more hosts, which are then powered down into stand-by mode. Your cluster's HA settings are taken into account, so using DPM won't violate your availability constraints. Should the cluster's workload suddenly increase, vCenter will wake up the stand-by hosts, then redistribute the workload across the additional hosts. Cool stuff indeed. You save on power and cooling costs for each server that DPM puts into stand-by mode.

Cisco's UCS N+1 PSU Redundancy - N+1 is sometimes a tricky thing to wrap your head around, since its meaning changes depending on context. In the case of UCS, N+1 means the number of PSUs required to provide non-redundant power to your chassis, plus one additional PSU. So with a 5108 chassis, with all four PSU slots populated, N+1 would mean three PSUs active and one in "power save" mode. If one of the active PSUs fails, the remaining two still supply the power the chassis needs, and the fourth PSU will be brought online to restore N+1 redundancy.

So that's the good news. Here's the bad news: DPM basically confirms that you overbought on hardware. And N+1 PSU redundancy may not give you the redundancy you're looking for. Here's why.

If you find that DPM is shutting down servers in your cluster more often than not, you purchased more hardware than you needed. This indicates that you didn't properly assess your workloads prior to creating your logical and physical designs. And that indicates that maybe you didn't account for other design factors. And that is not cool. An erstwhile pessimist, I suspect this is why many vSphere clusters do not have DPM enabled.
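Here's a rough way to put a number on "more often than not." To be clear, this is a toy model and not VMware's DPM algorithm: the function names, the 80% target utilization, the single HA spare host, and the demand samples are all assumptions I picked for illustration. It simply asks how many identical hosts a given CPU demand needs, and what share of your samples would leave at least one host idle enough for stand-by.

```python
import math


def hosts_to_keep_on(demand_ghz, host_ghz, total_hosts,
                     ha_spare_hosts=1, target_utilization=0.8):
    """Hosts that must stay powered on to carry `demand_ghz` without
    pushing any host past `target_utilization`, plus HA spares.
    All hosts are assumed identical (an assumption of this sketch)."""
    usable_per_host = host_ghz * target_utilization
    for_workload = math.ceil(demand_ghz / usable_per_host)
    # Never report fewer than one host or more than the cluster has.
    return min(total_hosts, max(1, for_workload + ha_spare_hosts))


def standby_share(demand_samples_ghz, host_ghz, total_hosts):
    """Fraction of samples in which at least one host could be put
    into stand-by under the toy model above."""
    with_standby = sum(
        1 for demand in demand_samples_ghz
        if hosts_to_keep_on(demand, host_ghz, total_hosts) < total_hosts
    )
    return with_standby / len(demand_samples_ghz)


if __name__ == "__main__":
    # Hypothetical cluster: 8 hosts with 40 GHz of CPU each.
    samples = [90, 100, 250]
    share = standby_share(samples, host_ghz=40, total_hosts=8)
    print(f"Hosts could be in stand-by in {share:.0%} of samples")
```

If that share sits above 50% week after week, the sizing question in this post applies to you.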

On the topic of Cisco UCS, N+1 PSU redundancy, and a false sense of security: chances are that what you really want to use here is Grid Redundancy, not N+1 redundancy. Grid means that you have power from two PDUs running to your 5108, and you want to spread your PSUs across those two PDUs. So you connect PSUs 1 and 3 to PDU A, and PSUs 2 and 4 to PDU B. All four PSUs are online, and should a PDU fail, you still have two PSUs running. With N+1 and PSUs spread across two PDUs, a PDU failure could leave you with only one active PSU while the "power save" PSU is being brought online. One PSU may not be able to provide sufficient power to your chassis and blades, which can be... you guessed it: not cool.
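The difference is easier to see if you walk through a PDU failure slot by slot. The sketch below is a simplified model, not anything UCS Manager exposes: the `Psu` class and `surviving_active` function are hypothetical, the cabling matches the layout described in this post, and I'm assuming PSU 4 is the one sitting in "power save" under N+1 (in practice the chassis decides that). It only counts which PSUs are still delivering power the instant a PDU goes dark, before any stand-by PSU has come online.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Psu:
    slot: int
    pdu: str      # which PDU feeds this PSU
    active: bool  # False means the PSU is in "power save" mode


def surviving_active(psus, failed_pdu):
    """Slots still delivering power immediately after `failed_pdu` fails."""
    return [p.slot for p in psus if p.active and p.pdu != failed_pdu]


# Cabling from the post: PSUs 1 and 3 on PDU A, PSUs 2 and 4 on PDU B.
PDU_OF_SLOT = {1: "A", 2: "B", 3: "A", 4: "B"}

# Grid: all four PSUs online.
grid = [Psu(slot, pdu, True) for slot, pdu in PDU_OF_SLOT.items()]

# N+1: three active, one in power save (assumed here to be slot 4).
n_plus_1 = [Psu(slot, pdu, slot != 4) for slot, pdu in PDU_OF_SLOT.items()]

if __name__ == "__main__":
    for name, layout in (("Grid", grid), ("N+1", n_plus_1)):
        for pdu in ("A", "B"):
            left = surviving_active(layout, pdu)
            print(f"{name}, PDU {pdu} fails: PSUs {left} still active")
```

Under these assumptions, Grid keeps two PSUs running whichever PDU fails, while N+1 drops to a single active PSU when PDU A fails. Whether one PSU can carry your chassis depends on your actual power draw, which is the whole point.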

Looking back on this post, I'm not sure why I lumped these two together, other than that they both deal with power. DPM and PSU configuration options solve different problems. There's no shame in including these features in your designs. Just make certain that you understand the benefits and pitfalls of each.

P.S. - It's late, and I'm listening to the Beastie Boys, and I'm low on Yuengling. Were I so inclined, I could add a footnote for nearly every claim above. But the point here is that you need to understand what these options do for you, and that means understanding other design requirements like total power consumption of your B-Series blades.

Tuesday, July 2, 2013

New Post at netcraftsmen.net - The End of FibreChannel?

I'm in a writing mood lately, and thought I'd share some observations on virtualization, storage, and certain storage protocols that think they're too good to share cabling with other protocols. Read "The End of FibreChannel?" and let me know what you think!

Monday, July 1, 2013

vBeers - Washington, D.C. on Thursday, July 25, 2013

Turns out that setting up a vBeers really is that easy! Join us at The Dubliner on Thursday, July 25 at 5:00pm. We'll be talking virtualization, at least initially. After a few pints, topics will most likely include unicorns, bacon, office politics, VMworld, and IT war stories.

Click here for the vBeers official post.

See you there!