Thursday, December 27, 2018

Back to Basics: Making nslookup more useful

As we forge ahead into a brave new world of AI, ML, and AR, it's helpful to occasionally step back and consider some basic information technology skills that we should all possess. These are foundational skills that demonstrate a functional understanding of IT principles. This post deals with one of the most basic tools in the administrator's kit: nslookup.

DNS is Everything

Thanks to DNS, we address our systems, sites, and services with human-readable text. Without DNS, we'd be forced to recall the IP address of each system we want to connect to. Sure, you could probably memorize a few dozen /24s, but it's not practical to live without DNS. And when things on your network go belly-up, it's always reasonable to check DNS first. Because it's always DNS.

If ping is the first command junior IT admins learn, nslookup is a close second. And just like most IT admins are content to ping hostnames and IPs without ever looking into the richness of the command's syntax, nslookup's best tricks are reserved for those who want more from their query than a simple hostname or IP.

Before we get any further, I'll note that for a short time nslookup was a deprecated utility. But the ISC reversed course in 2004 and agreed to let nslookup soldier on. (Note change 1700 in the CHANGES log on the BIND 9.3 release page, which contains the all-business text that saved nslookup: "nslookup is no longer to be treated as deprecated. Remove 'deprecated' warning message. Add man page.") That's why you'll find it on every modern OS to this day (see this link for Microsoft's latest info on nslookup in Windows).

nslookup vs. dig

For starters, comparing these two utilities is like comparing an abacus to a TI-81: you wouldn't ever expect an abacus to produce a graph of the sine function. The same is true for nslookup: you wouldn't expect it to return a vast amount of information regarding a single host. dig is great at that.

But if you use Windows at work, and don't have access to dig, you can add a simple switch to your nslookup queries to make it return a wealth of dig-like responses for the most innocuous request.

The secret is to append -debug to your nslookup queries (if you're a one-at-a-time nslookup-er), or to enter the nslookup utility with -debug for extended DNS query sessions. Instead of returning simple IP information for your hostname queries, nslookup will now return a whole host of information. That's a DNS joke. Yes, I'm sorry.
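A quick sketch of both forms, using www.example.com as a stand-in for whatever host you're curious about:

    nslookup -debug www.example.com

    nslookup -debug
    > set type=mx
    > example.com
    > exit

The first form answers a single query with full debug output; the second drops you into an interactive session where every query you type gets the same treatment.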

Making sense of this information will be covered in the next post in this series. In the meantime, nslookup -debug away!

Wednesday, June 20, 2018

vmkping Error: Unknown Interface

In the middle of troubleshooting an issue with vMotion traffic failing, I ran into an annoying issue with vmkping: attempting to specify certain vmkernel interfaces as the traffic source would throw an error like the one below.

What's annoying about this is that vmk4 is not unknown. It's tagged for vMotion traffic.

After some googling, I learned that using a poorly-documented argument will allow vmkping to work properly. If you've run into this issue, add ++netstack=vmotion to your vmkping command. You'll get the results you were expecting the first time around.
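For example, assuming vmk4 is your vMotion vmkernel interface and 192.168.50.12 is a placeholder for the destination vMotion IP:

    vmkping ++netstack=vmotion -I vmk4 192.168.50.12

If you're also validating jumbo frames along the path, adding -d (don't fragment) and -s 8972 (payload sized for a 9000-byte MTU) to the same command is a common follow-up.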

Incidentally, if you've ever posted screenshots of your ESXi host's ssh session and blurred out the hostname for SECURITY purposes: don't do that. Instead, change the prompt by modifying /etc/profile.local. William Lam has a years-old post here (note that what he suggested years ago has since been implemented as the default config). Much cleaner presentation this way.
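If you go that route, a minimal sketch (assuming you just want a generic label in your screenshots) is to append a line like this to /etc/profile.local and open a new shell:

    export PS1="[lab-esxi:~] "

Any static string works; the point is that the prompt, not a blur tool, controls what shows up in your screenshots.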

Wednesday, January 24, 2018

UCS Manager: Failed to Split Certificate Chain

So now that we're in the era of turn-everything-into-a-web-app management, you're spending time with the shiny new HTML5 UCS Manager application. We've come a long way from the early days of UCS, when 1.x and early 2.x releases felt like .01 alphas. If you suffered the indignities of the Java-based version, I feel you. Those were dark times.

The HTML5 interface is a sight for sore eyes. And if you're using 3.1(3b) as you should be (I started this post a long time ago, apparently), you've got a stable, responsive environment in which to create and apply policy to your servers. I'd never call managing anything in IT "fun," but managing things in UCS Manager is at least not "not fun." High praise, I know.

But you hate that it's using a self-signed cert. You have a CA (or at least access to one), and you'd like to issue a trusted cert to make Chrome and Firefox and modern browsers of all sorts stfu about missing subjectAltNames. So you set about the process of requesting a new certificate, and then you try to import the cert into UCS Manager. You set up a Trusted Point, copy the certificate chain into the far-too-tiny window, and save. So far, so good. But when you paste in your cert and associate it with the Trusted Point, you get an error complaining about not being able to split the certificate chain. It looks like this.
No SSL for you!

Sometimes, this issue is easily solved by making sure that you've included the full certificate chain in your Trusted Point config. And since it's not obvious what you're supposed to do there, here's a tip: you have to copy and paste all of the CA certs into the same window, one after another.
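In other words, the Trusted Point's certificate chain field should end up holding every CA cert in the chain as concatenated PEM blocks, along these lines (check your CA's documentation for the expected order):

    -----BEGIN CERTIFICATE-----
    (intermediate CA certificate, base64)
    -----END CERTIFICATE-----
    -----BEGIN CERTIFICATE-----
    (root CA certificate, base64)
    -----END CERTIFICATE-----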

But here's the rub: if you are certain you've correctly imported your cert chain and you're still getting errors about splitting the certificate chain, it's because you failed to fill in the Subject: field in your CSR. Trust me.

Historically, subjectAlternativeNames have been optional, or rather, optionally implemented. The notion of a subjectAlternativeName has been around for decades, but it wasn't until last year that browser developers started requiring a SAN to avoid the dreaded SECURITY WARNING message that we've all learned to subconsciously ignore. And by browser developers, I mean Google, makers of Chrome, the browser we fell in love with a decade ago and now hate as much as we hated IE4 when it killed Netscape Navigator.

But back to the point: you're getting this error because you didn't include a subjectAlternativeName in your certificate. So just go back and generate a new CSR from UCSM with the "Subject" field populated with the FQDN of your UCSM, and send that to your certificate authority. Then copy and paste the new cert, bask in the glory of a successful import, and browse to UCSM error-free, even from Chrome.
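Before you paste the reissued cert into UCSM, it's worth confirming the SAN actually made it in. Assuming you have openssl handy and the cert saved locally as ucsm.cer (a placeholder name):

    openssl x509 -in ucsm.cer -noout -text | grep -A1 "Subject Alternative Name"

If that comes back empty, go back to the CSR before blaming UCSM again.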

Monday, January 22, 2018

So about that #spectre patch...

One unintended consequence of the government shutdown is the drowning out of all non-shutdown-related news. Lost in all of the noise of brinksmanship and idiotic wall-building is some pretty fascinating tech news, with particular regard to everyone's favorite first-order vulnerabilities: spectre and meltdown.

You'll recall that speculative execution, a feature of modern microprocessors, was recently identified as exploitable in such a way as to leak memory from a system. And to make matters worse, it's possible to leak memory between VMs, and between VMs and their hosts. While currently demonstrated techniques require local admin access, it's certainly possible to use other attack vectors to get root, then attack the processor. Good times.

VMware was quick to respond to the threat by issuing several security advisories, but most importantly this one: VMSA-2018-0004. vSphere admins everywhere began the routine process of deploying patches to their hosts, updating vCenter, and making sure that all VMs were running hardware version 9 or later.
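If you want a quick look at which VMs are still below hardware version 9, a PowerCLI one-liner along these lines will do (the property is named Version in older PowerCLI releases and HardwareVersion in newer ones):

    Get-VM | Select-Object Name, Version | Sort-Object Version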

But over the weekend, VMware made a minor edit to this security advisory. And by minor, I mean a huge update that should make you put the brakes on remediation efforts. From the updated KB, here's the important bit:

Intel has notified VMware of recent sightings that may affect some of the initial microcode patches that provide the speculative execution control mechanism for a number of Intel Haswell and Broadwell processors. The issue can occur when the speculative execution control is actually used within a virtual machine by a patched OS.

You're probably wondering what the hell a "sighting" is after reading this. Short version: it's what Intel calls an issue with a processor that has been reported not just in their internal testing, but in a customer environment in the field. In other words, this is not a theoretical issue. It's an observed fact.

Of course, the VMware KB is lacking in details on what effect this issue has on running virtual machines. If VMware is taking the bold step of removing the speculative execution protection patches from the VUM download source, I'll assume the effect is bad. We're doing some testing to determine what exactly happens when a guest OS attempts to use the protections provided by these VMware patches. I'll update the post with the results of our testing.

To VMware's credit, they're reacting to these security events as quickly as possible, and they're being transparent about their progress.

So in the meantime, if you've already deployed the update to your hosts (and your hosts have CPUs listed in the KB, which appears to cover most CPUs in use today), you'll want to follow the instructions in the KB to implement corrective action on each host. Just do yourself a favor: carefully read the bullets following the config change. The devil is in the details.

Sunday, January 21, 2018

Executive Speculation on the Speculative Execution Situation

Security issues that are resolved via the installation of a single patch are easy mode in two regards: they're easy to fix, and they're easy to measure. How many times have you heard your CIO ask, "What percent complete are we for <insert cool vulnerability name here>?" That's because executives love metrics, and patch installations are easily quantifiable:

  • How many systems do we have?
  • How many systems are vulnerable?
  • How many systems are fully patched?
  • How many systems need to be patched?

Execs love 3D pie charts.
You can be sure that once the exec collects these data points, a shiny new pie chart will be willed into existence and cut and pasted into a PowerPoint presentation concerning incident response. Then you'll enter the measuring progress phase of remediation, in which each morning these four data points are updated and the pie chart is refreshed.

Remediations for #spectre and #meltdown, however, are not so primitive. For modern on-prem environments, you can count on applying complex, interdependent remediations to each layer of your stack: to the server hardware you rely on (in the form of microcode and/or firmware updates), to the hypervisor you trust (in the form of host and management server patches and updates), to the virtual machines that migrate throughout your data center (in the form of VM hardware version upgrades; yeah, you're not the only one with VMs running version 4 in production), to the guest operating systems (in the form of OS patches), and to the anti-virus applications running within those guest operating systems (in the form of compatibility assurances inserted into the Windows Server registry). Once all of these mitigations are in place, then you've fully addressed the vulnerability (at least as of the end of January 2018).

Many of these steps require planned downtime. Some of these steps are dependent upon others; surely by now you've read that applying updates to Windows without having a compatible anti-virus solution has a nasty habit of breaking Windows in the form of the dreaded BSoD. A few intrepid admins inserted the required "QualityCompat" key into the registry of a server that lacked a validated AV solution, with mixed results.
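For reference, the key in question, as Microsoft documented it at the time (your AV vendor is supposed to set it for you once compatibility is confirmed):

    reg add "HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\QualityCompat" /v "cadca5fe-87d3-4b96-b7fb-a231484277cc" /t REG_DWORD /d 0 /f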

Undeniably, implementing safeguards for spectre and meltdown is not easily captured in a 3D pie chart. Such a chart would be visually cluttered and would immediately lose its intended audience, who wishes to see, in clear, clean, coordinated, contrasting colors, the state of remediation.

The result of the difficulty in measuring speculative execution remediation activities is this: no one measures speculative execution remediation activities, which translates to not a whole lot of attention being paid at the executive level. Sure, the technologists of the world are frantically updating and patching and running PowerShell scripts to validate the state of protection. But the flurry of activity is confined to the lowest layers of the org chart. Bikeshedding is alive and well in the enterprise.
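One of the scripts in that flurry, for what it's worth, is Microsoft's SpeculationControl PowerShell module, which reports whether the OS-level mitigations are present and enabled:

    Install-Module SpeculationControl
    Get-SpeculationControlSettings

Run it before and after patching and you have a tidy before-and-after, even if no one upstairs ever asks for it.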

Infrastructure, dear friends, is important. I suspect that as we've moved from client-server to virtualization to cloud, we've abstracted ourselves far away from the hardware that makes IT possible. Some vendors even proclaim that infrastructure should be invisible. And while I understand the intent of such a provocative statement, I believe it has been interpreted as "infrastructure should be ignored."

This is a risky ideology to employ in the data center, to be certain.