VMware HA Admission Control and VM reservations

Did you know how VMware HA admission control is coupled to VM reservations? To be honest I thought I knew, but recently I was pointed to some details that showed me I was wrong.

What I’m talking about is the cluster setting “HA Admission Control” and how its policies “Percentage of cluster resources reserved as failover spare capacity” and “Host failures the cluster tolerates” relate to the CPU and memory reservations at VM level. These settings make sure that your HA cluster reserves enough resources to recover from host failures. The higher you set the percentage of resources to be reserved, the more host failures can be tolerated.

Where did I go wrong? Well, I thought vCenter calculated the HA spare capacity based on real usage, sampled at 5-minute intervals. But I was wrong. These calculations are not based on real-life numbers but on the reservations you set at the VM level. The same goes for the “Host failures the cluster tolerates” setting: the slot size is based on the reservations set per VM.

Ouch, I felt a little embarrassed about being wrong on this, especially since I normally check that VM reservations are set to zero unless there is a special need for one. But when I started asking people what they used for their VM reservations, I learned that not many were using them, and that more people than I expected also had the wrong idea about this.

So, to be clear once and for all:

The values used in calculations for “Host failures the cluster tolerates” and “Percentage of cluster resources reserved as failover spare capacity” are based on the CPU and memory reservations set at VM level.
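To make the slot-size idea behind “Host failures the cluster tolerates” a bit more concrete, here is a rough sketch in Python. It is not how vCenter is implemented, just the general rule as I understand it: the slot is sized by the largest CPU reservation and the largest memory reservation plus overhead among the powered-on VMs. The VM numbers, the host size and the 32 MHz fallback for VMs without a CPU reservation are assumptions for illustration (the fallback differs between vSphere versions), and the function names are made up.

```python
def slot_size(vms, default_cpu_mhz=32):
    """Return (cpu_mhz, mem_mb) for one slot.

    vms: list of (cpu_reservation_mhz, mem_reservation_mb, mem_overhead_mb).
    default_cpu_mhz is an assumed fallback used when no VM reserves CPU.
    """
    cpu = max(max(c for c, _, _ in vms), default_cpu_mhz)
    mem = max(m + o for _, m, o in vms)  # reservation plus overhead
    return cpu, mem


def slots_per_host(host_cpu_mhz, host_mem_mb, slot):
    """A host holds as many slots as its scarcest resource allows."""
    cpu_slot, mem_slot = slot
    return min(host_cpu_mhz // cpu_slot, host_mem_mb // mem_slot)


# Hypothetical VMs without any reservations: only overhead counts.
no_reservations = [(0, 0, 100), (0, 0, 150), (0, 0, 120)]
small_slot = slot_size(no_reservations)
slots_small = slots_per_host(8000, 8192, small_slot)

# Add a single VM with a 2048 MB memory reservation and the slot grows.
with_reservation = no_reservations + [(0, 2048, 150)]
big_slot = slot_size(with_reservation)
slots_big = slots_per_host(8000, 8192, big_slot)

print(small_slot, slots_small)
print(big_slot, slots_big)
```

With these made-up numbers, one VM with a large reservation inflates the slot for the whole cluster, while a cluster without reservations ends up with tiny slots that say nothing about real usage.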

And to prove that setting no reservations can overload your cluster, have a look at my lab environment, where I have set NO reservations on any of the VMs. My three hosts have 8GB of RAM each, and when you look at the current load you can see that memory usage is 52%, 52% and 92%, which makes a total of 15.7GB of RAM in use out of the 24GB I have in my lab, or 65%. Now the vSphere HA status box on the summary page of the cluster shows that I have a “Current Memory Failover Capacity” of 81%. Anyone can see that’s not right. If you’re asking why it shows 81% and not a full 100%: the missing 19% is taken up by VM memory overhead. But it should be clear that I can’t power on another 81% of 24GB = 19GB of VMs and still have 25% spare HA failover capacity.
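The arithmetic from my lab can be replayed in a few lines of Python. This is only a sketch of the calculation as I understand it: failover capacity is the share of cluster memory left after subtracting the VM reservations plus memory overhead. The variable names are mine, and the overhead figure is simply derived from the 19% mentioned above rather than measured per VM.

```python
host_ram_gb = [8, 8, 8]
host_usage = [0.52, 0.52, 0.92]          # current memory usage per host

total_gb = sum(host_ram_gb)
used_gb = sum(ram * use for ram, use in zip(host_ram_gb, host_usage))
usage_fraction = used_gb / total_gb      # what is really in use

reserved_gb = 0.0                        # no reservations on any VM
overhead_gb = 0.19 * total_gb            # assumed: the 19% overhead from above

# Reservation-based view, which is what the HA status box reports
failover_capacity = (total_gb - reserved_gb - overhead_gb) / total_gb

print(f"in use: {used_gb:.1f} GB ({usage_fraction:.0%})")
print(f"current memory failover capacity: {failover_capacity:.0%}")
```

Real usage never enters the failover capacity formula, which is why the cluster can be 65% full and still report 81% as available.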

For details on how the calculations are made, check Duncan’s VMware HA DeepDive Guide, a must read.

Now I have two questions for you, please respond in the comments:

  • The “make me feel a little better” question: Did you know about this?
  • What is the default reservation on CPU and RAM you are using in your environment? Of course, special VMs will have different requirements, but as a ‘rule of thumb’, what % of reservation do you set?