Sysadmin Stories: May 2020

Thursday, May 28, 2020

Static route on dual homed vSphere Replication appliance

I recently went through the process of upgrading and troubleshooting a vSphere Replication environment. What was particular about that environment is that the vSphere Replication appliances had two network interfaces.

The first interface (eth0) has the default gateway, but it is not used for replication traffic. The second interface (eth1) is connected to the portgroup that also carries the ESXi replication vmkernel interfaces, so replication traffic is supposed to go over eth1. The main site and the DR site use different subnets, but they can reach each other over the replication network. Since hosts in the protected site (main site) need to communicate with the vSphere Replication server in the DR site, we need to force this communication to go over the replication network.

The solution is pretty simple: add a static route on each appliance to reach the opposite site over the replication network, as follows:

route add -net 192.168.200.0/24 gw 192.168.100.1
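To see why this one route is enough to steer replication traffic onto eth1, here is a minimal Python sketch of longest-prefix route selection. It is an illustration only, not how the appliance kernel is implemented. The two 192.168.x.x networks come from the example above; the default gateway address 10.0.0.1 on eth0 is a made-up placeholder.

```python
import ipaddress

# Simplified routing table after the static route is added.
# The eth0 default gateway (10.0.0.1) is hypothetical.
ROUTES = [
    ("0.0.0.0/0", "10.0.0.1", "eth0"),               # default route
    ("192.168.100.0/24", None, "eth1"),              # directly connected
    ("192.168.200.0/24", "192.168.100.1", "eth1"),   # static route to the other site
]


def pick_route(destination, routes=ROUTES):
    """Return the (network, gateway, interface) entry with the longest matching prefix."""
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(net), gateway, iface)
        for net, gateway, iface in routes
        if dest in ipaddress.ip_network(net)
    ]
    return max(matches, key=lambda entry: entry[0].prefixlen)


if __name__ == "__main__":
    # A host on the opposite site's replication network.
    print(pick_route("192.168.200.25"))
    # The same lookup without the static route falls back to the default gateway.
    print(pick_route("192.168.200.25", ROUTES[:2]))
```

Without the third entry, the only match for 192.168.200.0/24 is the default route on eth0, which is exactly the behaviour the static route is meant to avoid.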

The route is not persistent and will be lost upon reboot. To make it persistent, we need to add it to a configuration file. vSphere Replication 8.1 and 8.2 run on VMware Photon OS 2.0. Normally you add the static route to the configuration file of the network interface that should carry it, in my case /etc/systemd/network/10-eth1.network:

[Match]
Name=eth1
[Network]
Address=192.168.100.11/24
DHCP=no
Domains=mylab.com
[DHCP]
UseDNS=false

[ROUTE]
Destination=192.168.200.0/24
Gateway=192.168.100.1

However, this did not work and the route was not picked up at reboot. One possible cause, which I have not confirmed, is the section header: systemd-networkd section names are case-sensitive and the documented name is [Route], not [ROUTE]. Instead, I tried a different approach. I needed to be sure the route add command would run every time the appliance restarts, so I added it as a service. I first created the service configuration file called staticroute.service (a name of my choice). The file is created in /lib/systemd/system/ and contains the following:

[Unit]
Description=Add static route for eth1
After=local-fs.target network-online.target network.target
Wants=local-fs.target network-online.target network.target

[Service]
ExecStart=/usr/sbin/route add -net 192.168.200.0/24 gw 192.168.100.1
Type=oneshot

[Install]
WantedBy=multi-user.target

Finally, I created a symbolic link for the file:

cd /lib/systemd/system/multi-user.target.wants/
ln -s ../staticroute.service staticroute.service

Once you do that, you can run ls -la to list the files and you will see your staticroute.service link.

This ensures the static route is created at every reboot. Make sure to add the routes in both sites. To test the communication, you only need to run traceroute to the replication IP of an ESXi host in the opposite site.
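As an extra check after a reboot, the kernel routing table can be inspected directly. The following Python sketch parses text in the /proc/net/route format and reports whether the expected route is present. It assumes a little-endian (x86) system, where the address fields in that file are little-endian hex. The sample table is made up for illustration, including the 10.0.0.1 default gateway.

```python
import socket
import struct


def hex_to_ip(value):
    """Convert a little-endian hex field from /proc/net/route to dotted notation."""
    return socket.inet_ntoa(struct.pack("<L", int(value, 16)))


def has_route(table_text, destination, gateway, iface, mask="255.255.255.0"):
    """Return True if the routing table text contains the given route."""
    for line in table_text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 8:
            continue
        if (
            fields[0] == iface
            and hex_to_ip(fields[1]) == destination
            and hex_to_ip(fields[2]) == gateway
            and hex_to_ip(fields[7]) == mask
        ):
            return True
    return False


# Illustrative sample in the /proc/net/route layout (values are made up).
SAMPLE = (
    "Iface Destination Gateway Flags RefCnt Use Metric Mask MTU Window IRTT\n"
    "eth0 00000000 0100000A 0003 0 0 0 00000000 0 0 0\n"
    "eth1 00C8A8C0 0164A8C0 0003 0 0 0 00FFFFFF 0 0 0\n"
)

if __name__ == "__main__":
    print(has_route(SAMPLE, "192.168.200.0", "192.168.100.1", "eth1"))
    # On the appliance itself, read the live table instead of SAMPLE:
    # with open("/proc/net/route") as handle:
    #     print(has_route(handle.read(), "192.168.200.0", "192.168.100.1", "eth1"))
```

This only confirms the route exists; the traceroute test is still needed to confirm that traffic actually reaches the other site.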

Monday, May 11, 2020

VMs not Powering On in Nested ESXi Running on vSphere 7.0 and Options for Nested Lab

After upgrading my physical home lab to vSphere 7.0, I tried to power on the VMs in my nested environment to prepare a demo for an upcoming VMUG meeting. However, I couldn't get any VM to start in the nested ESXi 7.0 running on top of a physical ESXi 7.0. What actually happened is that the nested ESXi host crashed.

I found an article warning that this issue affects an entire family of CPUs - Intel Skylake. My home lab runs Intel Coffee Lake CPUs on gen 8 Intel NUCs and it seems they are affected too. It does not affect older CPUs, as is the case with my Ivy Bridge i5. Bottom line: until a patch or fix makes it into mainstream vSphere 7.0, you won't be able to power on VMs in a nested ESXi 7.0 running on top of an ESXi 7.0. The rest of the functionality is there and working.

I had to do my demo using the physical vSphere 7 and later come back to the lab to find a workaround. I found out there are two options that actually work at the moment:

option 1 - physical ESXi 7.0 running nested ESXi 6.7
option 2 - physical ESXi 6.7 running nested ESXi 7.0

Keeping the physical ESXi on 7.0 and downgrading the nested hosts to 6.7 may seem the simpler path, unless your use case is to test the new features and products. You could test those on the physical hosts, but that means running all your tests on the base ESXi hosts, which could lead to a partial or full lab rebuild. This approach defeats the purpose of having a nested lab. That leaves option 2: temporarily downgrade the physical ESXi hosts to 6.7. My use case requires powering on nested VMs, so option 2 is my choice.

I keep the physical lab on a very simple configuration so that I can easily rebuild (reconfigure) the hosts. Before downgrading, a few aspects need to be considered:

whether any VMs were upgraded to the latest virtual hardware (version 17) - those VMs will not work on vSphere 6.7
vCenter Server cleanup: remove the hosts from clusters and from the vCenter Server inventory. Reusing the same hardware will cause datastore conflicts if a cleanup is not done.
how the actual downgrade will take place (pressing Shift+R at boot will not find any older install, even if the host was upgraded from 6.7)
hostnames and IP addresses
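The first item on the list can be checked from an inventory export. Below is a minimal Python sketch that flags the VMs whose virtual hardware is too new for the downgraded hosts. The VM names are hypothetical, and the default of 15 for the highest supported hardware version is an assumption that holds for ESXi 6.7 U2 and later; adjust it to match your exact 6.7 build.

```python
def vms_blocking_downgrade(vms, max_supported=15):
    """Return the names of VMs whose hardware version exceeds max_supported.

    vms maps a VM name to a hardware version string such as "vmx-17".
    max_supported=15 is an assumption for ESXi 6.7 U2 and later.
    """
    blocked = []
    for name, version in vms.items():
        number = int(version.split("-")[1])
        if number > max_supported:
            blocked.append(name)
    return sorted(blocked)


if __name__ == "__main__":
    # Hypothetical inventory for illustration.
    inventory = {
        "vcsa": "vmx-15",
        "nested-esxi-01": "vmx-17",
        "dc01": "vmx-13",
    }
    print(vms_blocking_downgrade(inventory))
```

Any VM returned by this check would have to be rebuilt or kept off the 6.7 hosts, since virtual hardware cannot simply be downgraded in place.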

With all this in mind, I embarked on the journey of fresh ESXi 6.7 installs that would allow me to run nested ESXi 7.0.

Friday, May 8, 2020

vSphere Distributed Resource Scheduling - DRS

DRS is a core technology for resource management in a vSphere cluster. It has been around since ESX 3 and it is a battle-proven feature without which vSphere clusters would not look the same. But what does it actually do?
At a high level, it lets you use the resources of the ESXi hosts in a cluster as one aggregated pool of resources. Drilling a bit into what it does, we'll see that:

it provides virtual machine admission control - are there enough resources in the cluster to power on a VM
it provides initial placement of a VM - what is the most appropriate host to power on the VM
it is responsible for resource pools - quantifiable and aggregated resources to be consumed by a VM or group of VMs
it is responsible for resource allocation to VMs or resource pools using shares, reservation and limits
it balances the load in the cluster

vSphere 7 comes with an important change in the logic DRS uses. Until vSphere 7, DRS tried to balance the load by looking at the cluster: if a host was overloaded at some point in time, DRS would migrate VMs to less utilized hosts. The check ran every 5 minutes. Starting with vSphere 7, the focus has shifted to the VM. DRS calculates a per-VM score called virtual machine happiness. Looking at each VM and running every minute provides a better way of balancing load and placing VMs.

Let's look at some of the features in DRS as they appear in the UI. At the cluster level you can see the score of the cluster (an average of the scores of each VM) as well as the score buckets for VMs. All my VMs are happy in the 80-100% bucket, meaning they have all the resources they require. The VM view shows the individual VM scores as well as some of the monitored metrics, such as CPU % ready and swapped or ballooned memory.
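The relationship between VM scores, the cluster score and the buckets can be sketched in a few lines of Python. This is only an illustration of the averaging described above, not the actual DRS algorithm. The VM names and scores are made up, and the five 20-point buckets and their boundary handling are assumptions based on how the UI groups the scores.

```python
from statistics import mean

# Assumed bucket boundaries; only the 80-100% bucket is mentioned in the text.
BUCKETS = [(0, 20), (20, 40), (40, 60), (60, 80), (80, 100)]


def bucket_for(score):
    """Return the label of the bucket a VM score (0-100) falls into."""
    for low, high in BUCKETS:
        if low <= score < high or (high == 100 and score == 100):
            return f"{low}-{high}%"
    raise ValueError(f"score out of range: {score}")


def cluster_summary(vm_scores):
    """Return (cluster score, VM count per bucket) for a name -> score mapping."""
    counts = {f"{low}-{high}%": 0 for low, high in BUCKETS}
    for score in vm_scores.values():
        counts[bucket_for(score)] += 1
    return mean(vm_scores.values()), counts


if __name__ == "__main__":
    # Hypothetical scores for illustration.
    scores = {"vcsa": 96, "nested-esxi-01": 88, "dc01": 92}
    cluster_score, buckets = cluster_summary(scores)
    print(cluster_score, buckets)
```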

DRS is enabled at cluster level. Once enabled, four tabs get activated.

Automation tab
The first choice is how much freedom you give to DRS: Automation Level.

There are 3 levels you can choose from:

manual - DRS generates recommendations for initial placement and migrations, but you have to apply them yourself, so it means manual intervention every time. Very good when you need to do some troubleshooting.
partially automated - initial placement of VMs is done by DRS, but migrations are kept at recommendation level.
fully automated - DRS takes care of both initial placement and migrations.
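The three levels differ only in which actions DRS applies by itself and which ones wait for an administrator. A small Python sketch of that decision table follows. The names are hypothetical and chosen for illustration; they are not part of any vSphere API.

```python
from enum import Enum


class AutomationLevel(Enum):
    MANUAL = "manual"
    PARTIALLY_AUTOMATED = "partially automated"
    FULLY_AUTOMATED = "fully automated"


# For each level: the actions DRS applies on its own.
# Anything not listed is only recommended and waits for an administrator.
AUTOMATIC_ACTIONS = {
    AutomationLevel.MANUAL: set(),
    AutomationLevel.PARTIALLY_AUTOMATED: {"initial placement"},
    AutomationLevel.FULLY_AUTOMATED: {"initial placement", "migration"},
}


def needs_admin(level, action):
    """Return True if an administrator has to apply the recommendation."""
    return action not in AUTOMATIC_ACTIONS[level]


if __name__ == "__main__":
    for level in AutomationLevel:
        print(level.value, needs_admin(level, "initial placement"), needs_admin(level, "migration"))
```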

Once you have decided which automation level to use, you choose the threshold at which migrations should be made. The slider is scaled from conservative to aggressive. DRS looks at the imbalance in the cluster, and the five levels on the slider determine how big that imbalance can be. A conservative setting will not generate migration recommendations for load balancing. An aggressive setting will calculate a very small imbalance threshold. In practice this ranges from almost no migrations (except for specific cases like putting a host into maintenance mode) to a lot of migrations.

Predictive DRS was introduced in vSphere 6.5. It uses metrics from vRealize Operations Manager to balance predicted cluster load and workload spikes.

Virtual Machine Automation enables VM-level overrides of DRS and HA settings. When enabled, you can specify under Cluster - Configure - VM Overrides the VMs for which you want to change the default settings, for example to exclude them from migration recommendations.

Additional Options tab

VM Distribution instructs DRS to try and evenly distribute the VMs on hosts. It is a soft limit that will not be enforced over migration recommendations.

CPU Over-Commitment enforces the defined ratio of vCPUs per core. When enabled, DRS will not allow VMs to power on if the ratio would be exceeded. This lets you keep some clusters in the realm of performance. The maximum value is 32, this being the maximum number of vCPUs per core for vSphere 7.
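As a rough illustration of what such a ratio check means, here is a simplified Python sketch. It treats the cluster as a single pool of cores and uses made-up numbers; how DRS actually accounts for vCPUs and cores may differ, so read it as arithmetic rather than as the DRS implementation.

```python
def can_power_on(running_vcpus, new_vm_vcpus, physical_cores, max_ratio):
    """Return True if powering on the new VM keeps vCPUs per core within max_ratio."""
    return (running_vcpus + new_vm_vcpus) / physical_cores <= max_ratio


if __name__ == "__main__":
    # Hypothetical lab: 2 hosts with 4 cores each, configured ratio of 4 vCPUs per core.
    cores = 2 * 4
    print(can_power_on(running_vcpus=30, new_vm_vcpus=2, physical_cores=cores, max_ratio=4))
    print(can_power_on(running_vcpus=30, new_vm_vcpus=4, physical_cores=cores, max_ratio=4))
```

With 8 cores and a ratio of 4, the cluster tops out at 32 vCPUs, so the 2 vCPU VM still fits and the 4 vCPU VM does not.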

Scalable shares is a new feature introduced in vSphere 7. You can find very good articles here and here. In a nutshell, scalable shares makes sure that the shares allocated to a VM actually take the share priority (high, normal, low) into consideration, and avoids situations where VMs in resource pools with lower priority get more resources than VMs in resource pools with higher priority. This situation is called the resource pool priority-pie paradox.
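A worked example makes the paradox easier to see. The Python sketch below compares per-VM entitlement under full contention with and without scaling. The pool names, VM counts and share values (8000 for high, 4000 for normal) are illustrative assumptions, all VMs are treated as identical, and scaling pool shares by VM count is a simplification of what scalable shares does, not the exact DRS formula.

```python
from fractions import Fraction


def per_vm_entitlement(pools, scalable=False):
    """Return the fraction of contended resources each VM gets, per pool.

    pools maps a pool name to (pool_shares, vm_count).
    With scalable=True the pool shares are multiplied by the VM count,
    a simplified stand-in for scalable shares.
    """
    weights = {
        name: shares * vm_count if scalable else shares
        for name, (shares, vm_count) in pools.items()
    }
    total = sum(weights.values())
    return {
        name: Fraction(weights[name], total) / pools[name][1]
        for name in pools
    }


if __name__ == "__main__":
    # Hypothetical pools: a busy high-priority pool and a small normal-priority pool.
    pools = {"prod-high": (8000, 8), "test-normal": (4000, 2)}
    print(per_vm_entitlement(pools))
    print(per_vm_entitlement(pools, scalable=True))
```

Without scaling, each of the 8 VMs in the high-priority pool ends up with 1/12 of the resources while each of the 2 VMs in the normal pool gets 1/6. With scaling, the high-priority VMs get 1/9 each and the normal ones 1/18, which restores the intended 2:1 priority.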

Power Management tab

When activated, Distributed Power Management (DPM) looks at the cluster utilization and consolidates VMs onto fewer hosts in order to power off the remaining hosts and save energy. For more details, you may look at this article.

Advanced options tab

The tab displays advanced options that have been set for DRS through the UI or manually.

This has been a small introduction to DRS as it looks now in vSphere 7. There are a lot of features and details that have been barely touched or not touched at all. For a deep dive, I recommend the famous Clustering Deep Dive book, although I am waiting for an updated version.

Friday, May 1, 2020

vSphere 7 Local Disk Fresh Install and VMFS-L

I just got my new Intel NUC and it was time to install it. So I popped in the vSphere 7 USB stick and started installing. Ten minutes later I was looking at the freshly installed system and noticed that the datastore on the hard drive was much smaller than expected - 337 GB from a 500 GB raw drive. The vSphere 6.7 NUC with the same drive had a capacity of 458 GB. So what happened to 120 GB of space?

Looking at the partition layout, the new 120 GB VMFS-L partition caught my attention.

Because I am an engineer and I read the manual after the fact, I started reading the vSphere 7 storage requirements in the official documentation. The VMFS-L partition is used as the ESX-OSData partition instead of the scratch partition. It stores logs, core dumps and configuration. However, I cannot lose 120 GB on each of my NUCs, and I am already running vSphere 7 from a USB stick. So two questions came up:

1. How to recover some of the 120GB from VMFS-L

2. How to install vSphere 7

1. How to recover some of the 120GB from VMFS-L

I used the install USB stick as the target and installed vSphere 7 on it. However, this didn't work: vSphere 7 would still find the big VMFS-L partition and use it, which made removing it impossible.

I also turned off scratch

Then I booted from a vSphere 6.7 USB installation. Now I could access the disk and remove the VMFS-L partition.

2. How to install vSphere 7

Since I already had vSphere 7 running off a 16 GB USB stick, I figured that it should be able to run with less capacity (again, it is my home lab, I wouldn't do all these tricks in production). Hence I installed vSphere 6.7 on the local disk and that got me to the following layout:

Then I added the host to vCSA 7.0 and upgraded it to vSphere 7 using Lifecycle Manager, which got me a better looking final partition layout. The upgrade uses the existing core dump, locker, and scratch partitions to create the ESX-OSData volume.

It seems that it's better to read first, even if you are playing in your lab. In my defense, I had installed vSphere 7 before, but those were nested ESXi hosts and they had a small dedicated boot drive.

For more details on how to change scratch partitions, you can also look at the following KB.