Saturday, March 28, 2020

vCenter Server Appliance 7.0 Command Line Installer

One of my favorite features in vSphere is the command line install of vCenter Server. It first appeared with vCenter Server 6.0. It is driven by a JSON input file and can be used to do a fresh install of vCSA 7.0, upgrade an existing vCSA 6.5 or 6.7 installation to 7.0, or migrate a Windows vCenter Server 6.5 or 6.7 to vCSA 7.0.

The installer can be run from Windows, Linux or Mac. To access it, mount the vCSA ISO file and locate the folder vcsa-cli-installer\win32 (for Windows users). JSON templates are found in the templates folder; modify the one that fits your use case. I will do a fresh install of vCSA 7.0 in my lab, so I will be using the template embedded_vCSA_on_VC.json, which deploys the new vCSA inside an existing vCenter Server. The template is very well commented, but I will post here an example of what a simple configuration looks like. Please be aware that this is just a snippet of the actual template and some parts have been left out for ease of reading.


    "new_vcsa": {
        "vc": {
            "hostname": "vcsa67.mylab.com",
            "username": "administrator@mylab.com",
            "password": "",
            "deployment_network": "VM Network",
            "datacenter": [
                "VDC-1"
            ],
            "datastore": "DATASTORE-1",
            "target": [
                "CLUSTER-1"
            ]
        },
        "appliance": {
            "thin_disk_mode": true,
            "deployment_option": "small",
            "name": "vcsa70"
        },
        "network": {
            "ip_family": "ipv4",
            "mode": "static",
            "system_name": "vcsa70.mylab.com",
            "ip": "192.168.100.1",
            "prefix": "24",
            "gateway": "192.168.100.254",
            "dns_servers": [
                "192.168.1.10"
            ]
        },
        "os": {
            "password": "",
            "ntp_servers": "0.ro.pool.ntp.org",
            "ssh_enable": true
        },
        "sso": {
            "password": "",
            "domain_name": "vsphere.local"
        }
    }

As you can see, once you create the template it can be reused many times. What for, you may ask? One answer is nested labs. If you are unsure what size the vCSA should be, the installer will tell you:
.\vcsa-deploy.exe --supported-deployment-sizes
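Back to reusing the template for nested labs: since the input is plain JSON, you can stamp out one file per nested vCSA programmatically. Here is a quick sketch in plain Python; the base template is heavily trimmed and all names and addresses are hypothetical lab values, so adapt it to the full template from the ISO.

```python
import copy
import json

# Minimal base template; the real template carries many more keys.
# All names and addresses below are hypothetical lab values.
base = {
    "new_vcsa": {
        "appliance": {"thin_disk_mode": True, "deployment_option": "small", "name": ""},
        "network": {"ip_family": "ipv4", "mode": "static",
                    "system_name": "", "ip": "", "prefix": "24"},
    }
}

def make_template(index):
    """Return a deep copy of the base with a unique name, FQDN and IP."""
    t = copy.deepcopy(base)
    t["new_vcsa"]["appliance"]["name"] = f"vcsa70-lab{index}"
    t["new_vcsa"]["network"]["system_name"] = f"vcsa70-lab{index}.mylab.com"
    t["new_vcsa"]["network"]["ip"] = f"192.168.100.{index}"
    return t

# Write one JSON file per nested lab, ready to feed to vcsa-deploy.
for i in range(1, 4):
    with open(f"embedded_vCSA_lab{i}.json", "w") as f:
        json.dump(make_template(i), f, indent=4)
```

Each generated file can then be passed to vcsa-deploy in turn, which makes rebuilding a multi-vCSA nested lab a matter of minutes.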

The installer takes different parameters besides the JSON file:
.\vcsa-deploy.exe install --accept-eula [--verify-template-only|--precheck-only] [file_path_to_json]

If you want to automatically accept the SSL certificate thumbprint, you can add the --no-ssl-certificate-verification parameter.

As seen above, the installer comes with two options that let you check that everything is fine before actually starting the install:
  • verify-template-only - runs a JSON file verification to validate the structure and input parameters (e.g. password strength, IP address, netmask). The final check result is displayed along with the path to the log file, which contains all the required details. For example, if you typed an IP address that does not exist, the following message is displayed in the log file:
2020-03-27 19:44:06,232 - vCSACliInstallLogger - ERROR - The value '192.268.100.1' of the key 'ip' in section 'new_vcsa', subsection 'network' is invalid. Correct the value and rerun the script.

  • precheck-only - does a dry run of the installer. This time it connects to vCenter Server and checks that the environment values are actually correct: for example, that you don't have another VM with the same name and that the vCenter objects (datacenter, datastore, cluster or host) exist. It also does a ping test to validate that the IP/FQDN entered for the new vCSA is available.
================ [FAILED] Task: PrecheckTask: Running prechecks. execution
Error message: ApplianceName: A virtual machine with the name 'vcsa70' already
exists on the target ESXi host or cluster. Choose a different name for the
vCenter Server Appliance (case-insensitive).
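Part of what --verify-template-only does (for instance, catching that 192.268.100.1 is not a valid IPv4 address) can be approximated locally with Python's standard library. This is a rough sketch for pre-flight sanity checking of your own template files, not a replacement for the real tool, and the check list is my own, not the installer's:

```python
import ipaddress

def check_network_section(template: dict) -> list:
    """Return a list of error strings for the 'network' subsection
    of a vCSA deployment JSON (basic syntactic checks only)."""
    errors = []
    net = template.get("new_vcsa", {}).get("network", {})
    # 'ip' and 'gateway' must parse as IP addresses.
    for key in ("ip", "gateway"):
        try:
            ipaddress.ip_address(net.get(key, ""))
        except ValueError:
            errors.append(f"The value {net.get(key)!r} of the key '{key}' is invalid.")
    # 'prefix' must be a netmask length between 1 and 32.
    prefix = net.get("prefix", "")
    if not (prefix.isdigit() and 0 < int(prefix) <= 32):
        errors.append(f"The value {prefix!r} of the key 'prefix' is invalid.")
    return errors

# The octet 268 is out of range, so this flags the same kind of mistake
# the installer complains about:
bad = {"new_vcsa": {"network": {"ip": "192.268.100.1",
                                "gateway": "192.168.100.254", "prefix": "24"}}}
print(check_network_section(bad))
```

Running a check like this before vcsa-deploy saves a round-trip through the installer's own validation for the most common typos.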

Of course, you don't have to run both checks, or any check at all if you are confident enough. For me, precheck-only helped, since I didn't get the JSON file right the first time (I will blame it on a language barrier). One very important aspect of the install is to have DNS records set up and working. If you don't, even if the prechecks and the actual install succeed, the first boot of the vCSA will most likely fail.

Having everything set up and checked, you just run the install command and that's it. I like the CLI installer because it is simple, powerful and repeatable. No more filling in fields in a GUI and waiting for the lines on the screen.


Saturday, March 21, 2020

vROps Custom Dashboard for Monitoring vRealize Automation Reservations

It's been a while since I last tried to create a custom dashboard in vRealize Operations Manager (vROps). I think it was called vCenter Operations Manager at that time and the version was 5.8. Fear not, in today's post we are talking about vROps 7.5.

The use case is pretty simple: I need a way of monitoring the capacity of the reservations in terms of memory and storage. The management pack for vRA is tenant and business group focused, which doesn't really apply in my case where I have only one tenant and multiple business groups using the same reservation.

The dashboard is organized as follows:


The top level is an Object List widget that gets automatically updated by any selection in the other widgets. The main information is displayed in Top-N widgets that show the 15 most utilized reservations in terms of storage and memory. On the right side I've added two heatmap widgets for the same metrics - allocated storage % and memory % per reservation. However, the heatmaps present all reservations, and each box's size is relative to the size of the reserved resource: the bigger the drawn box, the bigger the reserved value. Any interaction with the Top-N or heatmap widgets will provide more details in the Object List. The interactions view was added to vROps sometime in 2018 and it's a great and simple way to create interactions between widgets.

How the dashboard works: let's say we have a reservation displayed in the Top-N widget that is 90% memory utilized. When selected, the Object List on top will get populated with the reservation details: which vSphere cluster is mapped to the reservation, how much memory is actually allocated for that vSphere cluster in vRA and how much physical memory the cluster has in total. Kind of a drill-down into the situation. Of course, being in vROps, you can further drill down into the vSphere cluster.


In this case the selected reservation is at 81% memory usage. The top widget displays the real value, which is less than 400 GB. The heatmap on the right can be used to analyse the overall situation; don't forget that the bigger the reservation, the bigger its box in the heatmap. In the Top-N list, meanwhile, we are actually filtering the data and selecting only the reservations that are critical.

Let's take a deeper look into how each widget type is configured:


  • Reservation Usage - Object List widget
In the configuration, Self Provider is set to off since the widget receives data from other widgets. We add additional columns to display in the widget, such as mapped cluster and free memory.

To add columns, press the green plus and filter by adapter type and object type

I've also removed the widget's default columns from the view since I am not interested in collection state and collection status.





  • VRA Reservation Memory Allocated % - Top-N widget
For this widget, select Self Provider On in the configuration section. Also set the Top-N Options to Metric analysis and Top Highest Utilization. Enable auto refresh if you want the metric data to update.

Once self provider is set to on, the Input Data section becomes active; here, add all your reservations. Data will be analysed from all reservations and only the first 15 will be displayed, based on the criteria selected in the Output Data section.


In Output Data, select the object type as reservation and then the metric to be processed as Memory Allocated %.



Lastly, we can add Output Filters. Let's say we don't want to see a top 15 of all the reservations' memory usage, but only the ones that are above a certain threshold, like 75%. We also do not want to see reservations that are above the set threshold but, because they are very big, actually still have sufficient resources (more than 1 TB of RAM, for example). In this case we would add a filter on the output data that limits the displayed info:


  • Memory Allocated % - Heatmap widget
For the heatmap we use the same input data as for the Top-N: all reservations. What changes is the Output Data. We'll group the information by Reservation (but it could be Reservation Policy, tenant, or whatever grouping suits you).


Next we select Reservation as the object type. The metrics used for Size by and Color by are different, since I wanted a representation of both how big the vRA reservation is and how used it is. The bigger the reserved memory, the bigger the box will be drawn; the more used the reservation, the darker the color.

An output filter can be used here as well, for example if you are not interested in very small reservations or want to filter some out based on naming (reservations for a test environment, say). Putting in a little extra time to tweak the widgets to your requirements and environment will prove beneficial, since the visualized data makes sense to different users based on their needs.


Sunday, March 15, 2020

Distributed vRealize Automation 7.x Orchestrated Shutdown, Snapshot and Startup using PowerCLI

I will take a look at performing scheduled operations on vRealize Automation 7.6 (although the article can apply to other versions). In a distributed architecture, vRA 7.6 can become a pretty big beast. Based on the requirements and the actual vSphere implementation (virtual datacenters and vCenter Servers), such a deployment can easily grow to 12-16 VMs. Scheduling operations that require a restart of the environment requires careful preparation because of the dependencies between the different components, such as the vRA server, the IaaS components and the MSSQL database. One of the most common and repetitive tasks is Windows patching, which requires regular reboots of the IaaS components. But there are other activities that need you to shut down the whole environment and take a cold snapshot, for example applying a hot fix.

VMware documentation defines the proper way of restarting components in a distributed vRA environment. What I've done is take those steps and put them in a PowerCLI script, making the procedure reusable and predictable. A particular case is detecting whether a VM is a placeholder VM (an SRM replica). Before going to the script itself, let's look at the whole workflow.



The first part is just a sequential shutdown, waiting for the VMs to power off before going to the next step. Then a cold snapshot is taken of all VMs. Lastly, the VMs are powered on in an orchestrated sequence, with wait times implemented to allow the services to come back up.

Getting to the code part - first we define the list of vRA components, in this case proxies, DEM Workers, DEM Orchestrators, IaaS Web, IaaS Managers and vRA appliances.

# vCenter Servers
$vCSNames = ("vcssrv1", "vcssrv2", "vcssrv3","vcssrv4")

# vRA Components
$workers = @("vradem1", "vradem2","vraprx1","vraprx2", "vraprx3", "vraprx4", "vraprx5", "vraprx6")
$managerPrimary = @("vramgr1")
$managerSecondary = @("vramgr2")
$webPrimary = @("vraweb1")
$webSecondary = @("vraweb2")
$vraPrimary = @("vraapp1")
$vraSecondary = @("vraapp2")

# Snapshots
$snapName = "vra upgrade"
$snapDescription = "before 7.6 upgrade"

# Log file
$log = "coldSnapshotVra.log"

Next we define the three functions to shut down, snapshot and start the VMs. Since our environment uses SRM, I had to check for placeholder VMs when powering off and snapshotting. We'll take them one by one. First, shut down the VMs and wait for them to stop:


function shutdownVMandWait($vms,$log) {
    foreach ($vmName in $vms) {
        try {
            $vm = Get-VM -Name $vmName -ErrorAction Stop
            foreach ($o in $vm) {
                if($o.ExtensionData.Summary.Config.ManagedBy.Type -eq "placeholderVm") {
                    Write-Host "VM: '$($vmName)' is placeholderVm. Skipping."
                } else {
                    if (($o.PowerState) -eq "PoweredOn") {
                        $v = Shutdown-VMGuest -VM $o -Confirm:$false
                        Write-Host "Shutdown VM: '$($v.VM)' was issued"
                        Add-Content -Path $log -Value "$($v)"
                    } else {
                        Write-Host "VM '$($vmName)' is not powered on!"
                    }
                }   
            }
        } catch {
            Write-Host "VM '$($vmName)' not found!"
        }
    }
    foreach ($vmName in $vms) {
        try {
            $vm = Get-VM -Name $vmName -ErrorAction Stop
            while($vm.PowerState -eq 'PoweredOn') {
                Start-Sleep -Seconds 5
                Write-Host "VM '$($vmName)' is still on..."
                $vm = Get-VM -Name $vmName
            }
            Write-Host "VM '$($vmName)' is off!"
        } catch {
            Write-Host "VM '$($vmName)' not found!"
        }
    }
}

Next, take snapshots of the VMs:


function snapshotVM($vms,$snapName,$snapDescription,$log) {
    foreach ($vmName in $vms) {
        try {
            $vm = Get-VM -Name $vmName -ErrorAction Stop
        } catch {
            Write-Host "VM '$($vmName)' not found!"
            Add-Content -Path $log -Value "VM '$($vmName)' not found!"
            # Skip to the next name, otherwise $vm still holds the previous VM
            continue
        }
        try {
            foreach ($o in $vm) {
                if($o.ExtensionData.Summary.Config.ManagedBy.Type -eq "placeholderVm") {
                    Write-Host "VM: '$($vmName)' is placeholderVm. Skipping."
                    Add-Content -Path $log -Value "VM: '$($vmName)' is placeholderVm. Skipping."
                } else {
                    New-Snapshot -VM $o -Name $snapName -Description $snapDescription -ErrorAction Stop
                }   
            }
        } catch {
            Write-Host "Could not snapshot '$($vmName)' !"
            Add-Content -Path $log -Value "Could not snapshot '$($vmName)' !"
        }
    }
}

And finally, power on the VMs:


function startupVM($vms,$log) {
    foreach ($vmName in $vms) {
        try {
            $vm = Get-VM -Name $vmName -ErrorAction Stop
            foreach ($o in $vm) {
                if($o.ExtensionData.Summary.Config.ManagedBy.Type -eq "placeholderVm") {
                    Write-Host "VM: '$($vmName)' is placeholderVm. Skipping."
                } else {
                    if (($o.PowerState) -eq "PoweredOff") {
                        Start-VM -VM $o -Confirm:$false -RunAsync
                    } else {
                        Write-Host "VM '$($vmName)' is not powered off!"
                    }
                }   
            }
        } catch {
            Write-Host "VM '$($vmName)' not found!"
        }
    } 
}

The last part of the script puts all the logic together: connect to the vCenter Servers, shut down the VMs in order, take the cold snapshots and bring the whole environment back up.


# MAIN
# Connect vCenter Server
$creds = Get-Credential
try {
    Connect-VIServer $vCSNames -Credential $creds -ErrorAction Stop
} catch {
    Write-Host $_.Exception.Message
}

# Stop VRA VMs
Write-Host "### Stopping DEM Workers and Proxies"
shutdownVMandWait -vms $workers -log $log
Write-Host "### Stopping Secondary Managers and Orchestrators"
shutdownVMandWait -vms $managerSecondary -log $log
Write-Host "### Stopping Primary Managers and Orchestrators"
shutdownVMandWait -vms $managerPrimary -log $log
Write-Host "### Stopping secondary Web"
shutdownVMandWait -vms $webSecondary -log $log
Write-Host "### Stopping primary Web"
shutdownVMandWait -vms $webPrimary -log $log
Write-Host "### Stopping secondary VRA"
shutdownVMandWait -vms $vraSecondary -log $log
Write-Host "### Stopping primary VRA"
shutdownVMandWait -vms $vraPrimary -log $log

# Snapshot VRA VMs
Write-Host "### Snapshotting DEM Workers and Proxies"
snapshotVM -vms $workers -snapName $snapName -snapDescription $snapDescription -log $log
Write-Host "### Snapshotting Secondary Managers and Orchestrators"
snapshotVM -vms $managerSecondary -snapName $snapName -snapDescription $snapDescription -log $log
Write-Host "### Snapshotting Primary Managers and Orchestrators"
snapshotVM -vms $managerPrimary -snapName $snapName -snapDescription $snapDescription -log $log
Write-Host "### Snapshotting secondary Web"
snapshotVM -vms $webSecondary -snapName $snapName -snapDescription $snapDescription -log $log
Write-Host "### Snapshotting primary Web"
snapshotVM -vms $webPrimary -snapName $snapName -snapDescription $snapDescription -log $log
Write-Host "### Snapshotting secondary VRA"
snapshotVM -vms $vraSecondary -snapName $snapName -snapDescription $snapDescription -log $log
Write-Host "### Snapshotting primary VRA"
snapshotVM -vms $vraPrimary -snapName $snapName -snapDescription $snapDescription -log $log

# Start VRA VMs
Write-Host "### Starting primary VRA"
startupVM -vms $vraPrimary -log $log
Write-Host  " Sleeping 5 minutes until Licensing service is registered"
Start-Sleep -s 300

Write-Host "### Starting secondary VRA"
startupVM -vms $vraSecondary -log $log
Write-Host  " Sleeping 15 minutes until ALL services are registered"
Start-Sleep -s 900

Write-Host "### Starting Web"
startupVM -vms $webPrimary -log $log
startupVM -vms $webSecondary -log $log
Write-Host  " Sleeping 5 minutes until services are up"
Start-Sleep -s 300

Write-Host "### Starting Primary manager"
startupVM -vms $managerPrimary -log $log
Write-Host  " Sleeping 3 minutes until manager is up"
Start-Sleep -s 180

Write-Host "### Starting Secondary manager"
startupVM -vms $managerSecondary -log $log
Write-Host  " Sleeping 3 minutes until manager is up"
Start-Sleep -s 180

Write-Host "### Starting DEM Workers and Proxies"
startupVM -vms $workers -log $log

Write-Host "### All components have been started"

# Disconnect vCenter 
Disconnect-VIServer * -Confirm:$false

You will notice that the orchestration logic is implemented here, in the main part. This means you can easily add, remove or modify the VMs that the script targets. Let's say you only want to snapshot some proxies, for which you don't need to bring everything down. Or you want to add external vRealize Orchestrator appliances. All changes take place in the main part by simply commenting out or adding steps.

This script helped a lot with the nightly operations we had to run across our whole environment, and I hope it will do the same for you.

Saturday, February 29, 2020

VMUG Leader Summit 2020 - An Eastern European Point of View

I had the opportunity to be invited to the VMUG Leader Summit 2020. It takes place at VMware headquarters in Palo Alto and brings together people from all around the world who share common passions: technology and community.

VMware User Group (VMUG) leaders are defined by their willingness to put in the extra effort to bring local communities together to learn new things and to discuss technology, the good and the bad. The local meetings are places of new and open ideas, networking and even socializing. I have been volunteering to make these meetings happen for the past seven years, but until I started to meet leaders from other countries I hadn't understood the full potential. First I met fellow leaders at VMworld, but the time allocated at that event was limited, although even a few minutes with someone can prove inspiring. When the summit was organized and I got the chance to spend two days next to my peers, a new world opened before my eyes.

I come from an Eastern European country and carry, somewhere deep down, a, let's say, small-town complex, as I've always looked up to Western European and US cultures. IT came as an escape for me, and I have been working in multicultural, multinational environments for the past 12 years. I've been in contact with people from Eastern and Western Europe, the Middle East, Asia, Australia and even the US, in both business and social circumstances. However, I have never seen all of these cultures coming together at the same time at a single table. Seeing all those people from literally all around the globe talking, debating and having fun in the same place is what never ceases to amaze me, and it is the huge value the summit brings.

This was my second summit. Last year I thought I was so lucky to get a glimpse at the organizational culture of another company and to see so many inspirational people come and talk to us. This year I had no expectations, since I'd already seen the best. And it was the opposite: I was again surprised and got even more inspired and awed by the people that make up an organization.

As you can easily understand, the summit wins in two categories. It brings together people from all corners of the world to meet each other face to face and to exchange ideas, inspire and trust one another. I believe that no matter how easy it is to have virtual meetings, a handshake or a look in the eye can make a difference. The second win: it brings people closer to the values and culture of VMware by giving us, the community of users, direct access to its C-level executives and awesome technical staff. Now that is a hard thing to do, and such a cool one, too.

Personally, as a VMUG leader and as a human being, I feel like I've grown a bit since this journey started, and so much more in the past two years. I leave here a dear photo and challenge you to guess the number of countries and continents represented in the picture:





Thursday, February 6, 2020

Migrate vRealize Orchestrator 7.3 Cluster with External Database to 7.6

In every environment's life comes a time when you need to upgrade. So it happened for our vRO setup. The source environment is running vRO 7.3.1 in a 2-node cluster with an external MSSQL database. The target environment is a 2-node cluster running vRO 7.6 with an internal PostgreSQL database (since no external database is supported).

Because the source vRO is connected to an external MSSQL server, an in-place upgrade is out of the question, so a migration is actually the only way to do it. A constraint for the migration is not to change hostnames or IP addresses. Well, the requirement is actually to get everything back to 7.6 with a minimum of reconfiguration, and here we are talking about plugins, workflows, actions, configuration elements, certificates and all that jazz.

Having laid out the task ahead, we started working on the plan, and what we came up with was pretty interesting.

The vRO 7.6 appliance offers a migration tool that is accessible through the VAMI. The tool needs access to the source vRO and its database. Since we have a 2-node cluster, we'll take advantage of that and do a rolling migration. The source vRO 7.3 cluster is made of two nodes: node1 and node2. The target vRO 7.6 cluster will run the same nodes.

So what does this rolling migration look like?




In the initial state there is a vRO 7.3.1 cluster with an external MSSQL database. We start by redirecting traffic in the load balancer to node1 and disabling monitoring on the pool. Then we break the cluster by removing node2 and powering it off. Next we deploy a new node2 running 7.6, using the same hostname and IP address. If you want to reuse the same VM name, you just need to rename the old node in vCenter Server.

This takes us to the intermediary state, where node1 is on 7.3.1 and node2 is on 7.6 with an embedded PostgreSQL DB. Once node2 is up and running on 7.6, we connect to the VAMI and fire up the migration tool, which transfers all data from node1 and the external DB to node2. After vRO is migrated to 7.6, we power off node1 running 7.3.1 and deploy node1 in 7.6. Don't forget to switch traffic over to node2 in the load balancer.

Now we are in the final state where both nodes are running 7.6. Once node1 is up, we add it to the cluster that node2 is already a member of. The cluster is back, this time running vRO 7.6 with an embedded PostgreSQL database. Simple, right?

Well, there are a few gotchas:
- when running the migration tool, a pre-check runs that can fail if the vRO DB contains duplicates; this can be fixed either by removing the duplicates from the DB or by connecting with the vRO client to the source vRO and renaming them
- SSL certificates are not migrated - you need to export the certificates from the source vRO keystore and then re-add them in Control Center
- there is a cumulative update that needs to be applied to the freshly deployed vRO 7.6
- the Dynamic Types plugin configuration is not migrated - there is an updated plugin that needs to be installed
- in some cases you may need to update the appliance hardware, including the file system

Lastly, after importing the SSL certificates, enable traffic to both nodes in the load balancer and re-enable the health monitors.

Happy migrations!


Thursday, December 26, 2019

Check ESXi MTU settings with PowerCLI

Sometimes the simplest tasks can be time consuming in large environments. This time it's about MTU settings on ESXi hosts.

First, let's talk a bit about MTU (Maximum Transmission Unit). It is a setting that defines the largest protocol data unit that can be sent across a network (the largest packet or frame). The default is 1500 bytes. A bigger MTU increases performance for particular use cases, such as transmitting large amounts of data over Ethernet, so it has traditionally been set to a larger value (9000 bytes) for accessing iSCSI and NFS storage. For a vSphere environment this means it could (and in some cases should) be increased for almost all types of traffic: vMotion, vSAN, Provisioning, FT, iSCSI, NFS, VXLAN, FCoE.

Let's take the use case of accessing a NFS datastore, as seen in the picture below:

The biggest challenge with MTU is to have the environment properly configured end-to-end. This means that when you want your ESXi host to use a large MTU for accessing an NFS datastore, you'll need to make sure that the distributed virtual switches, physical network interfaces, vmkernel portgroups, physical switches (both at system level and per port) and filers are configured with the proper MTU. What happens in our example when some elements are configured with the default MTU (1500)? If the vmkernel portgroup is set to 1500, you will see no performance benefits at all. If one of the physical switches is configured with 1500 bytes, you will get fragmentation of the packets (performance degradation).
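To get a feel for why the frame size matters, here is a back-of-the-envelope sketch of how many packets a large payload needs at each MTU. It uses deliberately simplified IPv4 arithmetic (a fixed 20-byte header, no options, ignoring the 8-byte fragment offset alignment), so treat the numbers as illustrative:

```python
import math

def packets_needed(payload_bytes: int, mtu: int, ip_header: int = 20) -> int:
    """Number of IPv4 packets needed to carry a payload across a link
    with the given MTU (simplified: fixed header, no options/alignment)."""
    per_packet = mtu - ip_header          # payload bytes per packet
    return math.ceil(payload_bytes / per_packet)

# A 64 KiB transfer over standard vs jumbo frames:
print(packets_needed(65536, 1500))  # 45 packets at MTU 1500
print(packets_needed(65536, 9000))  # 8 packets at MTU 9000
```

Fewer, larger packets mean less per-packet header and interrupt overhead, which is where the jumbo-frame benefit for storage traffic comes from.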

Hoping this short theoretical intro to MTU was helpful, I will jump ahead to the topic: checking ESXi MTU with PowerCLI. How to check physical switches and storage devices is not covered in this article.

At the ESXi level we need to check three settings: the distributed virtual switch, the physical network interfaces (vmnics used for uplinks) and the vmkernel portgroups. To accomplish this we use two different PowerCLI cmdlets: Get-EsxCli and Get-VMHostNetworkAdapter.

The beauty of Get-EsxCli is that it exposes esxcli commands and you can access the host through the vCenter Server connection (no root or direct login to the ESXi host is required). The not-so-nice part is that you have to use esxcli syntax in PowerCLI, as you will soon see.

Main checks

We will first look at the script's checks. Please keep in mind that $h is a variable initialized with the Get-VMHost cmdlet.
  • distributed virtual switch - will get the switch name, configured MTU and used uplinks; the loop ensures all dvswitches are checked
(Get-EsxCli -VMHost $h -V2).network.vswitch.dvs.vmware.list.Invoke() | foreach {
    Write-Host "DVSName  $($_.Name) MTU  $($_.MTU) UPLINKS  $($_.Uplinks)"
}

  • vmnics - check configured MTU, admin and link status for each interface (there is no issue in having unused nics configured differently) 
(Get-EsxCli -VMHost $h -V2).network.nic.list.Invoke() |  foreach {
    Write-Host "NIC:"$_.Name "MTU:"$_.MTU "Admin:"$_.AdminStatus "Link:"$_.LinkStatus
}


  • vmkernel portgroups - check configured MTU and IP address
$vmks = $h | Get-VMHostNetworkAdapter | Where { $_.GetType().Name -eq "HostVMKernelVirtualNicImpl" }
foreach ($v in $vmks) {
    Write-Host "VMK $($v.Name) MTU $($v.MTU) IP $($v.IP)"
}


The script

Putting it all together, we'll add the three checks in a foreach loop. The loop iterates through all the clusters and within each cluster through all the hosts. The script creates one log file per cluster containing all the hosts in that cluster and their details:


foreach ($cls in Get-Cluster) {
    $fileName = $cls.Name + ".log"
    Write-Host "# CLUSTER $($cls)" -ForegroundColor Yellow
    foreach ($h in $cls | Get-VMHost) {
        Write-Host "$($h)" -ForegroundColor Yellow
        Add-Content -Path $fileName -Value "$($h)"

        (Get-EsxCli -VMHost $h -V2).network.vswitch.dvs.vmware.list.Invoke() | foreach {
            Write-Host "DVSName  $($_.Name) MTU  $($_.MTU) UPLINKS  $($_.Uplinks)"
            Add-Content -Path $fileName -Value "DVSName $($_.Name) MTU $($_.MTU) UPLINKS $($_.Uplinks)"
        }
        (Get-EsxCli -VMHost $h -V2).network.nic.list.Invoke() |  foreach {
            Write-Host "NIC:"$_.Name "MTU:"$_.MTU "Admin:"$_.AdminStatus "Link:"$_.LinkStatus
            Add-Content -Path $fileName -Value "NIC: $($_.Name) MTU: $($_.MTU) Admin: $($_.AdminStatus) Link: $($_.LinkStatus)"
        }
        $vmks = $h | Get-VMHostNetworkAdapter | Where { $_.GetType().Name -eq "HostVMKernelVirtualNicImpl" }
        foreach ($v in $vmks) {
            Write-Host "VMK $($v.Name) MTU $($v.MTU) IP $($v.IP)"
            Add-Content -Path $fileName -Value "VMK $($v.Name) MTU $($v.MTU) IP $($v.IP)"
        }
    }
}

Opening one of the log files, you will see output similar to the one below:
esx-01.rio.lab
DVSName dvs-Data1 MTU 9000 UPLINKS vmnic3 vmnic2 vmnic1 vmnic0
NIC: vmnic0 MTU: 9000 Admin: Up Link: Up
NIC: vmnic1 MTU: 9000 Admin: Up Link: Up
NIC: vmnic2 MTU: 9000 Admin: Up Link: Up
NIC: vmnic3 MTU: 9000 Admin: Up Link: Up
VMK vmk0 MTU 9000 IP 192.168.10.11
VMK vmk1 MTU 9000 IP 192.168.20.11
VMK vmk2 MTU 9000 IP 192.168.30.11

In this case everything looks good at the ESXi level. The easy part is over; now start digging in the physical switches' CLI and any other equipment along the path to ensure end-to-end MTU consistency.

Saturday, November 30, 2019

PBKAC and Nested ESXi VSAN

This post is a short one about a PBKAC (Problem Between Keyboard and Chair) and how I discovered it while setting up a nested VSAN cluster in my home lab.

I've recently decided to install a nested VSAN environment. I've created 4 nested ESXi hosts running 6.7 U3, added disks to them (flash disks) and tried to configure VSAN. And it was not smooth at all.

First, I noticed I couldn't add the flash devices as capacity tier. The disks were seen only after I manually marked them as HDD in vCenter Server. Even after doing this, the "NEXT" button was grayed out and I couldn't complete the cluster creation.

Being a stubborn engineer, I did not read the manual; instead I dropped the UI and went all the way to esxcli. I created the VSAN cluster on the command line, with no luck this time either: the cluster ended up in a split-cluster scenario while the UI showed confusing health messages.

I decided there was something wrong with my 6.7 U3 image (?!?) and, Sunday coming to an end, I temporarily postponed the project. I restarted it a few days later, only to find out that I was using vCenter Server 6.7 build 13639324 (U2a). So yeah, vCenter Server 6.7 U2a and VSAN 6.7 U3a don't sound like a good match. I immediately updated vCenter Server to U3a, and VSAN cluster creation took 5 minutes.

It wasn't a waste of time, as I dusted off some PowerShell and learnt a valuable lesson: always check the versions of your components. But it was definitely a PBKAC.