Sysadmin Stories: March 2020

Saturday, March 21, 2020

vROps Custom Dashboard for Monitoring vRealize Automation Reservations

It's been a while since I last tried to create a custom dashboard in vRealize Operations Manager (vROps). I think it was called vCenter Operations Manager at that time and the version was 5.8. Fear not, in today's post we are talking about vROps 7.5.

The use case is pretty simple: I need a way of monitoring the capacity of the reservations in terms of memory and storage. The management pack for vRA is tenant and business group focused, which doesn't really apply in my case where I have only one tenant and multiple business groups using the same reservation.

The dashboard is organized as follows:

The top level is an Object List widget that is updated automatically by any selection made in the other widgets. The main information is displayed in Top-N widgets that show the 15 most utilized reservations in terms of storage and memory. On the right side I've added two Heatmap widgets for the same metrics: allocated % of storage and memory per reservation. The heatmaps, however, present all reservations, and each box is sized relative to the reserved resource: the bigger the box, the bigger the reserved value. Any interaction with the Top-N or Heatmap widgets provides more details in the Object List. The interactions view was added to vROps sometime in 2018, and it's a great and simple way to create interactions between widgets.

How the dashboard works: let's say a reservation with 90% memory utilization is displayed in the Top-N widget. When it is selected, the Object List on top gets populated with the reservation details: which vSphere cluster is mapped to the reservation, how much memory is actually allocated for that vSphere cluster in vRA, and how much physical memory the cluster has in total. It is a kind of drill-down into the situation. Of course, being in vROps, you can drill down further into the vSphere cluster.

In this case the selected reservation is at 81% memory usage. The top widget displays the real value, which is less than 400 GB. The heatmap on the right can be used to analyse the overall situation: remember that the bigger the reservation, the bigger its box in the heatmap. In the Top-N list, by contrast, we filter the data and show only the critical reservations.

Let's take a deeper look into how each widget type is configured:

Reservation Usage - Object List widget

Self Provider is set to Off in the configuration, since the widget receives its data from other widgets. We add additional columns to display in the widget, such as mapped cluster and free memory.

To add columns, press the green plus and filter by adapter type and object type.

I've also removed the widget's default columns from the view, since I am not interested in collection state and collection status.

VRA Reservation Memory Allocated % - Top-N widget

For this widget, set Self Provider to On in the configuration section. Also set Top-N Options to Metric analysis and Top Highest Utilisation. Enable auto refresh if you want the metric data to be updated.

Once Self Provider is set to On, the Input Data section becomes active; add all your reservations here. Data from all reservations is analysed, and only the first 15 are displayed, based on the criteria selected in the Output Data section.

In Output Data, select the object type as reservation and then the metric to be processed as Memory Allocated %.

Lastly, we can add Output Filters. Let's say we don't want a top 15 of all the reservations' memory usage, but only of the ones above a certain threshold, such as 75%. We also don't want to see reservations that are above the threshold but are so big that they still have sufficient free resources, for example reservations with more than 1 TB of RAM. In this case we add a filter on the output data that limits the displayed info to reservations matching both conditions.
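The filter logic described above can be illustrated outside vROps. This is only a sketch of the selection rule, not how vROps implements it: the sample data, the property names (MemoryAllocatedPct, MemoryReservedGB) and the function name Select-CriticalReservation are all hypothetical.

```powershell
# Hypothetical sample data; in vROps these values come from the vRA adapter metrics.
$reservations = @(
    [pscustomobject]@{ Name = 'res-prod-01'; MemoryAllocatedPct = 90; MemoryReservedGB = 512  }
    [pscustomobject]@{ Name = 'res-prod-02'; MemoryAllocatedPct = 81; MemoryReservedGB = 400  }
    [pscustomobject]@{ Name = 'res-big-01';  MemoryAllocatedPct = 85; MemoryReservedGB = 2048 }
    [pscustomobject]@{ Name = 'res-test-01'; MemoryAllocatedPct = 40; MemoryReservedGB = 128  }
)

# Hypothetical helper: keep reservations above the usage threshold and below the size cap,
# sort by usage (highest first) and return at most $Top entries.
function Select-CriticalReservation {
    param(
        [object[]]$Reservation,
        [double]$ThresholdPct = 75,
        [double]$MaxReservedGB = 1024,
        [int]$Top = 15
    )
    $Reservation |
        Where-Object { $_.MemoryAllocatedPct -gt $ThresholdPct -and $_.MemoryReservedGB -lt $MaxReservedGB } |
        Sort-Object -Property MemoryAllocatedPct -Descending |
        Select-Object -First $Top
}

$critical = Select-CriticalReservation -Reservation $reservations
$critical | Format-Table Name, MemoryAllocatedPct, MemoryReservedGB
```

With this sample data only res-prod-01 and res-prod-02 remain: res-big-01 is above the threshold but larger than 1 TB, and res-test-01 is below the threshold.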

Memory Allocated % - Heatmap widget

For the heatmap we use the same input data as for the Top-N: all reservations. What changes is the Output Data. We'll group the information by Reservation (but it can be Reservation Policy, tenant, or whatever grouping suits you).

Next we select Reservation as the object type. The metrics used for Size by and Color by are different, since I wanted a representation of both how big the vRA reservation is and how much of it is used. The bigger the reserved memory size, the bigger the box is drawn. The more used the reservation is, the darker the color.

An output filter can be used here too, for example if you are not interested in very small reservations or want to filter some of them out based on naming (reservations for a test environment). Spending a little extra time tweaking the widgets to your requirements and environment pays off, since the visualized data has to make sense to different users based on their needs.

Sunday, March 15, 2020

Distributed vRealize Automation 7.x Orchestrated Shutdown, Snapshot and Startup using PowerCLI

In this post I take a look at performing scheduled operations on vRealize Automation 7.6 (although the article can apply to other versions). In a distributed architecture, vRA 7.6 can become a pretty big beast. Depending on the requirements and the actual vSphere implementation (virtual datacenters and vCenter Servers), such a deployment can easily grow to 12-16 VMs. Scheduling operations that require a restart of the environment takes careful preparation because of the dependencies between components such as the vRA server, the IaaS components and the MSSQL database. One of the most common and repetitive tasks is Windows patching, which requires regular reboots of the IaaS components. But there are other activities, for example applying a hot fix, that need the whole environment shut down and a cold snapshot taken.

VMware documentation defines the proper way of restarting components in a distributed vRA environment. What I've done is take those steps and put them in a PowerCLI script, making the procedure reusable and predictable. A special case is detecting whether a VM is a placeholder VM (a VM replica). Before going to the script itself, let's look at the whole workflow.

The first part is a sequential shutdown that waits until the VMs are powered off before going to the next step. Then a cold snapshot is taken of all VMs. Lastly, the VMs are powered on in an orchestrated sequence, with wait times to allow the services to come back up.
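The ordering can also be captured as data. A minimal sketch, assuming illustrative tier labels (the variable names $shutdownOrder and $startupOrder and the labels themselves are not part of the original script): the shutdown order is listed once and the startup order is derived by reversing it, which matches the sequence used in the main part of the script.

```powershell
# Tier labels are illustrative; map them to your own VM lists.
$shutdownOrder = @(
    'DEM workers and proxies',
    'Secondary IaaS manager',
    'Primary IaaS manager',
    'Secondary IaaS web',
    'Primary IaaS web',
    'Secondary vRA appliance',
    'Primary vRA appliance'
)

# Startup is the reverse of shutdown: appliances first, workers and proxies last.
$startupOrder = $shutdownOrder.Clone()
[array]::Reverse($startupOrder)

Write-Host "Shutdown order:"
$shutdownOrder | ForEach-Object { Write-Host "  $_" }
Write-Host "Startup order:"
$startupOrder | ForEach-Object { Write-Host "  $_" }
```

Keeping a single list avoids the two sequences drifting apart when a tier is added or removed.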

Getting to the code part: first we define the list of vRA components, in this case proxies, DEM workers, DEM orchestrators, IaaS web servers, IaaS managers and vRA appliances.

# vCenter Servers
$vCSNames = @("vcssrv1", "vcssrv2", "vcssrv3", "vcssrv4")

# vRA Components
$workers = @("vradem1", "vradem2","vraprx1","vraprx2", "vraprx3", "vraprx4", "vraprx5", "vraprx6")
$managerPrimary = @("vramgr1")
$managerSecondary = @("vramgr2")
$webPrimary = @("vraweb1")
$webSecondary = @("vraweb2")
$vraPrimary = @("vraapp1")
$vraSecondary = @("vraapp2")

# Snapshots
$snapName = "vra upgrade"
$snapDescription = "before 7.6 upgrade"

# Log file
$log = "coldSnapshotVra.log"

Next we define the three functions that shut down, snapshot and start the VMs. Since we use SRM in our environment, I had to check for placeholder VMs when powering off and snapshotting the VMs. We'll take them one by one. First, shut down the VMs and wait for them to stop:

function shutdownVMandWait($vms,$log) {
    foreach ($vmName in $vms) {
        try {
            $vm = Get-VM -Name $vmName -ErrorAction Stop
            foreach ($o in $vm) {
                if($o.ExtensionData.Summary.Config.ManagedBy.Type -eq "placeholderVm") {
                    Write-Host "VM: '$($vmName)' is placeholderVm. Skipping."
                } else {
                    if (($o.PowerState) -eq "PoweredOn") {
                        $v = Shutdown-VMGuest -VM $o -Confirm:$false
                        Write-Host "Shutdown VM: '$($v.VM)' was issued"
                        Add-Content -Path $log -Value "$($v)"
                    } else {
                        Write-Host "VM '$($vmName)' is not powered on!"
                    }
                }   
            }
        } catch {
            Write-Host "VM '$($vmName)' not found!"
        }
    }
    foreach ($vmName in $vms) {
        try {
            $vm = Get-VM -Name $vmName -ErrorAction Stop
            while($vm.PowerState -eq 'PoweredOn') { 
                Start-Sleep -Seconds 5
                Write-Host "VM '$($vmName)' is still on..."
                $vm = Get-VM -Name $vmName
            }
            Write-Host "VM '$($vmName)' is off!"
        } catch {
            Write-Host "VM '$($vmName)' not found!"
        }
    }
}
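The wait loop above runs until the VM reports powered off, so a guest that hangs during shutdown would block the script indefinitely. A minimal sketch of a bounded wait, assuming a hypothetical helper named Wait-Condition (not part of PowerCLI or of the original script) that polls any script block until it returns true or a timeout expires:

```powershell
# Hypothetical helper: polls $Condition until it returns $true or the timeout expires.
# Returns $true if the condition was met, $false on timeout.
function Wait-Condition {
    param(
        [Parameter(Mandatory)] [scriptblock]$Condition,
        [int]$TimeoutSeconds = 600,
        [int]$IntervalSeconds = 5
    )
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    while (-not (& $Condition)) {
        if ((Get-Date) -ge $deadline) {
            return $false   # timed out
        }
        Start-Sleep -Seconds $IntervalSeconds
    }
    return $true
}

# Assumed usage with PowerCLI (needs an active vCenter connection):
# $ok = Wait-Condition -TimeoutSeconds 900 -Condition {
#     (Get-VM -Name 'vradem1').PowerState -eq 'PoweredOff'
# }
# if (-not $ok) { Write-Host "VM 'vradem1' did not power off in time!" }
```

What to do on a timeout (log and abort, or force a power off) depends on the environment, so the sketch only reports the result.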

Next, take snapshots of the VMs:

function snapshotVM($vms,$snapName,$snapDescription,$log) {
    foreach ($vmName in $vms) {
        try {
            $vm = Get-VM -Name $vmName -ErrorAction Stop
        } catch {
            Write-Host "VM '$($vmName)' not found!"
            Add-Content -Path $log -Value "VM '$($vmName)' not found!"
            # Skip this VM; otherwise $vm would still hold the VM from the previous iteration
            continue
        }
        try {
            foreach ($o in $vm) {
                if($o.ExtensionData.Summary.Config.ManagedBy.Type -eq "placeholderVm") {
                    Write-Host "VM: '$($vmName)' is placeholderVm. Skipping."
                    Add-Content -Path $log -Value "VM: '$($vmName)' is placeholderVm. Skipping."
                } else {
                    New-Snapshot -VM $o -Name $snapName -Description $snapDescription -ErrorAction Stop
                }   
            }
        } catch {
            Write-Host "Could not snapshot '$($vmName)' !"
            Add-Content -Path $log -Value "Could not snapshot '$($vmName)' !"
    
        }
    }
}
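The snapshot function writes each message twice, once to the console and once to the log file. A minimal sketch of a helper that does both in one call, assuming a hypothetical function named Write-Log (not in the original script); the timestamp prefix is an addition for illustration.

```powershell
# Hypothetical helper: prints a timestamped message and appends it to the log file.
function Write-Log {
    param(
        [Parameter(Mandatory)] [string]$Message,
        [Parameter(Mandatory)] [string]$Path
    )
    $line = "{0} {1}" -f (Get-Date -Format 'yyyy-MM-dd HH:mm:ss'), $Message
    Write-Host $line
    Add-Content -Path $Path -Value $line
}

# Example: log to the same file name the script uses
$log = "coldSnapshotVra.log"
Write-Log -Message "VM 'vraprx1' not found!" -Path $log
```

Each Write-Host / Add-Content pair in the functions could then be replaced by a single Write-Log call.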

And finally, power on the VMs:

function startupVM($vms,$log) {
    foreach ($vmName in $vms) {
        try {
            $vm = Get-VM -Name $vmName -ErrorAction Stop
            foreach ($o in $vm) {
                if($o.ExtensionData.Summary.Config.ManagedBy.Type -eq "placeholderVm") {
                    Write-Host "VM: '$($vmName)' is placeholderVm. Skipping."
                } else {
                    if (($o.PowerState) -eq "PoweredOff") {
                        Start-VM -VM $o -Confirm:$false -RunAsync
                    } else {
                        Write-Host "VM '$($vmName)' is not powered off!"
                    }
                }   
            }
        } catch {
            Write-Host "VM '$($vmName)' not found!"
        }
    } 
}
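The placeholder check is repeated in all three functions. A minimal sketch of factoring it into one place, assuming a hypothetical function named Test-PlaceholderVm (not in the original script). It reads the same ManagedBy property the script uses; the stand-in objects below only mimic that property path so the check can be tried without a vCenter connection.

```powershell
# Hypothetical helper: returns $true when the VM is marked as a placeholder VM.
function Test-PlaceholderVm {
    param([Parameter(Mandatory)] $VM)
    # Same property the original script inspects; ManagedBy is empty for regular VMs
    $managedBy = $VM.ExtensionData.Summary.Config.ManagedBy
    return ($null -ne $managedBy -and $managedBy.Type -eq 'placeholderVm')
}

# Stand-in objects that mimic the property path of a PowerCLI VM object
$placeholder = [pscustomobject]@{
    Name = 'vraweb1'
    ExtensionData = [pscustomobject]@{ Summary = [pscustomobject]@{ Config = [pscustomobject]@{
        ManagedBy = [pscustomobject]@{ Type = 'placeholderVm' } } } }
}
$regular = [pscustomobject]@{
    Name = 'vraweb1'
    ExtensionData = [pscustomobject]@{ Summary = [pscustomobject]@{ Config = [pscustomobject]@{
        ManagedBy = $null } } }
}

Test-PlaceholderVm -VM $placeholder   # True
Test-PlaceholderVm -VM $regular       # False
```

Inside the functions, the inner condition would then read: if (Test-PlaceholderVm -VM $o) { ... }.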

The last part of the script puts all the logic together: connect to vCenter Server, shut down the VMs in order, take the cold snapshots and bring the whole environment back up.

# MAIN
# Connect vCenter Server
$creds = Get-Credential
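try {
    # -ErrorAction Stop turns a failed connection into a terminating error so the catch block runs
    Connect-VIServer -Server $vCSNames -Credential $creds -ErrorAction Stop
} catch {
    Write-Host $_.Exception.Message
    return  # nothing can be done without a vCenter connection
}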

# Stop VRA VMs
Write-Host "### Stopping DEM Workers and Proxies"
shutdownVMandWait -vms $workers -log $log
Write-Host "### Stopping Secondary Managers and Orchestrators"
shutdownVMandWait -vms $managerSecondary -log $log
Write-Host "### Stopping Primary Managers and Orchestrators"
shutdownVMandWait -vms $managerPrimary -log $log
Write-Host "### Stopping secondary Web"
shutdownVMandWait -vms $webSecondary -log $log
Write-Host "### Stopping primary Web"
shutdownVMandWait -vms $webPrimary -log $log
Write-Host "### Stopping secondary VRA"
shutdownVMandWait -vms $vraSecondary -log $log
Write-Host "### Stopping primary VRA"
shutdownVMandWait -vms $vraPrimary -log $log

# Snapshot VRA VMs
Write-Host "### Snapshotting DEM Workers and Proxies"
snapshotVM -vms $workers -snapName $snapName -snapDescription $snapDescription -log $log
Write-Host "### Snapshotting Secondary Managers and Orchestrators"
snapshotVM -vms $managerSecondary -snapName $snapName -snapDescription $snapDescription -log $log
Write-Host "### Snapshotting Primary Managers and Orchestrators"
snapshotVM -vms $managerPrimary -snapName $snapName -snapDescription $snapDescription -log $log
Write-Host "### Snapshotting secondary Web"
snapshotVM -vms $webSecondary -snapName $snapName -snapDescription $snapDescription -log $log
Write-Host "### Snapshotting primary Web"
snapshotVM -vms $webPrimary -snapName $snapName -snapDescription $snapDescription -log $log
Write-Host "### Snapshotting secondary VRA"
snapshotVM -vms $vraSecondary -snapName $snapName -snapDescription $snapDescription -log $log
Write-Host "### Snapshotting primary VRA"
snapshotVM -vms $vraPrimary -snapName $snapName -snapDescription $snapDescription -log $log

# Start VRA VMs
Write-Host "### Starting primary VRA"
startupVM -vms $vraPrimary -log $log
Write-Host  " Sleeping 5 minutes until Licensing service is registered"
Start-Sleep -s 300

Write-Host "### Starting secondary VRA"
startupVM -vms $vraSecondary -log $log
Write-Host  " Sleeping 15 minutes until ALL services are registered"
Start-Sleep -s 900

Write-Host "### Starting Web"
startupVM -vms $webPrimary -log $log
startupVM -vms $webSecondary -log $log
Write-Host  " Sleeping 5 minutes until services are up"
Start-Sleep -s 300

Write-Host "### Starting Primary manager"
startupVM -vms $managerPrimary -log $log
Write-Host  " Sleeping 3 minutes until manager is up"
Start-Sleep -s 180

Write-Host "### Starting Secondary manager"
startupVM -vms $managerSecondary -log $log
Write-Host  " Sleeping 3 minutes until manager is up"
Start-Sleep -s 180

Write-Host "### Starting DEM Workers and Proxies"
startupVM -vms $workers -log $log

Write-Host "### All components have been started"

# Disconnect vCenter 
Disconnect-VIServer * -Confirm:$false

You will notice that the orchestration logic is implemented here, in the main part. This means you can easily add, remove or modify the VMs that the script targets. Let's say you only want to snapshot some proxies, for which you don't need to bring everything down. Or you want to add external vRealize Orchestrator appliances. All such changes take place in the main part, often by simply commenting out some steps.

This script helped a lot with the nightly operations we had to do across our whole environment, and I hope it will do the same for you.