Tuesday, October 27, 2020

PowerCLI - Optimizing Scripts

I'm not sure when or how this drive to optimize appeared in me, but I've seen it take control in different situations and drive other people crazy. So I thought, why not give it a try on the blog as well? Maybe something good comes out of it.

I will go over a couple of simple concepts that help speed up the execution of PowerCLI scripts. The first one is API calls, or more specifically, the number of calls.

Let's use the following example: 

  • we have a list of VMs and we need to get the total space used by those VMs

The use case can be approached in two ways:
  • get the size of each VM in the list and sum the values
# One Get-VM call (one API call) per VM name in the list
$totalUsedSpace = 0
foreach ($vmName in $vmList) {
	$totalUsedSpace += (Get-VM $vmName).UsedSpaceGB
}
  
In this case we use the Get-VM cmdlet for each VM in the list. Every time we call Get-VM we actually make an API call to vCenter Server, so a list of 10 VMs means 10 API calls, and 100 VMs means 100 calls. The immediate effect is that the script takes a long time to execute, and we increase the load on vCenter Server with each call.

But we can also get all VMs from vCenter Server in a single call and then go through them:
  • get all VMs from vCenter Server and then check each one against the list
# One API call to retrieve all VMs, then nested loops to match the names
$allVms = Get-VM

$totalUsedSpace = 0
foreach ($vm in $allVms) {
	foreach ($vmName in $vmList) {
		if ($vm.Name -eq $vmName) {
			$totalUsedSpace += $vm.UsedSpaceGB
		}
	}
}

The advantage of this approach is that we make only one API call. The disadvantage is that we retrieve all objects and then run nested "foreach" loops, which can take a long time, especially when there are a lot of objects in vCenter Server. Let's take a look at some execution times.

The data set consists of more than 6000 VMs in vCenter Server ($allVms), and we are looking for the total size of a list of 100 VMs ($vmList).

For 100 VMs, the first script (one API call for each VM in the list) takes around 275 seconds. The second script takes 14 seconds. Even with the nested loops, it takes far less time than making hundreds of API calls. If we increase $vmList to 300 VMs, the first script takes almost 3 times as long, while the second only increases by a few seconds, to 16 seconds. That increase comes from the extra work of running the nested loops: the more we grow $vmList, the more time it takes. At 600 VMs in $vmList, it takes 21 seconds to run. At this point we'd like to see if there is another way to reduce the complexity of the script and make it faster.
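If you want to reproduce these measurements in your own environment, one simple way is to wrap each variant in Measure-Command. A quick sketch, assuming $vmList already holds valid VM names (the actual numbers will depend on your vCenter Server):

# Time the per-VM approach: one API call per name in $vmList
$perVmTime = Measure-Command {
	$totalUsedSpace = 0
	foreach ($vmName in $vmList) {
		$totalUsedSpace += (Get-VM $vmName).UsedSpaceGB
	}
}

# Time the single-call approach: one Get-VM for everything, then nested loops
$singleCallTime = Measure-Command {
	$allVms = Get-VM
	$totalUsedSpace = 0
	foreach ($vm in $allVms) {
		foreach ($vmName in $vmList) {
			if ($vm.Name -eq $vmName) {
				$totalUsedSpace += $vm.UsedSpaceGB
			}
		}
	}
}

"Per-VM calls: {0:N1} s, single call: {1:N1} s" -f $perVmTime.TotalSeconds, $singleCallTime.TotalSeconds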

Let's see how to get rid of the nested loops. For this to work, we'll build a hash table (key-value pairs) from the VM names and their used space. Once the hash table is created, we look up only the VMs in $vmList:
# One API call, then build a hash table of name -> used space for fast lookups
$allVms = Get-VM
$hashVmSize = @{}
foreach ($vm in $allVms) {
	$hashVmSize.Add($vm.Name, $vm.UsedSpaceGB)
}

$totalUsedSpace = 0
foreach ($vmName in $vmList) {
	$totalUsedSpace += $hashVmSize[$vmName]
}

With the new script, the execution time for the 600 VMs in $vmList is less than 12 seconds (almost half the time of the nested loops script). The complexity reduction comes from the number of operations executed. If we count how many times the loop bodies run, we'll see that the hash table script runs its loops almost 7000 times, while the second script (nested loops) runs its inner loop more than 38 million times (yes, millions).
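If you want to check those iteration counts against your own data set, a quick way is to increment a counter inside the loops. A small sketch, assuming $allVms and $vmList are already populated:

# Count how many times the inner body of the nested loops runs
$nestedIterations = 0
foreach ($vm in $allVms) {
	foreach ($vmName in $vmList) {
		$nestedIterations++
	}
}

# The hash table script runs one loop to build the table and one to look up,
# so its iteration count is simply the sum of the two collection sizes
$hashIterations = $allVms.Count + $vmList.Count

"Nested loops: {0:N0} iterations, hash table: {1:N0} iterations" -f $nestedIterations, $hashIterations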

What we've seen so far:
  • calling the API is expensive - 1 call is better than 10 calls 
  • nested loops are not scalable (not even nice) 
  • hash tables can help
This doesn't mean that if you want to see the size of 2 VMs you should bring 6000 VMs from vCenter Server. It means that next time your script takes 20 minutes, maybe there is a faster way to do it.
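For a handful of VMs, a more sensible approach is to pass the whole list to a single Get-VM call (its -Name parameter accepts multiple names) and sum the used space directly. A short sketch, with hypothetical VM names:

# Retrieve only the VMs we care about in a single call, then sum their used space
$vmList = 'vm01', 'vm02'          # hypothetical names, replace with your own
$totalUsedSpace = (Get-VM -Name $vmList | Measure-Object -Property UsedSpaceGB -Sum).Sum
$totalUsedSpace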