Thursday, March 28, 2013

VMware Distributed Power Management (DPM)

One of the features the Enterprise license brings to VMware vSphere is Distributed Power Management (DPM). This feature saves power by dynamically adjusting cluster capacity to the existing workload. DPM powers hosts in the cluster off and on based on the load average of each host. Before a host is powered off, its VMs are automatically consolidated on the remaining hosts.

For powering on ESXi hosts, DPM uses one of the following technologies: iLO, IPMI or WoL. WoL packets (magic packets) are sent over the vMotion interface by another host in the cluster. DPM puts hosts into so-called "standby" by placing the host in the ACPI S5 power state. In this state power consumption is minimal, no user mode or system code is run and the system's context is not preserved. Some components keep running so that the host can be started again. This state is also called "soft off".
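
For reference, the payload of a WoL magic packet is simple: 6 bytes of 0xFF followed by the MAC address of the target adapter repeated 16 times. Below is a minimal sketch in plain PowerShell that only builds that payload. The function name New-MagicPacket and the sample MAC address are made up for illustration; this is not the code DPM uses, and sending the packet is left out.

```powershell
function New-MagicPacket {
    param([Parameter(Mandatory=$true)][string] $MacAddress)

    # Split "00:11:22:33:44:55" or "00-11-22-33-44-55" into 6 bytes
    $macBytes = [byte[]]($MacAddress -split '[:-]' | ForEach-Object { [Convert]::ToByte($_, 16) })
    if ($macBytes.Length -ne 6) { throw "Expected a 6-byte MAC address, got '$MacAddress'" }

    # 6 x 0xFF, then the MAC address repeated 16 times
    [byte[]] $packet = @(0xFF) * 6
    for ($i = 0; $i -lt 16; $i++) { $packet += $macBytes }

    # The leading comma keeps the array from being unrolled on return
    return ,$packet
}

# Hypothetical MAC address, for illustration only
$packet = New-MagicPacket -MacAddress "00:11:22:33:44:55"
"{0} bytes" -f $packet.Length     # 6 + 16 * 6 = 102 bytes
```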

DPM evaluates CPU and memory utilization for each host in the cluster and tries to keep each host within a specific range. By default the utilization range is 45% to 81%. The range is computed from two settings:
  • DemandCapacityRatioTarget - utilization target of the host, 63% by default
  • DemandCapacityRatioToleranceHost - allowed variation around the utilization target, 18% by default
Utilization range = DemandCapacityRatioTarget +/- DemandCapacityRatioToleranceHost, which with the defaults gives 63% - 18% = 45% and 63% + 18% = 81%.
The defaults can be changed in DRS - Advanced settings: DemandCapacityRatioTarget can be set between 40% and 90% and DemandCapacityRatioToleranceHost between 10% and 40%.
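
To check what range a pair of values gives before changing the advanced settings, the formula above can be sketched in plain PowerShell. The function name Get-DpmUtilizationRange is hypothetical; the defaults and the allowed intervals are the ones listed above.

```powershell
function Get-DpmUtilizationRange {
    param(
        [ValidateRange(40, 90)][int] $Target = 63,      # DemandCapacityRatioTarget, in percent
        [ValidateRange(10, 40)][int] $Tolerance = 18    # DemandCapacityRatioToleranceHost, in percent
    )
    # Utilization range = target +/- tolerance
    [pscustomobject]@{
        Low  = $Target - $Tolerance
        High = $Target + $Tolerance
    }
}

$range = Get-DpmUtilizationRange
"{0}% to {1}%" -f $range.Low, $range.High     # defaults: 45% to 81%
```

ValidateRange rejects values outside the intervals accepted by the advanced settings, so a typo is caught before it reaches the cluster.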

DPM uses two time intervals for evaluating power on and power off recommendations. For power on the period is 300 seconds (5 minutes), while for power off the period is 2400 seconds (40 minutes). DPM therefore reacts faster to increased load than to decreased load. However, this also means that a sudden increase in load will be considered by DPM only after 5 minutes and will be resolved only after the host boots up, which may add another 5 to 10 minutes. The values can be changed by setting the parameters VmDemandHistorySecsHostOn (default 300 seconds) and VmDemandHistorySecsHostOff (default 2400 seconds) to a value between 0 and 3600 seconds.

DPM ensures that at least one host is left running. The settings MinPoweredOnCpuCapacity (default 1 MHz) and MinPoweredOnMemCapacity (default 1 MB) control how many hosts are left running. The default values ensure that a minimum of one host is up, and they can be changed. For example, if the cluster has hosts configured with 24 GHz and 128 GB each, setting the parameters to 24001 MHz and 131073 MB will keep 2 hosts running at all times. Even with default values, when HA is enabled on the cluster, DPM will leave 2 hosts powered on to provide failover resources in case one host fails.
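
The arithmetic behind that example can be sketched as follows. This is a simplified model, not the DPM algorithm: it assumes all hosts are identical and simply counts how many hosts are needed to cover both minimums. It ignores the extra host kept for HA. The function name Get-MinPoweredOnHosts is hypothetical.

```powershell
function Get-MinPoweredOnHosts {
    param(
        [Parameter(Mandatory=$true)][int] $HostCpuMhz,   # CPU capacity of one host
        [Parameter(Mandatory=$true)][int] $HostMemMb,    # memory capacity of one host
        [int] $MinPoweredOnCpuCapacity = 1,              # MHz, DPM default
        [int] $MinPoweredOnMemCapacity = 1               # MB, DPM default
    )
    # Hosts needed to cover each minimum, rounded up
    $forCpu = [math]::Ceiling([double]$MinPoweredOnCpuCapacity / $HostCpuMhz)
    $forMem = [math]::Ceiling([double]$MinPoweredOnMemCapacity / $HostMemMb)

    # Both minimums must be met, and at least one host always stays on
    [int][math]::Max(1.0, [math]::Max($forCpu, $forMem))
}

# Hosts with 24 GHz and 128 GB (131072 MB)
$default = Get-MinPoweredOnHosts -HostCpuMhz 24000 -HostMemMb 131072
$twoUp   = Get-MinPoweredOnHosts -HostCpuMhz 24000 -HostMemMb 131072 `
               -MinPoweredOnCpuCapacity 24001 -MinPoweredOnMemCapacity 131073
"default: $default host(s), raised minimums: $twoUp host(s)"    # 1 and 2
```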

For more details about how DPM selects hosts for power off and about its cost/benefit analysis, I strongly recommend the book VMware vSphere Clustering Deepdive and the white paper VMware Distributed Power Management: Concepts and Usage.

Configuring DPM

Before enabling DPM on a cluster it is best to test that hosts can be powered off and started again. This can be done from the vSphere client by putting a host in standby. Check first that the network adapter supports WoL:

Put the host in standby and confirm that all powered off and suspended VMs are moved to other hosts in the cluster:

After the host is powered off, bring it back from the vSphere client:

After all hosts in the cluster have been tested, one last check can be done by going to Cluster - vSphere DRS - Host Options and verifying that each host succeeded in exiting standby:

Next, enable DPM and choose one of the 3 power management policies:

If DPM behavior needs to be controlled, a very good idea is to have it enabled outside business hours and disabled during business hours. Set power management to Automatic and add 2 scheduled tasks of type Change cluster power settings in vSphere - one to enable DPM in the evening (19:00) and another one to disable DPM in the morning (7:00). This way all hosts are running at full capacity during business hours, and during the night the data center takes full advantage of power savings.
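
The resulting schedule wraps around midnight, which is easy to get wrong when checking by hand whether DPM should be active at a given moment. Here is a small sketch in plain PowerShell. The function name Test-DpmEnabledAt is hypothetical, and the hours are the ones from the two scheduled tasks above.

```powershell
function Test-DpmEnabledAt {
    param(
        [Parameter(Mandatory=$true)][datetime] $Time,
        [int] $EnableHour = 19,    # evening task enables DPM
        [int] $DisableHour = 7     # morning task disables DPM
    )
    # The enabled window runs from 19:00 past midnight until 7:00
    ($Time.Hour -ge $EnableHour) -or ($Time.Hour -lt $DisableHour)
}

Test-DpmEnabledAt -Time ([datetime]"2013-03-28 22:30")    # True  - outside business hours
Test-DpmEnabledAt -Time ([datetime]"2013-03-28 10:00")    # False - business hours
```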

Wednesday, March 20, 2013

PowerCLI - Batch move templates

While preparing to configure DPM on one of the clusters, I noticed that templates were scattered across all the hosts. A design best practice for DPM is to put all your templates on one host and disable DPM on that host - this way you will not get the surprise of being unable to access some of your templates because the host was powered off. Remembering that, I decided to bring all templates into one place. First I took a look at the PowerCLI cmdlets get-template and move-template and hoped for an easy job. However, the cmdlets are a bit tricky:
  • get-template - documentation states that it takes as input for Location "vSphere container objects (such as folders, data centers, and clusters) you want to search for templates". This is only partially true, since it does not accept clusters, but it does accept hosts;
  • move-template - the Destination parameter, used to "specify a container object where you want to move the templates", accepts only Datacenters and Folders.
My use case is to input a cluster name and consolidate all templates on a single host. The next best thing was to use the cmdlets above and write my own little script. The script takes 2 parameters as input: cluster and destination host. It gets all templates from the cluster, converts them to VMs, migrates the VMs to the destination host and converts them back to templates.

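Function MoveTemplate([string] $cluster, [string] $dsthost) {
    # get-template does not accept clusters, so walk the hosts of the cluster
    get-cluster $cluster | get-vmhost | foreach {
        get-template -Location $_ | foreach {
            Write-Host "moving $_"
            # convert to VM, migrate to the destination host, convert back to template
            get-vm (set-template -Template $_ -ToVM).Name | move-vm -Destination (get-vmhost $dsthost) | set-vm -ToTemplate -Confirm:$false
        }
    }
}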


Wednesday, March 13, 2013

vShield Edge configuration and operational issues

In this post I will cover some aspects of using vShield Edge (vSE) devices at vApp level, mostly things I have learnt the hard way. I am currently using vCloud Director (vCD) 1.5 and the aspects described here relate to this version. However, I will point out changes that appear in vCD 5.1.
In the vApp diagram, vSE is shown between the vApp network and the organization network:

The symbol displays a firewall, if firewall services are active. vSE offers the following services:

  • DHCP
  • firewall
  • routing 
  • NAT

All configuration is done using the vCD portal. DHCP services can be created for the vApp network and the configuration is basic:

It is not possible to create MAC reservations, but basic service functions are available.

By default the firewall has the Deny policy enabled and allows all outbound traffic:

vSE can be configured to send syslog messages to a central server, and this is highly recommended if you want to debug anything. But the syslog server can only be configured globally at vCD level, not separately for each organization. If you are an organization admin, it will be difficult, if not impossible, to get access to the syslog server. Another issue is that the graphical interface does not allow copying or cloning rules, which means a lot of manual work. To make things worse, you cannot add two different hosts or subnets in the source or destination field - there is no possibility to group several objects. More manual work. I do not know why they have provided such a bad interface to vSE.

The next service is NAT - by default vSE is configured in IP translation mode. VMs are assigned IPs from the Organization network and vSE does the rest: source and destination NAT.

Org network IPs can be assigned statically or automatically, but they cannot be unassigned from a VM. If there are 10 VMs connected to the vApp network, then all 10 will be assigned IPs from the Organization network even if only 1 VM needs such an IP (simple use case - one-arm load balancing topology). Adding the 1 IP used by vSE, a total of 11 IPs will be used. This is a problem since you might run out of IPs at Organization level (I did).

The only solution is to change the NAT mode of vSE to Port forwarding, in which case you are limited to the IP of vSE. All requests are sent to the vSE IP and vSE does basic port forwarding:

Do not think of publishing two services from the vApp on the same port. Since there is a single external IP, you will not be able to do it.
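
When planning a longer list of port forwarding rules, it helps to spot such clashes before typing the rules into the portal. Below is a minimal sketch in plain PowerShell. The function name Get-PortForwardConflict, the property names and the sample addresses are all made up for illustration and are not a vCD or vSE API.

```powershell
function Get-PortForwardConflict {
    param([Parameter(Mandatory=$true)][object[]] $Rules)

    # With a single external IP, protocol + external port must be unique
    $Rules | Group-Object -Property Protocol, ExternalPort |
        Where-Object { $_.Count -gt 1 } |
        ForEach-Object { $_.Name }
}

# Hypothetical planned rules: two web servers both want external port 80
$rules = @(
    [pscustomobject]@{ Protocol = "tcp"; ExternalPort = 80;  InternalIp = "192.168.2.10"; InternalPort = 80 },
    [pscustomobject]@{ Protocol = "tcp"; ExternalPort = 80;  InternalIp = "192.168.2.11"; InternalPort = 80 },
    [pscustomobject]@{ Protocol = "tcp"; ExternalPort = 443; InternalIp = "192.168.2.10"; InternalPort = 443 }
)

# Returns one entry per protocol/port pair that is used more than once
$conflicts = @(Get-PortForwardConflict -Rules $rules)
"{0} conflicting external port(s)" -f $conflicts.Count
```

A clash like the one above has to be resolved by publishing one of the services on a different external port.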

A difference in 5.1 is that you can specify source and destination NAT separately.

The last feature that vSE provides is static routing. There is not much to say about it - it allows adding static routes:

One more limitation of vSE in 1.5 is that it has only 2 interfaces: one external and one internal. In 5.1 there are multiple network interfaces provided to external networks.

vCloud Director 5.1 brings other changes from 1.5:
  • IP masquerading setting was removed
  • there is no default rule on the firewall

More details about these two changes can be found here

Unfortunately, vSE has too many technical limitations to be a useful firewall, and in many cases the architectural decision has to be made not to use it. Replacing vSE with other technologies that provide better firewall, NAT and routing functions also brings its own technological challenges.

Wednesday, March 6, 2013

PowerCLI - Get VMware tools status

This little piece of code displays information about VMware tools status for one or more VMs. Output can be configured to display running status (running or not), status (outdated or updated) and version of tools. The script takes 2 parameters as input: an optional target, either -clname or -vmname, and the mandatory parameter -tools. The syntax is:
./ToolsStatus [-clname clname | -vmname vmname] -tools [stop | old | install | status | version | all]

Description of parameters:
  • clname - enter the name of cluster
  • vmname - enter the name of a VM
  • stop - displays VMs with tools not running
  • old - displays VMs with outdated tools
  • install - displays VMs with tools not installed
  • status - displays running status and update status for a VM
  • version - displays tools version for one or more VMs
  • all - displays running status, update status and tools version for one or more VMs
There are 3 ways the script can be used:
  • if -clname parameter is chosen -tools can take the value stop, old, install, version or all
  • if -vmname parameter is chosen -tools can take the value status (prints both runstatus and update), version or all
  • if no value is entered for clname and vmname but -tools is all, the script displays tools information
    (runstatus, update and version) for all VMs in the datacenter.  
A few examples of usage:
Get all VMs from cluster01 where tools are not running:

./ToolsStatus -clname cluster01 -tools stop

Get tools version for all VMs in cluster02
./ToolsStatus -clname cluster02 -tools version

Get all tools information for VM dc01
./ToolsStatus -vmname dc01 -tools all

The script uses Get-View, which returns vSphere .Net view objects. The main advantage of Get-View is that it is a lot faster than the normal get-cluster and get-vm cmdlets. Just for fun, you can use Measure-Command to time the execution of get-view and get-vm.
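
A minimal sketch of such a timing comparison is below. The helper name Compare-CommandDuration is hypothetical. The demo at the end uses Start-Sleep so it can run anywhere, and the PowerCLI call is shown only as a comment because it needs a connected vCenter session.

```powershell
function Compare-CommandDuration {
    param(
        [Parameter(Mandatory=$true)][scriptblock] $First,
        [Parameter(Mandatory=$true)][scriptblock] $Second
    )
    # Measure-Command runs the script block and returns a TimeSpan
    $t1 = (Measure-Command -Expression $First).TotalMilliseconds
    $t2 = (Measure-Command -Expression $Second).TotalMilliseconds
    [pscustomobject]@{ FirstMs = $t1; SecondMs = $t2 }
}

# Against a connected vCenter (requires PowerCLI and Connect-VIServer):
# Compare-CommandDuration -First { get-vm } -Second { get-view -ViewType "VirtualMachine" }

# Stand-in demo that runs without vCenter
$result = Compare-CommandDuration -First { Start-Sleep -Milliseconds 200 } -Second { 1 + 1 }
$result
```

Timings vary between runs, so it is worth repeating the measurement a few times before drawing conclusions.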

The script:

    param(
        [Parameter(Mandatory=$false)][string] $clname,
        [Parameter(Mandatory=$false)][string] $vmname,
        [Parameter(Mandatory=$true)][string] $tools
    )
    if ($clname) {
        # cluster given: limit the search to the VMs of that cluster
        if ($tools -eq "all") {
            get-view -ViewType "VirtualMachine" -SearchRoot (get-cluster $clname).Id | Select Name,@{N="ToolsRunningStatus";E={$_.Guest.ToolsRunningStatus}},@{N="ToolsStatus";E={$_.Guest.ToolsVersionStatus2}},@{N="ToolsVersion";E={$_.Guest.ToolsVersion}}
        } elseif ($tools -eq "stop") {
            get-view -ViewType "VirtualMachine" -SearchRoot (get-cluster $clname).Id -Filter @{"Guest.ToolsStatus" = "toolsNotRunning"} | Select Name,@{N="ToolsStatus";E={$_.Guest.ToolsStatus}}
        } elseif ($tools -eq "old") {
            get-view -ViewType "VirtualMachine" -SearchRoot (get-cluster $clname).Id -Filter @{"Guest.ToolsVersionStatus2" = "guestToolsSupportedOld"} | Select Name,@{N="ToolsStatus";E={$_.Guest.ToolsVersionStatus2}}
        } elseif ($tools -eq "install") {
            get-view -ViewType "VirtualMachine" -SearchRoot (get-cluster $clname).Id -Filter @{"Guest.ToolsVersionStatus2" = "guestToolsNotInstalled"} | Select Name,@{N="ToolsStatus";E={$_.Guest.ToolsVersionStatus2}}
        } elseif ($tools -eq "version") {
            get-view -ViewType "VirtualMachine" -SearchRoot (get-cluster $clname).Id | Select Name,@{N="ToolsVersion";E={$_.Guest.ToolsVersion}}
        } else {
            Write-Warning "Option not implemented $tools"
        }
    } elseif ($vmname) {
        # VM given: -Filter values are regular expressions, so names that contain $vmname also match
        if ($tools -eq "all") {
            get-view -ViewType "VirtualMachine" -filter @{'name'=$vmname}| Select Name,@{N="ToolsRunningStatus";E={$_.Guest.ToolsRunningStatus}},@{N="ToolsStatus";E={$_.Guest.ToolsStatus}},@{N="ToolsVersion";E={$_.Guest.ToolsVersion}}
        } elseif ($tools -eq "status") {
            get-view -ViewType "VirtualMachine" -filter @{'name'=$vmname}| Select Name,@{N="ToolsRunningStatus";E={$_.Guest.ToolsRunningStatus}},@{N="ToolsStatus";E={$_.Guest.ToolsStatus}}
        } elseif ($tools -eq "version") {
            get-view -ViewType "VirtualMachine" -filter @{'name'=$vmname}| Select Name,@{N="ToolsVersion";E={$_.Guest.ToolsVersion}}
        } else {
            Write-Warning "Option not implemented $tools"
        }
    } elseif ($tools) {
        # neither cluster nor VM given: only "all" is supported, for every VM in the datacenter
        if ($tools -eq "all") {
            get-view -ViewType "VirtualMachine" | Select Name,@{N="ToolsRunningStatus";E={$_.Guest.ToolsRunningStatus}},@{N="ToolsStatus";E={$_.Guest.ToolsVersionStatus2}},@{N="ToolsVersion";E={$_.Guest.ToolsVersion}}
        } else {
            Write-Warning "Option not implemented $tools"
        }
    }