Thursday, December 1, 2022

What I've Learned From Using Instant Clones in vSphere

Instant clone is a technology that creates a powered-on VM from another running VM. An instant clone VM shares memory and disk state with its source VM. Once it is powered on, the instant clone is a fully manageable, independent vCenter Server object. The clones can be customized and have unique MAC addresses and UUIDs. This makes the technology very appealing for use cases where a large number of VMs need to be created in a short time from a controlled point in time - think about VDIs.

My use case was on-demand labs generated from the same lab template(s). A lab template is made of 3 to 6 VMs of different sizes running interdependent applications. Users log in to a web app and then request one or more new labs from the available templates. The web app would then start lab provisioning in the background for all the requests via vCenter Server.

Using full clones would have meant a higher load on the systems and a longer wait for a lab to be ready - the boot time of all the VMs in the cloned lab plus the time for services to start in the guest OS of each VM. Additionally, there was no information on how many labs would be requested at a time. There were also multiple source lab templates, with a worst-case scenario of tens to hundreds of VMs being requested within a minute. I chose instant clones as the way forward.

When using instant clone there are 2 provisioning workflows: running source VM and frozen source VM, as seen in the picture below, taken from the "Understanding Clones in vSphere 7" performance study published by VMware.

In the running source VM workflow, a temporary stun is initiated to checkpoint the VM and create the delta disks. Then the source returns to its running state. Each new instant clone depends on the shared delta disk, potentially hitting the vSphere limit of 255. These delta disks are redo logs and are not tied to the snapshot chain, hence they are not visible in the UI. The limit for a supported snapshot chain in vSphere is still 32. If the limit is hit, cloning will fail as described in KB article 67186. To avoid this limitation, you could use the frozen source VM provisioning workflow, in which the source is frozen and no longer running, and the delta disks are only created for the child VMs.

Since the lab templates were actually running different services that did not cope very well with being frozen for longer periods of time, I used the running source VM workflow. To create the clones I borrowed and adapted the code from William Lam's instant clone PowerCLI module (thank you!). He also has some very good articles on the technology.

What I did not realize at the time is that the performance of the labs would suffer once the number of delta disks increased. The cloned labs were temporary by nature and removed after a specific run time. However, the delta disks on the source VMs were not cleaned up and just kept accumulating, which in the end impacted user experience. So I needed to introduce a cleanup mechanism.

The simplest way to clean up a source VM was to use an idea that I got from Veeam Snapshot Hunter: create a snapshot of the lab template VMs (source VMs) and then immediately initiate a delete all command. This cleans up all the delta disks from the source VMs. The PowerCLI script would run nightly as a scheduled job.

# Name pattern that matches the lab template (source) VMs
$labPrefix = "lab-1-*"
$vms = Get-VM -Name $labPrefix
foreach ($vm in $vms) {
    $snapTime = Get-Date -Format "MM/dd/yyyy HH:mm"
    $description = $vm.Name + " " + $snapTime
    # Take a snapshot, then remove it right away so the delta disks get consolidated
    New-Snapshot -VM $vm -Name "delta disk cleanup" -Description $description -Memory:$true -Confirm:$false
    Get-Snapshot -VM $vm -Name "delta disk cleanup" | Remove-Snapshot -RemoveChildren -Confirm:$false
}

The plan is to test the Vim.VirtualMachine.PromoteDisks(unlink=True) method in the future.

A few takeaway points:
- instant clone is a very fast cloning technology and it also optimizes resource usage (memory, disk)
- if the number of cloned VMs from the same source is very large (> 200), use the frozen source VM workflow
- when using the running source VM workflow, make sure to include a cleanup mechanism for the delta disks
- time synchronization in the source VMs is very important (as always)
- if you need full performance, use full clones

Tuesday, October 19, 2021

Certifications during pandemics - Pearson VUE online proctoring

Recently I had the opportunity to take (and pass) 3 certifications and I did it using Pearson's OnVUE online proctoring. Talking to other colleagues of mine I found out there are mixed feelings about the experience. For me it was an overall good experience. So, I've decided to put together a few thoughts about how it went. 

The good 

You can schedule the exam anytime you want, even from one day to the next. You are at home in your office, so it's a familiar space. There is no commute to the test center and back. For me these are the biggest advantages.

The not so good

You have to clean up your desk and disconnect everything. If you have a docking station, multiple monitors and other equipment, it will be a bit of work. If you have other things around your desk (like my old film cameras that I keep as decorations), you will need to move those too. Be prepared to use your webcam to show that cables are unplugged.

Another thing to take care of: no one is allowed to enter the room, be it kid, partner or pet. This may prove inconvenient.

The app delivering the exam is not optimized for wide monitors. That makes the questions very long and places the buttons in strange positions. But you get used to it or, better, use a laptop screen.

The weird

The proctor experience can vary a bit. Using an external monitor was fine for 2 exams, but not fine during another one. The weirdest thing: I was told during one exam not to look up because that is not allowed and doing it again would fail my exam (!?!). Small issue here: when I try to remember things I involuntarily look up. Luckily I managed to pass the exam without remembering too many things.

Connectivity issues

One morning it took longer to connect and get a proctor online with me: more than half an hour to start the process. But after that all went well. No biggie here, just start on time.

Once you get the exam started, the experience is the same as in any test center. I am not sure that I would like to go back to a test center unless absolutely necessary (like looking up during the exam).



Wednesday, October 6, 2021

What's new in vRealize Automation 8.5.x and 8.6

The latest releases of vRealize Automation bring in a series of interesting features. 

Cloud Resource 

Cloud Resource view was introduced back in May 2021 for vRA Cloud and lets you manage resources directly instead of managing them by resource groups (deployments). It now lets you manage all discovered, onboarded and provisioned deployments, trigger power day 2 actions on discovered resources and bulk manage multiple resources at the same time.

ABX enabled deployment for custom resources

Provisioning a custom resource allows you to track and manage the custom resource and its properties during its whole lifecycle. No dynamic types are needed for full lifecycle management. 

Cloud Templates Dynamic Input 

Use vRO actions for dynamic external values to define different types of input values directly in the Cloud Template, and bind local inputs to the dynamic inputs as action parameters.

Kubernetes support in Code Stream Workspace

The Code Stream pipeline workspace now supports Docker and Kubernetes for continuous integration tasks. The Kubernetes platform manages the entire lifecycle of the container, similar to Docker. In the pipeline workspace, you can choose Docker (the default selection) or Kubernetes. In the workspace, you select the appropriate endpoint. 

The Kubernetes workspace provides:

  • the builder image to use
  • image registry
  • namespace
  • node port
  • persistent Volume Claim
  • working directory
  • environment variables
  • CPU limit
  • memory limit.

You can also choose to create a clone of the Git repository.


vRA leverages Azure provisioning capabilities, including the ability to enable/disable boot diagnostics for Azure VMs for Day 0/2, and the ability to configure the name for the Azure NIC interfaces.

Other updates and new features 
  • Native SaltStack Configuration Automation Config via modules for vSphere, VMC, and NSX
  • Leverage third-party integrations with Puppet Enterprise support for machines without a Public IP address
  • Deploy a VCD adapter for vRA
  • Onboard vSphere networks to support an additional resource type in the onboarding workflow

Thursday, August 26, 2021

VMworld 2021 - Sessions to watch

This year is going to be the second one in a row when I don't get to do my favorite autumn activity: go to VMworld in Barcelona. But I do get to do part of it - attend the virtual VMworld 2021. And to make it as close as possible to the real experience, I will most probably add some red Spanish wine and jamon on the side. 

As for the sessions I am looking forward to attending, I will leave here a few of my choices:

VMware vSAN – Dynamic Volumes for Traditional and Modern Applications [MCL1084]

I've been involved recently in projects with Tanzu and vSAN, and this session with Duncan Epping and Cormac Hogan is the place to go to see how vSAN continues to evolve, to learn about new features and integration with Tanzu, and to hear some of the best practices.

The Future of VM Provisioning – Enabling VM Lifecycle Through Kubernetes [APP1564]

A session about what I think is one of the game changers introduced by VMware this year: including VM-based workloads in modern applications by using Kubernetes APIs to deploy, configure and manage them. I've been working with VM Service since its official release in May and also wrote a small blog post earlier this month.

What's New in vSphere [APP1205]

This is one of the sessions I have never missed. vSphere is still one of the fundamental technologies for all other transformations. I am interested in finding out what the latest capabilities, customer challenges and real-world customer successes are.

Automation Showdown: Imperative vs Declarative [CODE2786]

There is no way to miss Luc Dekens and Kyle Ruddy taking on the hot topic of imperative versus declarative infrastructure: understanding when and how you can and should use each of them, with practical examples.

Achieving Happiness: The Quest for Something New [IC1484]

I had the honor to meet Amanda Blevins at the VMUG Leaders Summit right before the world decided to close. Her presentation wowed the crowd and it was one of the highest rated. So this is something that shouldn't be missed, especially since the pandemic has been around for 18 months and we need to achieve some happiness.

There are hundreds of sessions, and the areas they touch are so diverse that you can find your pick regardless of your interests in AI, application modernization, Kubernetes, security, network, personal development or plain old virtualization. See you at VMworld 2021.

Friday, August 20, 2021

vSphere with Tanzu - Create custom VM template for VM Operator

We've seen in the previous post how to enable and use VM Operator. We've also noticed that currently only 2 VM images are supported for deployment using VM Operator. What if we need to create our own image?

There is a way, but the way is not supported by VMware. So once going this path, you have to understand the risks. 

What is so special about the VM image deployed using VM Operator? It is using cloud-init and OVF environment variables to initialize the VM. 

Let's start with a new Linux VM template. We will install VMware Tools. Then we need to install cloud-init. Once cloud-init is installed, update the configuration as follows:

  • in /etc/cloud/cloud.cfg check the following value: disable_vmware_customization: true
    • setting it to true invokes the traditional script-based Guest Operating System Customization (GOSC) workflow; if it is set to false, cloud-init customization will be used.
  • create a new file /etc/cloud/cloud.cfg.d/99_vmservice.cfg and add the following line to it: network: {config: disabled}
    • this prevents cloud-init from configuring the network; you guessed it, VMware Tools will be used to configure the network

Before exporting the VM as an OVF template, run cloud-init to simulate a clean instance installation. It should be run on subsequent template updates too.
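
The two configuration changes above can be scripted so they are applied the same way on every template update. The following is only a minimal sketch: the function name prepare_cloud_init and its optional ROOT argument are hypothetical (ROOT exists so the function can be tried against a scratch directory first), and it assumes disable_vmware_customization appears as a top-level key in cloud.cfg.

```shell
#!/bin/sh
# prepare_cloud_init [ROOT]
# ROOT is a hypothetical path prefix; leave it empty to change the live
# /etc/cloud on the template VM (run as root in that case).
prepare_cloud_init() {
    root="${1:-}"
    cfg="$root/etc/cloud/cloud.cfg"
    dropin="$root/etc/cloud/cloud.cfg.d/99_vmservice.cfg"

    mkdir -p "$root/etc/cloud/cloud.cfg.d"
    touch "$cfg"

    # Force disable_vmware_customization to true: rewrite the line when the
    # key already exists, otherwise append it.
    if grep -q '^disable_vmware_customization:' "$cfg"; then
        sed 's/^disable_vmware_customization:.*/disable_vmware_customization: true/' \
            "$cfg" > "$cfg.tmp" &&
            cat "$cfg.tmp" > "$cfg" &&
            rm -f "$cfg.tmp"
    else
        echo 'disable_vmware_customization: true' >> "$cfg"
    fi

    # Drop-in file that stops cloud-init from configuring the network.
    echo 'network: {config: disabled}' > "$dropin"
}
```

Calling prepare_cloud_init /tmp/scratch first makes it easy to inspect the resulting files before running it without arguments on the template itself.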

Next we'll customize the OVF file itself. We need to enable OVF environment variables to be used to transport data to cloud-init. For this to work, I just copied the configuration from the OVF file of the VMware CentOS VM Service image and updated several sections:

In <VirtualSystem ovf:id="vm">, add the following ovf properties. Please note that you could/should change the labels and descriptions to match your template

<ProductSection ovf:required="false">
  <Info>Cloud-Init customization</Info>
  <Product>Linux distribution for VMware VM Service</Product>
  <Property ovf:key="instance-id" ovf:type="string" ovf:userConfigurable="true" ovf:value="id-ovf">
      <Label>A Unique Instance ID for this instance</Label>
      <Description>Specifies the instance id.  This is required and used to determine if the machine should take "first boot" actions</Description>
  </Property>
  <Property ovf:key="hostname" ovf:type="string" ovf:userConfigurable="true" ovf:value="centosguest">
      <Description>Specifies the hostname for the appliance</Description>
  </Property>
  <Property ovf:key="seedfrom" ovf:type="string" ovf:userConfigurable="true">
      <Label>Url to seed instance data from</Label>
      <Description>This field is optional, but indicates that the instance should 'seed' user-data and meta-data from the given url.  If set to '' is given, meta-data will be pulled from and user-data from  Leave this empty if you do not want to seed from a url.</Description>
  </Property>
  <Property ovf:key="public-keys" ovf:type="string" ovf:userConfigurable="true" ovf:value="">
      <Label>ssh public keys</Label>
      <Description>This field is optional, but indicates that the instance should populate the default user's 'authorized_keys' with this value</Description>
  </Property>
  <Property ovf:key="user-data" ovf:type="string" ovf:userConfigurable="true" ovf:value="">
      <Label>Encoded user-data</Label>
      <Description>In order to fit into a xml attribute, this value is base64 encoded . It will be decoded, and then processed normally as user-data.</Description>
  </Property>
  <Property ovf:key="password" ovf:type="string" ovf:userConfigurable="true" ovf:value="">
      <Label>Default User's password</Label>
      <Description>If set, the default user's password will be set to this value to allow password based login.  The password will be good for only a single login.  If set to the string 'RANDOM' then a random password will be generated, and written to the console.</Description>
  </Property>
</ProductSection>

In <VirtualHardwareSection ovf:transport="iso">, add the following:

<vmw:ExtraConfig ovf:required="false" vmw:key="guestinfo.vmservice.defer-cloud-init" vmw:value="ready"/>

Save the OVF file and export it to the content library. The name must be DNS compliant and must not contain any capital letters.
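
A quick check of the name before uploading can save a failed import. The sketch below assumes that "DNS compliant" means a lowercase DNS subdomain name in the Kubernetes sense (lowercase letters, digits, "-" and ".", each dot-separated part starting and ending with a letter or digit, at most 253 characters); that interpretation and the function name is_dns_name are assumptions, not something stated by VMware here.

```shell
#!/bin/sh
# is_dns_name NAME
# Returns 0 when NAME looks like a lowercase DNS subdomain name.
# The exact rule applied by the content library / VM Operator is assumed.
is_dns_name() {
    name="$1"
    # Length must be between 1 and 253 characters
    [ "${#name}" -ge 1 ] && [ "${#name}" -le 253 ] || return 1
    # LC_ALL=C keeps [a-z] from matching uppercase letters in some locales
    printf '%s\n' "$name" |
        LC_ALL=C grep -Eq '^[a-z0-9]([a-z0-9-]*[a-z0-9])?(\.[a-z0-9]([a-z0-9-]*[a-z0-9])?)*$'
}
```

For example, is_dns_name "centos-stream-8-custom" succeeds, while a name such as "CentOS-8" is rejected because of the capital letters.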

Lastly, in the YAML manifest file, add the annotation that disables the checks done by VM Operator:

  name: my-vm-name
    app: db-server
  annotations: disable

Tuesday, August 10, 2021

vSphere with Tanzu - VM Operator

VM Operator is an extension to Kubernetes that implements VM management through Kubernetes. It was released officially at the end of April 2021 with vCenter Server 7.0 U2a. This is a small feature pushed through a vCenter Server patch that brings a huge shift in the paradigm of VM management. It changes the way we look at VMs and the way we use virtualization. One could argue that Kubernetes already did that. I would say that unifying resource consumption through VMs and pods is actually a huge step forward. VM Operator brings into play not only Infrastructure as Code (IaC), but it also enables GitOps for VMs.

Let's look briefly at the two concepts. IaC represents the capability to define your infrastructure in a human readable language. A lot of tools exist that enable IaC - Puppet, Chef, Ansible, Terraform and so on. They are complex and powerful tools, some of them used in conjunction with others. All these tools have a particularity: they have their own language - Ruby, Python, HCL. GitOps expands the IaC concept. In this case, the Git repository is the only source of truth. Manifests (configuration files that describe the resource to be provisioned) are pushed to a Git repository monitored by a continuous deployment (CD) tool that ensures that changes in the repository are applied in the real world. Kubernetes enables GitOps. Kubernetes manifests are written in YAML. With the introduction of VM Operator the two concepts can be used in conjunction. For example, you could have a GitOps pipeline that deploys the VMs using Kubernetes manifests, and then configuration management tools could make sure the VMs are customized to suit their purpose - deploying an application server, monitoring agents and so on.

In the current post we will only look at the basics of deploying a VM through VM Operator. Once these concepts are clear then you can add other tools such as Git repositories, CD tools, configuration management. 

So, what do we need to be able to provision a VM through VM Operator? 

We need vCenter Server updated to U2a and a running Supervisor cluster.

At namespace level a storage policy needs to be configured. It is needed for both VM deployment and persistent volumes.

We need a content library with a supported VMware template uploaded to it (we will follow soon with a post on how to create unsupported VMware templates for VM Operator). At the time of writing, CentOS 8 and Ubuntu images are distributed through VMware Marketplace (search for "VM service Image").

The images come with cloud-init installed and are configured to transport user data using OVF environment variables to the cloud-init process, which in turn customizes the VM operating system.

In Workload Management, VM Service allows the configuration of additional VM classes and content libraries. VM classes and content libraries are assigned to the namespace to be able to provision the VMs.

VM classes selected for a particular namespace:

Content library selected for a particular namespace:

Once all the prerequisites are in place, connect to the Supervisor cluster and select the namespace where you want to deploy the VM:

kubectl vsphere login --verbose 5 --server= --insecure-skip-tls-verify  -u cloudadmin@my.lab

kubectl config use-context my-app-namespace

Check that the VM images in the content library are available 

kubectl get virtualmachineimages

Create the VM manifest file - centos-db-2.yaml 

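apiVersion: vmoperator.vmware.com/v1alpha1
kind: VirtualMachine
metadata:
  name: centos-db-2
  labels:
    app: my-app-db
spec:
  imageName: centos-stream-8-vmservice-v1alpha1.20210222.8
  className: best-effort-xsmall
  powerState: poweredOn
  storageClass: tanzu-gold
  networkInterfaces:
  - networkType: nsx-t
  vmMetadata:
    configMapName: my-app-db-config
    transport: OvfEnv
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-app-db-config
data:
  # Base64-encoded user data; the value is truncated in this post
  user-data: |
  hostname: centos-db-2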

In the manifest file we've added 2 resources:
- VirtualMachine: where we specify the VM template to use, the VM class, storage policy, network type and also how to send variables to cloud-init inside the VM (using a config map resource to keep the data in Kubernetes and OVF environment variables to transport it to the VM)
- ConfigMap: contains in our case user data (Base64 encoded user data - this is an SSH key) and the hostname of the VM; the Base64 output in this post is truncated
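
The Base64 value for the user-data key can be produced from the command line. This is a sketch under assumptions: the post only says the user data carries an SSH key, so the cloud-config layout below (a single ssh_authorized_keys entry) and the helper names make_cloud_config and encode_user_data are illustrative rather than the exact content used here.

```shell
#!/bin/sh
# make_cloud_config PUBLIC_KEY
# Prints a minimal cloud-config that authorises one SSH public key
# (assumed layout; the original user data is not shown in the post).
make_cloud_config() {
    printf '#cloud-config\nssh_authorized_keys:\n  - %s\n' "$1"
}

# encode_user_data
# Reads a document on stdin and prints it as one Base64 line.
# tr removes the line wrapping that some base64 implementations add.
encode_user_data() {
    base64 | tr -d '\n'
}
```

A typical call would be make_cloud_config "$(cat ~/.ssh/id_rsa.pub)" | encode_user_data, with the output pasted under user-data in the ConfigMap.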

To create the VM, apply the manifest. Then check its state.
kubectl apply -f centos-db-2.yaml 

kubectl get virtualmachine

Once the VM has been provisioned, it is assigned an IP from the Pod CIDR.

Pod CIDRs are private subnets used for inter-pod communication. To access the VM, it needs an Ingress CIDR IP. This is a routable IP and it is implemented in NSX-T as a VIP on the load balancer. The Egress CIDR is used for communication from the VM to the outside world and it is implemented as an SNAT rule. To define an ingress IP, we need to create a virtual machine service resource of type load balancer:
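
Provisioning is not instantaneous, so a script that continues after the apply step may want to wait until the address shows up. The helper below is a sketch: the name wait_for_vm_ip and its retry defaults are hypothetical, and it assumes the v1alpha1 VirtualMachine resource reports the address in .status.vmIp.

```shell
#!/bin/sh
# wait_for_vm_ip VM_NAME [ATTEMPTS] [DELAY_SECONDS]
# Polls the VirtualMachine resource until it reports an IP, then prints it.
# Returns 1 if no IP appears within ATTEMPTS tries.
wait_for_vm_ip() {
    vm="$1"
    attempts="${2:-30}"
    delay="${3:-10}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # .status.vmIp is assumed to be the field carrying the VM address
        ip=$(kubectl get virtualmachine "$vm" \
            -o jsonpath='{.status.vmIp}' 2>/dev/null) || ip=""
        if [ -n "$ip" ]; then
            printf '%s\n' "$ip"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

For the VM in this post the call would be wait_for_vm_ip centos-db-2, run in the same kubectl context as the apply.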

Create the manifest file - service-ssh-centos-db-2.yaml 

apiVersion: vmoperator.vmware.com/v1alpha1
kind: VirtualMachineService
metadata:
  name: lb-centos-db-2
spec:
  selector:
    app: my-app-db
  type: LoadBalancer
  ports:
  - name: ssh
    port: 22
    protocol: TCP
    targetPort: 22

We are using the selector app: my-app-db to match the VM resource for this service. The service will be assigned an IP from Ingress network and it will forward all requests coming to that IP on SSH port to the VM IP on SSH port. 

To create the service, apply the manifest. Then check its state.
kubectl apply -f service-ssh-centos-db-2.yaml 

kubectl get service lb-centos-db-2

The External IP displayed in the above listing is the ingress IP that you can use now to ssh to the VM:
ssh cloud-user@external_ip

Please note the user used to SSH. From it, you can then sudo and gain root privileges. 
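
Instead of copying the External IP from the listing by hand, it can be read with a JSONPath query. A minimal sketch, assuming the load balancer publishes an IP (not a hostname) under .status.loadBalancer.ingress; the function name service_external_ip is hypothetical.

```shell
#!/bin/sh
# service_external_ip SERVICE_NAME
# Prints the first ingress IP of a LoadBalancer service, or returns 1 when
# none has been assigned yet.
service_external_ip() {
    ip=$(kubectl get service "$1" \
        -o jsonpath='{.status.loadBalancer.ingress[0].ip}' 2>/dev/null) || ip=""
    [ -n "$ip" ] || return 1
    printf '%s\n' "$ip"
}
```

With that in place, the login becomes ssh cloud-user@"$(service_external_ip lb-centos-db-2)".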

A VM provisioned via VM Operator can only be managed through the Supervisor cluster API (Kubernetes API). The VM can no longer be managed directly from the UI or other management tools. Looking at the picture below, you will notice that the VM is marked in the UI as "Developer Managed" and that there are no actions that can be taken on the VM.

If you made it this far, well done: you've just provisioned your first VM using the Kubernetes API. Now put those manifests in a Git repo, install and configure a CD tool (such as ArgoCD) to monitor the repo and apply the manifests on the Supervisor cluster, and you don't even need to touch the kubectl command line or vCenter Server :-)

Thursday, May 13, 2021

vSphere with Tanzu - Enable Supervisor Cluster using PowerCLI

In the previous post we looked at how to manually enable a Supervisor cluster on a vSphere cluster. Now we'll reproduce the same GUI steps in a small PowerCLI script.

PowerCLI 12.1.0 brought new cmdlets for the VMware.VimAutomation.WorkloadManagement module, and one of these is Enable-WMCluster. We will be using this cmdlet to enable the Tanzu Supervisor cluster. In the following example we'll be using NSX-T, but the cmdlet can also be used with distributed switches.

The following script is very simple. First we need to connect to vCenter Server and NSX Manager:

Connect-VIServer -Server
Connect-NsxtServer -Server

Next we define the variables (all the variables that were in the UI wizard).

The cluster where we enable Tanzu, the content library and the storage policies:

$vsphereCluster = Get-Cluster "MYCLUSTER"
$contentLibrary = "Tanzu subscribed"
$ephemeralStoragePolicy = "Tanzu gold"
$imageStoragePolicy = "Tanzu silver"
$masterStoragePolicy = "Tanzu gold"

Management network info for Supervisor Cluster VMs

$mgmtNetwork = Get-VirtualNetwork "Mgmt-Network"
$mgmtNetworkMode = "StaticRange"
$mgtmNetworkStartIPAddress = ""
$mgtmNetworkRangeSize = "5"
$mgtmNetworkGateway = ""
$mgtmNetworkSubnet = ""
$distributedSwitch = Get-VDSwitch -Name "Distributed-Switch"
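
With the StaticRange mode, the start address and the range size together determine which management addresses must be free. The sketch below works out the last address of such a range, assuming the range is a run of consecutive addresses beginning at the start address; the function names and the sample addresses are illustrative only.

```shell
#!/bin/sh
# Convert a dotted IPv4 address to an integer (runs in a subshell so the
# IFS change does not leak into the caller).
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Convert an integer back to dotted IPv4 notation.
int_to_ip() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# range_end START_IP SIZE
# Prints the last address of a range of SIZE consecutive addresses.
range_end() {
    start=$(ip_to_int "$1")
    int_to_ip $(( start + $2 - 1 ))
}
```

For example, a start address of 192.168.1.10 with a range size of 5 covers 192.168.1.10 through 192.168.1.14.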

DNS and NTP servers

$masterDnsSearchDomain = "my.lab"
$masterDnsServer = ""
$masterNtpServer = ""
$workerDnsServer = ""

Tanzu details - size and external and internal IP subnets

$size = "Tiny" 
$egressCIDR = ""
$ingressCIDR = ""
$serviceCIDR = ""
$podCIDR = ""

One more parameter needs to be provided: the Edge cluster ID. For this we use the NSX-T Manager connection and look up the Edge cluster by its display name:

$edgeClusterSvc = Get-NsxtService -Name com.vmware.nsx.edge_clusters
$results = $edgeClusterSvc.list().results
$edgeClusterId = ($results | Where {$_.display_name -eq "tanzu-edge-cluster"}).id

The last thing is to put all the parameters together in the cmdlet and run it against the vSphere cluster object:

$vsphereCluster | Enable-WMCluster `
-SizeHint $size `
-ManagementVirtualNetwork $mgmtNetwork `
-ManagementNetworkMode $mgmtNetworkMode `
-ManagementNetworkStartIPAddress $mgtmNetworkStartIPAddress `
-ManagementNetworkAddressRangeSize $mgtmNetworkRangeSize `
-ManagementNetworkGateway $mgtmNetworkGateway `
-ManagementNetworkSubnetMask $mgtmNetworkSubnet `
-MasterDnsServerIPAddress $masterDnsServer `
-MasterNtpServer $masterNtpServer `
-MasterDnsSearchDomain $masterDnsSearchDomain `
-DistributedSwitch $distributedSwitch `
-NsxEdgeClusterId $edgeClusterId `
-ExternalEgressCIDRs $egressCIDR `
-ExternalIngressCIDRs $ingressCIDR `
-ServiceCIDR $serviceCIDR `
-PodCIDRs $podCIDR `
-WorkerDnsServer $workerDnsServer `
-EphemeralStoragePolicy $ephemeralStoragePolicy `
-ImageStoragePolicy $imageStoragePolicy `
-MasterStoragePolicy $masterStoragePolicy `
-ContentLibrary $contentLibrary

And as simple as that, the cluster will be enabled (in a scripted and repeatable way).