Sysadmin Stories

Friday, May 1, 2020

vSphere 7 Local Disk Fresh Install and VMFS-L

I just got my new Intel NUC and it was time to install it. So I popped in the vSphere 7 USB stick and started installing. Ten minutes later I was looking at the freshly installed system and noticed that the hard drive was much smaller than expected: 337 GB from a 500 GB raw drive. The vSphere 6.7 NUC with the same drive had a capacity of 458 GB. So what happened to 120 GB of space?

Looking at the partition layout, the new 120 GB VMFS-L partition caught my attention.

Because I am an engineer and I read the manual after the fact, I started reading the vSphere 7 storage requirements in the official documentation. VMFS-L is used as the ESX-OSData partition instead of the scratch partition. It stores logs, core dumps and configuration. However, I cannot lose 120 GB on each of my NUCs, and I am already running vSphere 7 from a USB stick. So two questions came up:

1. How to recover some of the 120GB from VMFS-L

2. How to install vSphere 7

1. How to recover some of the 120GB from VMFS-L

I used the install USB stick to install vSphere 7 on it. However, this didn't work, since vSphere 7 would find the big VMFS-L partition and use it, which made removing it impossible.

I also turned off scratch

Then I actually booted up a vSphere 6.7 USB installation. Now I could access the disk and remove VMFS-L partition.

2. How to install vSphere 7

Since I already had vSphere 7 running off a 16 GB USB stick, I figured that it should be able to run with less capacity (again, it is my home lab, I wouldn't do all these tricks in production). Hence I installed vSphere 6.7 on the local disk and that got me to the following layout:

Then I added the host to vCSA 7.0 and upgraded to vSphere 7 using Lifecycle Manager which got me to a better looking final partition layout. The upgrade uses the existing core dump, locker, and scratch partitions to create the ESX-OSData volume.

Seems that it's better to read first even if you are playing in your lab. In my defense, I had installed vSphere 7 before but it was a nested ESXi and they had a small dedicated boot drive.

For more details on how to change scratch partitions you can also look at the following KB

Monday, April 20, 2020

Tips & Tricks to Install and Upgrade vRA 8.x in a Small Lab

I started building my vRA 8 environment in the home lab and even if the process was a pretty smooth one, working in an environment with limited resources presented challenges that I hope this article will help you overcome easily.

For me it was a 2 step process: install vRA 8.0.1 and upgrade to 8.1 within a week. So I will treat each step independently. Some of the challenges of installing 8.0.1 will certainly apply to a direct installation of 8.1.

My home lab is made of ESXi hosts running some minimal hardware (4 cores and 32 GB of RAM per host). Requirements for vRA 8 are as follows:
- vRealize Identity Manager (vIDM) - minimum 2 vCPU / 6 GB RAM
- vRealize Lifecycle Manager (vRLCM) - minimum 2 vCPU / 6 GB RAM
- vRealize Automation (vRA) 8.0.1 - 8 vCPU / 32 GB RAM
- vRealize Automation (vRA) 8.1 - 12 vCPU / 40 GB RAM

As you can easily see, vRA 8 shouldn't actually be installed on a 32 GB ESXi host. And if I hadn't started with 8.0.1, I don't think I would have even tried to install 8.1. However, I did start with 8.0.1. I also hadn't read the system requirements in the beginning.
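To make the mismatch explicit, here is a minimal Python sketch that compares the minimums listed above with a single lab host. It is a deliberately naive check: it ignores hypervisor overhead, hyperthreading and memory overcommit, and the function name is made up for this illustration.

```python
# Minimums quoted in the list above, as (vCPU, RAM in GB).
REQUIREMENTS = {
    "vIDM": (2, 6),
    "vRLCM": (2, 6),
    "vRA 8.0.1": (8, 32),
    "vRA 8.1": (12, 40),
}

# One lab host as described in the post.
HOST_CORES = 4
HOST_RAM_GB = 32


def fits_on_host(vcpu, ram_gb, host_cores=HOST_CORES, host_ram_gb=HOST_RAM_GB):
    """Naive check: the VM needs no more cores and no more RAM than the host has."""
    return vcpu <= host_cores and ram_gb <= host_ram_gb


for name, (vcpu, ram_gb) in REQUIREMENTS.items():
    verdict = "fits" if fits_on_host(vcpu, ram_gb) else "does not fit"
    print(f"{name}: {vcpu} vCPU / {ram_gb} GB -> {verdict}")
```

Only the two small appliances pass this check, which is why the vRA VM has to be resized by hand later on.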

Installation of vRA 8.0.1

vRA certificate

If you have a green field like I did, then the first thing to install is vRLCM using the easy installer. At step "Identity Manager Configuration" make sure to select "Install New VMware Identity Manager". This will deploy both appliances. Now you can log in to vRLCM and create a new environment with vRA 8.

vRA8 needs a certificate, even if it's self signed. Go first to Locker - Certificate and generate a new certificate.

Next, fill in all required information about hostname, IP addresses, passwords, the usual stuff. The precheck should also execute successfully. Before launching the deployment, you can save the configuration as a JSON file. I recommend doing it as it may come in handy if you ever want to automate this install.

vRA VM configuration downgrade

The deployment is a multi-stage process. In the first stage it will deploy the actual VM from OVF and try to power it on. In my case it failed as it tries to start an 8 vCPU / 32 GB VM.

Open vSphere Client and change the settings of the VM - I used 4 vCPU and 30 GB of RAM. I did try with 24 GB, but that ended up with containers not being scheduled due to lack of resources:

In this case I think 30 GB is a decent compromise. Once you have modified the VM, go back to vRLCM and restart the task (Retry request). Take care not to delete the already existing VM. At this point you only have to wait until all 8 stages are finished.

vRO expired password

Unfortunately, two hours later I had another error, this time during vravainitializecluster step. This is something related to 8.0.1 and it does not happen all the time. So you may not see it.

To confirm it, SSH to the vRA VM, open the log file indicated in the error (/var/log/deploy.log) and look for a database connection error. Also run the following command to check the status of the vRO containers: kubectl -n prelude get pods. If the vRO container is in CrashLoopBackOff, a quick search for "vro container CrashLoopBackOff" will get you to the KB on new installs of vRealize Orchestrator 8.x failing to install due to a POD STATUS of 'CrashLoopBackOff' (76870). The error is caused by an expired password. Apply the steps in the KB and restart the deployment. It picks up where it left off, and soon vRA is installed and running.
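If you prefer not to eyeball the pod list, the check can be scripted. The following is a small Python sketch that scans text in the usual kubectl get pods column layout (NAME, READY, STATUS, RESTARTS, AGE) for pods in CrashLoopBackOff. The sample output and the pod names in it are made up for illustration; real pod names in the prelude namespace will differ.

```python
def crashlooping_pods(kubectl_output):
    """Return the names of pods whose STATUS column reads CrashLoopBackOff."""
    lines = kubectl_output.strip().splitlines()
    header = lines[0].split()
    name_idx = header.index("NAME")
    status_idx = header.index("STATUS")
    failing = []
    for line in lines[1:]:
        cols = line.split()
        if len(cols) > status_idx and cols[status_idx] == "CrashLoopBackOff":
            failing.append(cols[name_idx])
    return failing


# Made-up sample in the shape of `kubectl -n prelude get pods` output.
SAMPLE = """\
NAME                 READY   STATUS             RESTARTS   AGE
example-db-0         1/1     Running            0          2h
example-vro-abc12    1/2     CrashLoopBackOff   14         2h
"""

print(crashlooping_pods(SAMPLE))
```

Paste the real command output into the function in place of SAMPLE; an empty list means no pod is crash looping.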

I am curious how a direct install of vRA 8.1 will actually work, keeping in mind the small resource drawback. But even if it doesn't, there is a way to get there.

A few days later 8.1 was released so it was time to upgrade.

Upgrade to vRA 8.1

For a step by step article on upgrading to 8.1 you may look here. As stated above, I will only focus on the small hiccups.

Binary mapping

You need to first upgrade vRLCM and vIDM. Once these two feats are done (again, pretty straightforward thanks to Lifecycle Manager) you will upgrade vRA. I downloaded the updaterepo ISO file from the VMware site and uploaded it to /data on vRLCM (using WinSCP). Then I created a new binary mapping by going to Settings - Binary Mappings and adding the binary:

Precheck ignore

You can start the upgrade using vRLCM repository. The precheck will fail because this time there is a VM and it looks at its configuration and it does not like seeing 4 CPU and 30 GB of RAM:

Do as I did: ignore the errors and start the upgrade. One hour later you should see something similar to the following, which means the upgrade was successful.

Snapshot

Do not forget that the upgrade process takes a snapshot of the VM, which you need to delete afterwards.

Now I am running vRA 8.1 in my home lab and it works decently, at least the Cloud Assembly part that I've been playing with so far. I do understand that resources are required for good reasons, but you need a 12 core / 64 GB host, which is not easy to get. In this case, running it with reduced performance is better than not running it at all. There are obvious impacts on the services. One example is the time it takes to boot up; it's impressive that it gets there in the end. The following snip is proof of the struggle behind it:

Thursday, April 16, 2020

Veeam Backup & Recovery - Change Block Tracking Reset

Change Block Tracking (CBT) is a feature that allows tracking of changed disk sectors. Tracking for virtual disks is done in the virtualization layer. CBT is exposed through vSphere APIs for Data Protection (VADP) to 3rd party applications.

VBR uses this feature to track changes between incremental backups and make those backups faster. Instead of reading the whole vmdk, it will ask and receive only the changed blocks from the last incremental backup. How do you know CBT is enabled and used?

From the VMware point of view, you will see for each vmdk a file named vmdk_name-ctk.vmdk

From the Veeam point of view, you will see CBT next to the disk details in the backup job statistics. Let's look at what actually happens in the backup job.

CBT on, active full backup, read 20 GB

CBT on, incremental backup, read 11 MB

CBT off, incremental backup, read 20 GB

This is CBT in action. The first picture is an active full, so the whole disk has been read. The second one is an incremental where only changes from last backup have been read. When CBT was disabled, the second incremental also read 20GB. Now extrapolate this to hundreds of VMs to understand the importance of CBT: less data read means not only a shorter backup window, but also less load on the production storage.
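The extrapolation is simple arithmetic and can be sketched in a few lines of Python. The per-VM read sizes are the ones from the job statistics above; the VM count of 300 is a hypothetical figure chosen only for illustration, and real VMs will of course differ in size and change rate.

```python
FULL_READ_GB = 20.0               # whole disk read when CBT is off
INCREMENTAL_READ_GB = 11 / 1024   # 11 MB read with CBT on, expressed in GB


def total_read_gb(vm_count, per_vm_gb):
    """Data read from production storage in one incremental run."""
    return vm_count * per_vm_gb


vm_count = 300  # hypothetical environment size
without_cbt = total_read_gb(vm_count, FULL_READ_GB)
with_cbt = total_read_gb(vm_count, INCREMENTAL_READ_GB)

print(f"without CBT: {without_cbt:.0f} GB read per incremental run")
print(f"with CBT:    {with_cbt:.1f} GB read per incremental run")
```

Under these assumptions the difference is terabytes versus a few gigabytes per run, which is what shortens the backup window and spares the production storage.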

Luckily, CBT is enabled by default for all newly created backup jobs. It can be found in the vSphere integration tab, in Storage > Advanced.

CBT is not supported for physical mode RDMs, nor for VMs that have snapshots at the moment CBT is first activated. Sometimes CBT can get corrupted and the only way to solve it is to reset CBT. This is not easily done. A new feature in VBR v10 allows CBT to be reset automatically on all VMs whenever an active full backup is executed. Remembering how many times I heard support guys advising a CBT reset, I think this is a cool feature to add. The CBT reset action is shown in the backup stats window.

The downside is that the active full backup will take a bit longer, but you will be protected against potential CBT corruptions.

Monday, April 6, 2020

Upgrading vCSA 6.7 to vCSA 7.0

First things first: back up vCSA 6.7. Use a backup solution or do a vCenter backup from VAMI. It's never bad to have one, even though you will see that it can be skipped.

Next, mount the ISO and start the installer UI. The UI presents the already known options: install, upgrade, migrate, restore. My target is to upgrade the existing lab environment, so I am choosing upgrade. Before going further into the post, I want to clarify the upgrade process. It is not an in-place upgrade; it is actually a migration and you will end up with a new VM running vCSA 7.0 alongside the old vCSA 6.7.

I don't intend to describe the step by step upgrade since there are already a few good blog posts and the process itself is pretty straight forward. I will however highlight some things I came across during the upgrade.

- you will end up with 2 VMs: a powered off old vCSA 6.7 and a powered on vCSA 7.0
- vCSA 7.0 will in the end preserve the FQDN and IP of vCSA 6.7
- a temporary IP address is required for vCSA 7.0 to be used during the data migration from vCSA 6.7 (that moment in time when both VMs are up and running)
- it's a 2 stage process: first a new vCSA VM is deployed, then the data is migrated
- migration offers the possibility to choose how much data you actually want transferred (the more you choose, the longer it takes):
  - configuration and inventory
  - configuration, inventory, tasks and events
  - configuration, inventory, tasks, events and performance metrics
- the whole process took almost 2 hours (my lab), expect longer times for real environments

During the upgrade itself I got a couple of warnings at pre-check and a couple of notifications at the end, the rest being smooth and uneventful. The warning was about not having DRS enabled on my cluster, which was fine because I had a single node cluster where vCSA was running:

The notifications are about TLS 1.0 and 1.1 being disabled and Auto-Deploy needing an update:

I did try to upgrade using CLI installer, however there are some issues with the upgrade templates and its schema in the GA version (15843807) and it kept on failing during JSON template precheck. I will come back to this topic once I figure it out.

Saturday, April 4, 2020

Veeam Backup and Replication v10 - General Options

I finally got v10 up and running in my lab. I started randomly clicking around the interface and noticed a couple of changes in the General Options menu. I think it's one of the most ignored parts of Veeam Backup & Replication, even though it has a lot of important settings. So I decided to pay it the proper respect and write a post about it. Just a heads up: if you are interested in VBR notifications, alerts, audits and reports, then this is where you should look.

I/O Control
By default it is disabled. However, if you are running in a production environment and backups may overlap with business hours, this should be enabled. It ensures backups are not going to impact the production datastores by imposing latency thresholds and throttling down backup jobs. The thresholds should be set according to the underlying physical storage and the current production latency.

Notifications
This tab is enabled by default. Besides license and update related notifications, there are 3 settings regarding notifications on space usage:

- warn when backup storage space is less than 10%
- warn when production storage is less than 10% - this will cause your backup jobs to finish with a warning
- skip VM processing when free disk space is below 5% - in this case a backup job will fail for the VMs on those datastores

As you can see, these are not just simple notifications: they also determine the result of backup jobs. You can tweak the default settings, but it is highly recommended to leave enough free space on datastores for snapshots to grow during a backup. Otherwise you may run into an out of space situation.
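To see how the two production storage thresholds interact, here is a minimal Python sketch that classifies a datastore by its free space using the default values above (10% and 5%). The function name and return labels are made up for this illustration; this is not how VBR is implemented, only a restatement of the rules.

```python
def classify_datastore(free_gb, capacity_gb, warn_pct=10.0, skip_pct=5.0):
    """Map free space on a production datastore to the expected job outcome."""
    free_pct = 100.0 * free_gb / capacity_gb
    if free_pct < skip_pct:
        return "skip"      # VM processing is skipped, the job fails for these VMs
    if free_pct < warn_pct:
        return "warning"   # the job finishes with a warning
    return "ok"


# Hypothetical 1 TB datastores with different amounts of free space.
for free_gb in (200, 80, 40):
    print(f"{free_gb} GB free of 1000 GB -> {classify_datastore(free_gb, 1000)}")
```

Running your own datastore numbers through such a check before the backup window is a cheap way to predict which jobs will end with warnings or failures.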

Security

This is used to change VBR SSL certificate (default is self signed), set up trust for Linux discovered hosts and it also has a new setting that appeared in v10: audit logs location.

The audit logs are used to track file level restore activities. By changing the location to a centralized repository or even a WORM (write once read many) device, you can ensure a proper audit trail and archival. In case you run a file level restore, the following information will be logged in a CSV format file: Time, User, SID, Operation, Result, Object.

29.03.2020 18:35:51Z, MYLAB\Administrator, S-1-5-21-AAAAAAA-XXXXXXX-NNNNNNN-500, Restore, Success, C:\Users\Administrator.MYLAB\Desktop\order_form_7393 VMUG Feb 2019.pdf

However, there is a workaround: you can open a file directly from the backup in explorer and read its content. This activity will not be logged.
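For the restores that do get logged, the entries are easy to process. The following Python sketch parses one line in the format shown above into the six named fields. It assumes the file has no header row and that only the last field (Object) may contain commas; both are assumptions about the format, not something confirmed by the product documentation.

```python
FIELDS = ["Time", "User", "SID", "Operation", "Result", "Object"]

# The sample entry shown above.
SAMPLE = (
    "29.03.2020 18:35:51Z, MYLAB\\Administrator, "
    "S-1-5-21-AAAAAAA-XXXXXXX-NNNNNNN-500, Restore, Success, "
    "C:\\Users\\Administrator.MYLAB\\Desktop\\order_form_7393 VMUG Feb 2019.pdf"
)


def parse_audit_line(line):
    """Split one audit entry into a dict keyed by the six field names."""
    # Split at most five times so commas inside the restored path survive.
    parts = [p.strip() for p in line.rstrip("\n").split(",", len(FIELDS) - 1)]
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    return dict(zip(FIELDS, parts))


entry = parse_audit_line(SAMPLE)
print(entry["User"], entry["Operation"], entry["Result"])
print(entry["Object"])
```

From here it is a short step to filtering by user or by result when reviewing the audit trail.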

E-mail Settings and SNMP Settings

These two tabs are dedicated to reporting over e-mail and alerting over SNMP. When enabling e-mail reporting you may choose the level of granularity. By default you will receive e-mails for any job status: success, warning or fail.

For SNMP you may configure up to 5 different receivers, as well as their community string and listening port.

Session history

Session history configures the number of sessions to display and the retention period. Up to 9.5U4 the default retention period was 53 weeks. With v10 this changed to 13 weeks.

From an operational point of view, the most important settings are related to I/O control, notifications and e-mail/SNMP while from auditing point of view, the audit log and the history retention period.

Thursday, April 2, 2020

vCenter Server Appliance 7.0 Command Line Installer

One of my favorite features in vSphere is the command line install of vCenter Server. It first appeared with vCenter Server 6.0. It is based on a JSON file input and can be used to do a fresh install of vCSA 7.0, upgrade an existing vCSA 6.5 or 6.7 installation to 7.0, or migrate a Windows vCenter Server 6.5 or 6.7 to vCSA 7.0.

The installer can be run from Windows, Linux or Mac. To access it, you need the vCSA ISO file, where you will find the folder vcsa-cli-installer\win32 (Windows users). JSON templates are found in the templates folder. You need to modify the JSON template that fits your use case. I will do a fresh install of vCSA 7.0 in my lab, so I will be using the template embedded_vCSA_on_VC.json, which deploys the new vCSA inside an existing vCenter Server. The template is commented very well; however, I will post here an example of what a simple configuration looks like. Please be aware that this is just a snippet of the actual template and some parts have been left out for ease of reading.

    "new_vcsa": {
        "vc": {
            "hostname": "vcsa67.mylab.com",
            "username": "administrator@mylab.com",
            "password": "",
            "deployment_network": "VM Network",
            "datacenter": [
                "VDC-1"
            ],
            "datastore": "DATASTORE-1",
            "target": [
                "CLUSTER-1"
            ]
        },
        "appliance": {
            "thin_disk_mode": true,
            "deployment_option": "small",
            "name": "vcsa70"
        },
        "network": {
            "ip_family": "ipv4",
            "mode": "static",
            "system_name": "vcsa70.mylab.com",
            "ip": "192.168.100.1",
            "prefix": "24",
            "gateway": "192.168.100.254",
            "dns_servers": [
                "192.168.1.10"
            ]
        },
        "os": {
            "password": "",
            "ntp_servers": "0.ro.pool.ntp.org",
            "ssh_enable": true
        },
        "sso": {
            "password": "",
            "domain_name": "vsphere.local"
        }
    }

As you can see, once you create the template it can be reused many times. What for, you may ask? One answer is nested labs. If you are unsure what size the vCSA should be, the installer will tell you:
.\vcsa-deploy.exe --supported-deployment-sizes

The installer takes different parameters besides the JSON file:
.\vcsa-deploy.exe install --accept-eula [--verify-template-only|--precheck-only] [file_path_to_json]

If you want to automatically accept SSL certificate thumbprint, you can add --no-ssl-certificate-verification parameter.
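When the installer is called from a script, it helps to assemble the argument list in one place. The Python sketch below only builds the list from the parameters mentioned in this post and does not run anything; the function name is hypothetical, and passing the result to something like subprocess.run from the vcsa-cli-installer\win32 folder is left to the reader.

```python
def build_vcsa_deploy_args(template_path, check=None, skip_ssl_verification=False):
    """Assemble the vcsa-deploy command line for a fresh install.

    check may be None, "verify-template-only" or "precheck-only".
    """
    if check not in (None, "verify-template-only", "precheck-only"):
        raise ValueError(f"unknown check: {check}")
    args = ["vcsa-deploy.exe", "install", "--accept-eula"]
    if check:
        args.append(f"--{check}")
    if skip_ssl_verification:
        args.append("--no-ssl-certificate-verification")
    args.append(template_path)
    return args


# Dry run first, then the real install, both from the same template.
print(" ".join(build_vcsa_deploy_args("embedded_vCSA_on_VC.json", check="precheck-only")))
print(" ".join(build_vcsa_deploy_args("embedded_vCSA_on_VC.json")))
```

Keeping the check as a parameter makes it natural to run the same template through a check first and then through the actual install.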

As seen above, the installer comes with 2 options that enable you to check that everything is fine before actually starting the install:

verify-template-only - will run a JSON file verification to validate the structure and input parameters (e.g. password strength, IP address, netmask). The final check result is displayed along with the path to the log file. The log file contains all required details. For example, if you typed an IP address that is not valid, the following message is displayed in the log file:

2020-03-27 19:44:06,232 - vCSACliInstallLogger - ERROR - The value '192.268.100.1' of the key 'ip' in section 'new_vcsa', subsection 'network' is invalid. Correct the value and rerun the script.

precheck-only - will do a dry run of the installer. This time it will connect to vCenter Server and check that the environment values are actually correct: for example that you don't have another VM with the same name and that the vCenter objects are correct (datacenter, datastore, cluster or host). It also does a ping test to validate that the IP/FQDN entered for the new vCSA is available.

================ [FAILED] Task: PrecheckTask: Running prechecks. execution
Error message: ApplianceName: A virtual machine with the name 'vcsa70' already
exists on the target ESXi host or cluster. Choose a different name for the
vCenter Server Appliance (case-insensitive).

Of course, you don't have to run both checks, or even any check, if you are confident enough. For me, precheck-only helped since I didn't understand how to fill in the JSON file the first time (I will blame it on a language barrier). One very important aspect of installing is to have DNS records set up and working. If you don't, even if the prechecks and the actual install work, the first boot of vCSA will most likely fail.
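Some typos can be caught even before handing the template to the installer. The Python sketch below uses the standard ipaddress module to check the "network" section of the template: it verifies that ip and gateway parse as addresses and that the gateway lies in the subnet defined by ip and prefix. This is an approximation written for this post, not a reproduction of what verify-template-only does internally, and the function name is hypothetical.

```python
import ipaddress


def check_network_section(network):
    """Return a list of problems found in a 'network' section (empty if none)."""
    problems = []
    addresses = {}
    for key in ("ip", "gateway"):
        try:
            addresses[key] = ipaddress.ip_address(network[key])
        except ValueError:
            problems.append(f"The value '{network[key]}' of the key '{key}' is invalid.")
    if problems:
        return problems
    try:
        # strict=False lets us derive the subnet from a host address.
        subnet = ipaddress.ip_network(f"{network['ip']}/{network['prefix']}", strict=False)
    except ValueError:
        return [f"The value '{network['prefix']}' of the key 'prefix' is invalid."]
    if addresses["gateway"] not in subnet:
        problems.append(f"Gateway {addresses['gateway']} is outside {subnet}.")
    return problems


good = {"ip": "192.168.100.1", "prefix": "24", "gateway": "192.168.100.254"}
typo = {"ip": "192.268.100.1", "prefix": "24", "gateway": "192.168.100.254"}

print(check_network_section(good))
print(check_network_section(typo))
```

The values in the first dictionary come from the template snippet earlier in the post, and the second reproduces the typo from the log message. DNS records still have to be checked separately.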

With everything set up and checked, you just run the install command and that's it. I like the CLI installer because it is simple, powerful and repeatable. No more filling in fields in a GUI and waiting for the lines on the screen.

Saturday, March 21, 2020

vROps Custom Dashboard for Monitoring vRealize Automation Reservations

It's been a while since I last tried to create a custom dashboard in vRealize Operations Manager (vROps). I think it was called vCenter Operations Manager at that time and the version was 5.8. Fear not, in today's post we are talking about vROps 7.5.

The use case is pretty simple: I need a way of monitoring the capacity of the reservations in terms of memory and storage. The management pack for vRA is tenant and business group focused, which doesn't really apply in my case where I have only one tenant and multiple business groups using the same reservation.

The dashboard is organized as follows:

The top level is an Object List widget that gets automatically updated by any selection in the other widgets. The main information is displayed in Top-N widgets that show the top 15 most utilized reservations in terms of storage and memory. On the right side I've added 2 Heatmap widgets for the same metrics: allocated % storage and memory per reservation. However, the heatmaps present all reservations, and their size is relative to the size of the reserved resource. The bigger the drawn size, the bigger the reserved value is. Any interaction with the Top-N or Heatmap widgets will provide more details in the Object List. The interactions view was added to vROps sometime in 2018 and it's a great and simple way to create interactions between widgets.

How the dashboard works: let's say we have a reservation that is 90% memory utilized displayed in the Top-N widget. When selected, the Object List on top will get populated with the reservation details: which vSphere cluster is mapped to the reservation, how much memory is actually allocated for that vSphere cluster in vRA and how much physical memory in total the cluster has. Kind of a drill down into the situation. Of course, being in vROps you can further drill down on the vSphere cluster.

In this case the selected reservation is at 81% memory usage. The top widget displays the real value - which is less than 400 GB. The heatmap on the right can be used to analyse the overall situation. Don't forget the bigger the reservation size is, the bigger the size in the heatmap. While in the Top-N list we are actually filtering the data and selecting only the ones that are critical.

Let's take a deeper look into how each widget type is configured:

Reservation Usage - Object List widget

Configuration is selected as Self Provider off since it receives data from other widgets. We add additional columns to display in the widget such as mapped cluster, free memory.

To add columns, press the green plus and filter by adapter type and object type

I've also removed the widget's default columns from the view since I am not interested in collection state and collection status.

VRA Reservation Memory Allocated % - Top-N widget

For this widget, select Self Provider On in the configuration section. Also select Top-N Options as Metric analysis and Top Highest Utilisation. Enable auto refresh if you want to update the metric data.

Once self provider is set to on, the Input Data section becomes active; add all your reservations here. Data will be analysed from all reservations and only the first 15 will be displayed, based on the criteria selected in the Output Data section.

In Output Data, select the object type as reservation and then the metric to be processed as Memory Allocated %.

Lastly, we can add Output Filters. Let's say we don't want to see a top 15 of all the reservations' memory usage, but only the ones that are above a certain threshold like 75%. We also do not want to see in there reservations that are above the set threshold, but because they are very big they actually have sufficient resource, more than 1TB of RAM for example. In this case we would add a filter on the output data that limits the displayed info:
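The combined effect of the Top-N selection and the two output filters can be sketched in Python. The class, field and function names below are hypothetical and the sample reservations are invented; the thresholds (above 75% allocated, at most 1 TB reserved, top 15) are the ones used in the example above. This only illustrates the selection logic, not how vROps evaluates it.

```python
from dataclasses import dataclass


@dataclass
class Reservation:
    name: str
    memory_allocated_pct: float
    memory_reserved_gb: float


def critical_reservations(reservations, n=15, min_pct=75.0, max_reserved_gb=1024.0):
    """Keep busy, not-too-large reservations and return the n most utilized."""
    selected = [
        r for r in reservations
        if r.memory_allocated_pct > min_pct and r.memory_reserved_gb <= max_reserved_gb
    ]
    return sorted(selected, key=lambda r: r.memory_allocated_pct, reverse=True)[:n]


# Invented sample data.
sample = [
    Reservation("res-a", 90.0, 512.0),
    Reservation("res-b", 81.0, 384.0),
    Reservation("res-c", 60.0, 256.0),    # below the 75% threshold
    Reservation("res-d", 95.0, 2048.0),   # busy, but more than 1 TB reserved
]

for r in critical_reservations(sample):
    print(r.name, r.memory_allocated_pct)
```

In this sample only res-a and res-b survive the filters, which is the behaviour we want from the widget: show what is both highly utilized and small enough to actually run out.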

Memory Allocated % - Heatmap widget

For the heatmap we use the same input data as for the Top-N: all reservations. What changes is the Output Data. We'll group the information by Reservation (but it can be Reservation Policy, tenant or whatever grouping suits you).

Next we select Reservation as the object type. The metrics used for Size by and Color by are different since I wanted to have a representation of how big is the VRA reservation and also of its usage. The bigger the reserved memory size, the bigger the box will be drawn. The more used the reservation is, the darker the color will be.

Output filters can be used here too, for example if you are not interested in very small reservations or want to filter some of them out based on naming (reservations for a test environment). Putting in a little extra time to tweak the widgets to your requirements and environment will prove beneficial, since the visualized data then makes sense to different users based on their needs.