Thursday, February 6, 2020

Migrate vRealize Orchestrator 7.3 Cluster with External Database to 7.6

In every environment's life there comes a time when you need to upgrade, and that time came for our vRO setup. The source environment runs vRO 7.3.1 in a 2-node cluster with an external MSSQL database. The target environment is a 2-node cluster running vRO 7.6 with the internal PostgreSQL database (since no external database is supported).

Because the source vRO is connected to an external MSSQL server, an in-place upgrade is out of the question, so a migration is the only way to do it. A constraint for the migration is not to change hostnames or IP addresses. In fact, the requirement is to get everything over to 7.6 with a minimum of reconfiguration, and here we are talking about plugins, workflows, actions, configuration elements, certificates and all that jazz.

Having laid down the task ahead we started working on the plan. And what we came up with was pretty interesting. 

vRO 7.6 appliance offers a migration tool that is accessible through VAMI. The tool needs access to source vRO and database. Since we have a 2-node cluster, we'll take advantage of that and do a rolling migration. Source vRO 7.3 cluster is made of nodes: node1 and node2. Target vRO 7.6 cluster will be running the same vRO nodes.

So what does this rolling migration look like?

In the initial state there is a vRO 7.3.1 cluster with an external MSSQL database. We start by redirecting traffic in the load balancer to node1 and disabling monitoring on the pool. Then we break the cluster by removing node2 and powering it off. Next we deploy a new node2 running 7.6 and using the same hostname and IP address. If you want to reuse the same VM name, you just need to rename the old node in vCenter Server.

This takes us to an intermediary state where we have node1 on 7.3.1 and node2 on 7.6 with an embedded PostgreSQL DB. Once node2 on 7.6 is up and running, we connect to VAMI and fire up the migration tool. The migration tool will transfer all data from node1 and the external DB to node2. After vRO is migrated to 7.6, we power off node1 running 7.3.1 and deploy node1 on 7.6. Don't forget to switch traffic over to node2 in the load balancer.

Now we are in the final state where both nodes are running 7.6. Once node1 is up, we add it to the cluster of which node2 is already a member. The cluster is back, this time running vRO 7.6 with an embedded PostgreSQL database. Simple, right?

Well, there are a few gotchas:
- when running the migration tool, a pre-check is run that can fail if the vRO DB contains duplicates; it can be fixed by either removing duplicates from DB or connecting with vRO client to source vRO and renaming them
- SSL certificates are not migrated - you need to connect to source vRO keystore and export the certificates, then re-add them in control center
- there is a cumulative update that needs to be applied to the freshly deployed vRO 7.6
- dynamic types plugin configuration is not migrated - there is an updated plugin that needs to be installed
- in some cases you may need to update the appliance hardware including file system

Lastly, enable traffic in the load balancers to both nodes and re-enable the health monitors. Do that after importing the SSL certificates.

Happy migrations!

Thursday, December 26, 2019

Check ESXi MTU settings with PowerCLI

Sometimes the simplest tasks can be time-consuming in large environments. This time it is about MTU settings on ESXi hosts.

First let's see a bit about MTU (Maximum Transmission Unit). It is a setting that defines the largest protocol data unit that can be sent across a network (largest packet or frame). The default setting is 1500 bytes. A bigger MTU increases performance for particular use cases, such as transferring large amounts of data over Ethernet. This is why it has traditionally been set to larger values (9000 bytes) for accessing iSCSI and NFS storage. For a vSphere environment it means it could (and should in some cases) be increased for almost all types of traffic: vMotion, vSAN, Provisioning, FT, iSCSI, NFS, VXLAN, FCoE.

Let's take the use case of accessing an NFS datastore, as seen in the picture below:

The biggest challenge with MTU is to have the environment properly configured end-to-end. This means that when you want your ESXi host to make use of a large MTU for accessing an NFS datastore, you'll need to make sure that distributed virtual switches, physical network interfaces, vmkernel portgroups, physical switches (at system level and per port) and filers are all configured with the proper MTU. What happens in our example when some elements are configured with the default MTU (1500)? If the vmkernel portgroup is set to 1500, then you will see no performance benefits at all. If one of the physical switches is configured with 1500 bytes, then you will get fragmented or dropped packets (performance degradation).

Hoping this short theoretical intro to MTU was helpful, I will jump ahead to the topic: checking ESXi MTU with PowerCLI. How to check physical switches and storage devices is not covered in this article.

At ESXi level we need to check 3 settings: distributed virtual switches, physical network interfaces (the vmnics used as uplinks) and vmkernel portgroups. To accomplish this we make use of two different PowerCLI cmdlets: Get-EsxCli and Get-VMHostNetworkAdapter.

The beauty of Get-EsxCli is that it exposes esxcli commands and you can access the host through the vCenter Server connection (no root or direct login to the ESXi host is required). The not so nice part is that you have to use esxcli syntax in PowerCLI, as you will soon see.

Main checks

We will first look at the script checks. Please keep in mind $h is a variable initialized with Get-VMHost cmdlet.
  • distributed virtual switch - will get the switch name, configured MTU and used uplinks; the loop ensures all dvswitches are checked
(Get-EsxCli -VMHost $h -V2).network.vswitch.dvs.vmware.list.Invoke() | foreach {
    Write-Host "DVSName  $($_.Name) MTU  $($_.MTU) UPLINKS  $($_.Uplinks)"
}

  • vmnics - check configured MTU, admin and link status for each interface (there is no issue in having unused nics configured differently) 
(Get-EsxCli -VMHost $h -V2).network.nic.list.Invoke() | foreach {
    Write-Host "NIC: $($_.Name) MTU: $($_.MTU) Admin: $($_.AdminStatus) Link: $($_.LinkStatus)"
}

  • vmkernel portgroups - check configured MTU and IP address
$vmks = $h | Get-VMHostNetworkAdapter | Where { $_.GetType().Name -eq "HostVMKernelVirtualNicImpl" }
foreach ($v in $vmks) {
    Write-Host "VMK $($v.Name) MTU $($v.MTU) IP $($v.IP)"
}

The script

Putting it all together, we'll add the three checks in a foreach loop. The loop iterates through all the clusters and within each cluster through all the hosts. The script creates one log file per cluster containing all the hosts in that cluster and their details:

foreach ($cls in Get-Cluster) {
    # One log file per cluster, named after the cluster
    $fileName = $cls.Name + ".log"
    Write-Host "# CLUSTER $($cls)" -ForegroundColor Yellow
    foreach ($h in $cls | Get-VMHost) {
        Write-Host "$($h)" -ForegroundColor Yellow
        Add-Content -Path $fileName -Value "$($h)"

        # Distributed virtual switches: name, MTU and uplinks
        (Get-EsxCli -VMHost $h -V2).network.vswitch.dvs.vmware.list.Invoke() | foreach {
            Write-Host "DVSName  $($_.Name) MTU  $($_.MTU) UPLINKS  $($_.Uplinks)"
            Add-Content -Path $fileName -Value "DVSName $($_.Name) MTU $($_.MTU) UPLINKS $($_.Uplinks)"
        }
        # Physical NICs: MTU, admin status and link status
        (Get-EsxCli -VMHost $h -V2).network.nic.list.Invoke() | foreach {
            Write-Host "NIC: $($_.Name) MTU: $($_.MTU) Admin: $($_.AdminStatus) Link: $($_.LinkStatus)"
            Add-Content -Path $fileName -Value "NIC: $($_.Name) MTU: $($_.MTU) Admin: $($_.AdminStatus) Link: $($_.LinkStatus)"
        }
        # VMkernel adapters: MTU and IP address
        $vmks = $h | Get-VMHostNetworkAdapter | Where { $_.GetType().Name -eq "HostVMKernelVirtualNicImpl" }
        foreach ($v in $vmks) {
            Write-Host "VMK $($v.Name) MTU $($v.MTU) IP $($v.IP)"
            Add-Content -Path $fileName -Value "VMK $($v.Name) MTU $($v.MTU) IP $($v.IP)"
        }
    }
}

Opening one of the log files, you will see output similar to the following:
DVSName dvs-Data1 MTU 9000 UPLINKS vmnic3 vmnic2 vmnic1 vmnic0
NIC: vmnic0 MTU: 9000 Admin: Up Link: Up
NIC: vmnic1 MTU: 9000 Admin: Up Link: Up
NIC: vmnic2 MTU: 9000 Admin: Up Link: Up
NIC: vmnic3 MTU: 9000 Admin: Up Link: Up
VMK vmk0 MTU 9000 IP
VMK vmk1 MTU 9000 IP
VMK vmk2 MTU 9000 IP

In this case everything looks good at ESXi level. The easy part is over, so start digging into the CLI of the physical switches and any other equipment along the path to ensure end-to-end MTU consistency.
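One way to sanity-check the whole path from the host side is a ping with the don't-fragment bit set and a payload sized to fill the MTU exactly. Here is a minimal sketch of that idea, not part of the PowerCLI script above. The payload is the MTU minus 20 bytes of IPv4 header and 8 bytes of ICMP header, so 8972 bytes for an MTU of 9000. The interface name vmk1 and the address 192.0.2.10 are placeholders I made up for illustration; the sketch only prints the vmkping command, so you can review it before running it in the ESXi shell.

```shell
#!/bin/sh
# Largest ICMP payload that fits in a single unfragmented IPv4 packet:
# MTU minus 20 bytes (IPv4 header without options) minus 8 bytes (ICMP header).
icmp_payload() {
    echo $(( $1 - 20 - 8 ))
}

# Hypothetical values: replace with your own vmkernel interface and filer address.
VMK="vmk1"
TARGET="192.0.2.10"
MTU=9000

# Print the command instead of running it.
# In the ESXi shell: -d sets the don't-fragment bit, -s the payload size,
# -I the vmkernel interface to send from.
echo "vmkping -d -s $(icmp_payload "$MTU") -I $VMK $TARGET"
```

If the ping with the full-size payload fails while a ping with a 1472-byte payload works, some element along the path is most likely still at the default MTU.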

Saturday, November 30, 2019

PBKAC and Nested ESXi VSAN

This post is a short one about a PBKAC (Problem Between Keyboard and Chair) and how I discovered it while setting up a nested VSAN cluster in my home lab.

I recently decided to install a nested VSAN environment. I created 4 nested ESXi hosts running 6.7 U3, added disks to them (flash disks) and tried to configure VSAN. And it was not smooth at all.

First, I noticed I couldn't add flash devices as capacity tier. Disks were seen only when I manually marked them as HDD in vCenter Server. Even after doing this, the "NEXT" button was grayed out and I couldn't complete the cluster creation.

Being a stubborn engineer, I did not read the manual, but dropped the UI and went all the way with esxcli. I created the VSAN cluster from the command line, again with no luck. The cluster ended up in a split cluster scenario while the UI showed confusing health messages.

Deciding there was something wrong with my 6.7 U3 image (?!?), and with Sunday coming to an end, I temporarily postponed the project. I restarted it a few days later, only to find out that I was using vCenter Server 6.7 build 13639324 (U2a). So yeah, vCenter Server 6.7 U2a and VSAN 6.7 U3a don't sound like a good match. I immediately updated vCenter Server to U3a and the VSAN cluster creation took 5 minutes.

It wasn't a waste of time, as I dusted off some PowerShell and learnt a valuable lesson: always check the version of your components. But it was definitely a PBKAC.

Saturday, November 9, 2019

Surprises at VMworld 2019

It's been an interesting year in which VMware announced a ton of new and exciting stuff, from projects Pacific and Tanzu to integrating Carbon Black into almost all of the products. Hence this VMworld has been one of the best so far: Kubernetes, containers, integrated security, even more networking, infrastructure as code, Terraform, Ansible, machine learning... almost forgot, vSphere on ARM. A lot of words and talks that are not in the traditional virtualization admin's vocabulary. Things are changing, and changing fast. Looking at how VMware redefined itself, caught up with missed trains and how it will lead some of those trains, I think we are going through some of the most radical shifts in IT and I personally feel like I've missed at least one train.

I am an old-style IT guy: I started out installing servers in racks and learning RJ45 crossover and straight-through cabling schemes by heart. When cloud came along, it was not very difficult to apply what I had learnt in datacenters. But I stayed somehow focused on the private and sometimes hybrid cloud. I never went fully cloud native, never got knee-deep into deploying and operating applications in the public cloud. Naturally, DevOps went from hype to mainstream. And as time passed by, machine learning came along and AI started doing crazy stuff like beating humans at League of Legends (I did play a lot of DotA back in the day and not a single game of Go :-) ). A lot happened in the last 10 years.

I've been going to VMworld since 2013 and kind of grew up with it. Still, this year I felt like I had been taken a bit by surprise. VMworld is, as always, a place to meet old and new friends and a place to learn about new trends. Barcelona adds to the whole vibe of the event. It was a blast to see people I had worked with on different projects years and years ago and to meet my fellow VMUG leaders.

As for the technical part:
- vSphere on ARM looks promising - compared to last year there were several devices installed with ESXi, from tablets to network cards to high performance ARM 2U rackable servers
- machine learning and Watson applied to log monitoring and analysis
- vRA 8 redesign - container based, no Windows
- NSX-T everywhere - in your DC, in the cloud and with a single pane of glass
- can't wait for the next vSphere release

Solution exchange walk was also interesting:
- if AWS presence was expected, having Azure and Oracle (although the latter was a bit hidden) was cool
- most of the traditional networking/security vendors were not present anymore
- LG and Samsung had pretty big stands
- a lot of backup vendors old and new
- ComputerWeekly was also there with a stand
- Penetration Testing as a Service - a platform from where you could hire pen testers to do some... pen testing
- VMware's booth is getting bigger and bigger

My takeaway from last week is that I have to do a lot of reading and learning, way more than usual, but this is what makes IT interesting.

Sunday, September 29, 2019

Troubleshoot WordPress Connectivity Error to Database

I am an irregular user of Linux. For a demo scenario, I needed to install WordPress. All good and pretty straightforward. I deployed two CentOS VMs, installed MySQL on one (DB layer) and Apache and PHP on the other (front end). I configured MySQL, created the DB and added users. I also tested connectivity from the front end to the DB layer. I then installed WordPress, but when connecting to the wp-admin site, I got a DB connection error message.

Weird, since I could actually connect to the DB using mysql client and the same credentials.
So it is not a connection error or a problem with the DB.

Since the mysql client works from the front end, I tested PHP connectivity. For this I put together the following small script in a file called conn.php and placed the file in /var/www/html/wordpress/:

<?php
try {
    // DSN fields are separated by ';' and written as key=value
    $dbh = new PDO('mysql:host=IP_ADDR;port=3306;dbname=WP_DB_NAME', 'USER', 'PASS');
    print "Connected successfully";
} catch (PDOException $e) {
    print "Error!: " . $e->getMessage() . "<br/>";
}

Next I opened the browser and executed the php script which resulted in an expected connection error:

At this point I ran the same script, but this time interactively from the php command line:

And I got a less expected connection successful message. By default SELinux is enabled and it will not allow httpd to talk to the mysql service. A quick test is to temporarily disable SELinux:
setenforce 0

and running the script again in the browser connects successfully:

So the main culprit is SELinux. Going forward there are 2 options: disable SELinux permanently (which is acceptable in demo labs) or allow Apache to connect to other services by changing the SELinux policy (which will be another post).
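Until that post exists, here is a minimal sketch of the second option. It assumes the stock CentOS targeted policy, where the boolean httpd_can_network_connect_db controls whether httpd may open connections to database ports; I have not verified it against this exact lab. The helper name selinux_fix_cmd is made up for illustration, and the sketch only prints the command that makes the change.

```shell
#!/bin/sh
# SELinux boolean that lets httpd connect to database ports
# (assumption: stock CentOS targeted policy).
BOOL="httpd_can_network_connect_db"

# Build the command that turns the boolean on; -P makes it persist across reboots.
selinux_fix_cmd() {
    echo "setsebool -P $1 on"
}

# Show the current state of the boolean when the SELinux tools are installed.
if command -v getsebool >/dev/null 2>&1; then
    getsebool "$BOOL"
fi

# Print the fix rather than applying it; run the printed command as root.
echo "Run as root: $(selinux_fix_cmd "$BOOL")"
```

If you tested with setenforce 0 earlier, switch back with setenforce 1 before retrying, otherwise you will not know whether the boolean made the difference.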

Thursday, September 19, 2019

My First AWS Exam

Recently I took the AWS Solutions Architect Associate exam. It was one of my oldest dreams and somehow a personal challenge. At some point in my career I was involved in integrating vRealize Automation and AWS, so I got a bit of experience working with EC2 instances, VPCs and ELBs. But then I changed project and position and did not touch AWS for a while. Recently I've started working with AWS again, but this time more with S3, IAM and Lambda functions. So I decided to give it a try and get certified.

I prepared for the exam by taking a training course. The course is well structured and educational. I enjoyed the hands-on labs (especially the Alexa skill building one). Practice tests will come in handy, but the course feels a bit insufficient on these. So if you choose to go through more practice tests, whizlabs can be one option. It's not a brain dump, and the questions will help you learn more about AWS services as well as get used to the scenario-based type of questions. Another step is to read the FAQs for the services in the exam blueprint. And lastly, you need hands-on experience, so get your hands dirty.

The exam was tougher than I expected. At some point I thought I would not pass (yes, I did pass). Previous IT experience helps in choosing the right answer when you don't actually know it. You also need to read the question very carefully; being scenario-based, it can be harder to understand for non-native English speakers like me. That's why I prefer to read it out loud (it helps being alone in the exam room).

All in all, it was a good experience and I am interested in trying my hand at another AWS exam.

Tuesday, August 27, 2019

Create vCenter Server Roles Using PowerCLI - Applied to Veeam Backup & Replication

Security is important and having a minimal set of permissions is a requirement, not an option. With this in mind (and having been asked a few times by customers), I put together a short script that creates the vCenter Server roles required by the Veeam Backup & Replication service account and the Veeam ONE service account. The two accounts have different requirements, with Veeam ONE being the most restrictive as it needs mostly read-only access.

The script itself is pretty straightforward; the more time-consuming part is getting the privilege lists. So here you are:

And now for the actual scripting part:

$role = "Veeam Backup Server role"
$rolePrivilegesFile = "veeam_vc_privileges.txt"
$vCenterServer = "your-vcenter-server-FQDN"
Connect-VIServer -server $vCenterServer
$roleIds = @()
# Read one privilege ID per line from the input file
Get-Content $rolePrivilegesFile | Foreach-Object {
    $roleIds += $_
}
New-VIRole -name $role -Privilege (Get-VIPrivilege -Server $vCenterServer -id $roleIds) -Server $vCenterServer

The script will create a new vCenter Server role assigning it privileges from the file given as input.

If you ever need to get the privileges of an existing role from vCenter Server, then the next piece of code will help (thanks to VMware communities):

$role = "VBR Role"
Get-VIPrivilege -Role $role | Select @{N="Privilege Name";E={$_.Name}},@{N="Privilege ID";E={$_.ID}}

You will use the privilege ID format for creating the new role.
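To make the input format concrete, here is a small sketch that prints a few privilege IDs, one per line, which is the layout the role script reads. The four IDs are an illustrative subset I picked and not the privilege list Veeam requires; take the real list from the Veeam documentation or export it from an existing role as shown above. The function name sample_privileges is made up for this example.

```shell
#!/bin/sh
# Print a few vSphere privilege IDs, one per line.
# Illustrative subset only, NOT the list required by Veeam.
sample_privileges() {
    cat <<'EOF'
Datastore.AllocateSpace
Datastore.Browse
VirtualMachine.State.CreateSnapshot
VirtualMachine.State.RemoveSnapshot
EOF
}

# Show the sample on standard output.
sample_privileges
```

Redirecting this kind of output to the file named in $rolePrivilegesFile gives the script an input it can consume line by line.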