Thursday, December 26, 2019

Check ESXi MTU settings with PowerCLI

Sometimes the simplest tasks can be time consuming in large environments. This time it's about MTU settings on ESXi hosts.

First, a bit about MTU (Maximum Transmission Unit). It is a setting that defines the largest protocol data unit that can be sent across a network (the largest packet or frame). The default is 1500 bytes. A bigger MTU increases performance for particular use cases, such as transmitting large amounts of data over Ethernet, which is why it is commonly set to a larger value (9000 bytes) for accessing iSCSI and NFS storage. In a vSphere environment this means it could (and in some cases should) be increased for almost all types of traffic: vMotion, vSAN, Provisioning, FT, iSCSI, NFS, VXLAN, FCoE.

Let's take the use case of accessing an NFS datastore, as seen in the picture below:

The biggest challenge with MTU is having the environment properly configured end-to-end. This means that when you want your ESXi host to use a large MTU for accessing an NFS datastore, you need to make sure the distributed virtual switches, physical network interfaces, vmkernel portgroups, physical switches (at system level and per port) and filers are all configured with the proper MTU. What happens in our example when some elements are left at the default MTU (1500)? If the vmkernel portgroup is set to 1500, you will see no performance benefit at all. If one of the physical switches is configured with 1500 bytes, you will get packet fragmentation (performance degradation).
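
By the way, a quick way to validate end-to-end MTU from the host itself is a jumbo ping with the don't-fragment bit set. A minimal hedged sketch via Get-EsxCli (the filer IP and vmkernel interface below are placeholders for your environment; $h is a host obtained with Get-VMHost):

$esxcli = Get-EsxCli -VMHost $h -V2
$ping = $esxcli.network.diag.ping.CreateArgs()
$ping.host = "192.168.30.250"   # hypothetical filer IP
$ping.interface = "vmk2"        # vmkernel interface used for NFS
$ping.df = $true                # set don't-fragment
$ping.size = 8972               # 9000 bytes minus 20 (IP) and 8 (ICMP) header bytes
$ping.count = 3
$esxcli.network.diag.ping.Invoke($ping)

If the jumbo ping fails while a default-size ping succeeds, something along the path is still at 1500.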

Hoping this short theoretical intro to MTU was helpful, let's jump to the topic: checking ESXi MTU with PowerCLI. Checking physical switches and storage devices is beyond the scope of this article.

At ESXi level we need to check three settings: the distributed virtual switch, the physical network interfaces (vmnics used as uplinks) and the vmkernel portgroups. To accomplish this we use two different PowerCLI cmdlets: Get-EsxCli and Get-VMHostNetworkAdapter.

The beauty of Get-EsxCli is that it exposes esxcli commands and you can access the host through the vCenter Server connection (no root or direct login to the ESXi host is required). The not so nice part is that you have to use esxcli syntax in PowerCLI, as you will soon see.

Main checks

We will first look at the script checks. Please keep in mind that $h is a variable initialized with the Get-VMHost cmdlet.
  • distributed virtual switch - will get the switch name, configured MTU and used uplinks; the loop ensures all dvswitches are checked
(Get-EsxCli -VMHost $h -V2).network.vswitch.dvs.vmware.list.Invoke() | foreach {
    Write-Host "DVSName  $($_.Name) MTU  $($_.MTU) UPLINKS  $($_.Uplinks)"
}

  • vmnics - check configured MTU, admin and link status for each interface (there is no issue in having unused nics configured differently) 
(Get-EsxCli -VMHost $h -V2).network.nic.list.Invoke() |  foreach {
    Write-Host "NIC:"$_.Name "MTU:"$_.MTU "Admin:"$_.AdminStatus "Link:"$_.LinkStatus
}


  • vmkernel portgroups - check configured MTU and IP address
$vmks = $h | Get-VMHostNetworkAdapter | Where { $_.GetType().Name -eq "HostVMKernelVirtualNicImpl" }
foreach ($v in $vmks) {
    Write-Host "VMK $($v.Name) MTU $($v.MTU) IP $($v.IP)"
}


The script

Putting it all together, we'll add the three checks in a foreach loop. The loop iterates through all the clusters and within each cluster through all the hosts. The script creates one log file per cluster containing all the hosts in that cluster and their details:


foreach ($cls in Get-Cluster) {
    $fileName = $cls.Name + ".log"
    Write-Host "# CLUSTER $($cls)" -ForegroundColor Yellow
    foreach ($h in $cls | Get-VMHost) {
        Write-Host "$($h)" -ForegroundColor Yellow
        Add-Content -Path $fileName -Value "$($h)"

        (Get-EsxCli -VMHost $h -V2).network.vswitch.dvs.vmware.list.Invoke() | foreach {
            Write-Host "DVSName  $($_.Name) MTU  $($_.MTU) UPLINKS  $($_.Uplinks)"
            Add-Content -Path $fileName -Value "DVSName $($_.Name) MTU $($_.MTU) UPLINKS $($_.Uplinks)"
        }
        (Get-EsxCli -VMHost $h -V2).network.nic.list.Invoke() |  foreach {
            Write-Host "NIC:"$_.Name "MTU:"$_.MTU "Admin:"$_.AdminStatus "Link:"$_.LinkStatus
            Add-Content -Path $fileName -Value "NIC: $($_.Name) MTU: $($_.MTU) Admin: $($_.AdminStatus) Link: $($_.LinkStatus)"
        }
        $vmks = $h | Get-VMHostNetworkAdapter | Where { $_.GetType().Name -eq "HostVMKernelVirtualNicImpl" }
        foreach ($v in $vmks) {
            Write-Host "VMK $($v.Name) MTU $($v.MTU) IP $($v.IP)"
            Add-Content -Path $fileName -Value "VMK $($v.Name) MTU $($v.MTU) IP $($v.IP)"
         }
    }
}

Opening one of the log files, you will see output similar to this:
esx-01.rio.lab
DVSName dvs-Data1 MTU 9000 UPLINKS vmnic3 vmnic2 vmnic1 vmnic0
NIC: vmnic0 MTU: 9000 Admin: Up Link: Up
NIC: vmnic1 MTU: 9000 Admin: Up Link: Up
NIC: vmnic2 MTU: 9000 Admin: Up Link: Up
NIC: vmnic3 MTU: 9000 Admin: Up Link: Up
VMK vmk0 MTU 9000 IP 192.168.10.11
VMK vmk1 MTU 9000 IP 192.168.20.11
VMK vmk2 MTU 9000 IP 192.168.30.11

In this case everything looks good at ESXi level. The easy part is over; now start digging into the physical switches' CLI and any other equipment along the path to ensure end-to-end MTU consistency.

Saturday, November 30, 2019

PBKAC and Nested ESXi VSAN

This post is a short one about a PBKAC (Problem Between Keyboard and Chair) and how I discovered it while setting up a nested VSAN cluster in my home lab.

I've recently decided to install a nested VSAN environment. I've created 4 nested ESXi hosts running 6.7 U3, added disks to them (flash disks) and tried to configure VSAN. And it was not smooth at all.

First, I noticed I couldn't add flash devices as the capacity tier. The disks were seen only after I manually marked them as HDD in vCenter Server. Even then, the "NEXT" button was grayed out and I couldn't complete the cluster creation.

Being a stubborn engineer, I did not read the manual; I dropped the UI and went all the way to esxcli. I created the VSAN cluster on the command line, with no luck this time either. The cluster ended up in a split-cluster scenario while the UI showed confusing health messages.

I decided there was something wrong with my 6.7 U3 image (?!?) and, Sunday coming to an end, I temporarily postponed the project. I restarted it a few days later to find out that I was using vCenter Server 6.7 build 13639324 (U2a). So yeah, vCenter Server 6.7 U2a and VSAN 6.7 U3a don't sound like a good match. I immediately updated vCenter Server to U3a and the VSAN cluster creation took 5 minutes.
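
In PowerCLI terms, the lesson fits in two one-liners (a hedged sketch; run after Connect-VIServer):

# vCenter Server version and build of the current connection
$global:DefaultVIServer | Select-Object Name, Version, Build
# ESXi version and build for every host
Get-VMHost | Select-Object Name, Version, Build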

It wasn't a waste of time, as I dusted off some PowerShell and learnt a valuable lesson: always check the versions of your components. But it was definitely a PBKAC.

Saturday, November 9, 2019

Surprises at VMworld 2019

It's been an interesting year, with VMware announcing a ton of new and exciting stuff, from projects Pacific and Tanzu to integrating Carbon Black into almost all of the products. Hence this VMworld has been one of the best so far: Kubernetes, containers, integrated security, even more networking, infrastructure as code, Terraform, Ansible, machine learning... I almost forgot, vSphere on ARM. A lot of words and talks not in the traditional virtualization admin's vocabulary. Things are changing, and fast. Looking at how VMware redefined itself, caught up with missed trains and how it will lead some of those trains, I think we are going through some of the most radical shifts in IT, and I personally feel like I've missed at least one train.



I am an old-style IT guy: I started out installing servers in racks and learning RJ45 crossover and straight-through cabling schemes by heart. When cloud came along, it was not very difficult to apply what I had learnt in datacenters. But I stayed somewhat focused on the private and sometimes hybrid cloud. I never went fully cloud native, never got knee deep into deploying and operating applications in the public cloud. Naturally, DevOps went from hype to mainstream. And as time passed by, machine learning came along and AI started doing crazy stuff like beating humans at Dota 2 (I did play a lot of DotA back in the day, but not a single game of Go :-) ). A lot has happened in the last 10 years.

I've been going to VMworld since 2013 and kind of grew up with it. Still, this year I felt like I was taken a bit by surprise. VMworld is, as always, a place to meet old and new friends and to learn about new trends. Barcelona adds to the whole vibe of the event. It was a blast to see people I've worked with on different projects years and years ago, and to meet my fellow VMUG leaders.

As for the technical part:
- vSphere on ARM looks promising - compared to last year, there were several devices running ESXi, from tablets to network cards to high-performance 2U rackable ARM servers
- machine learning and Watson applied to log monitoring and analysis
- vRA 8 redesign - container based, no Windows
- NSX-T everywhere - in your DC, in the cloud, and with a single pane of glass
- can't wait for the next vSphere release

A walk through the Solutions Exchange was also interesting:
- if AWS presence was expected, having Azure and Oracle there (although the latter was a bit hidden) was cool
- most of the traditional networking/security vendors were not present anymore
- LG and Samsung had pretty big stands
- a lot of backup vendors old and new
- ComputerWeekly was also there with a stand
- Penetration Testing as a Service - a platform from where you could hire pen testers to do some... pen testing
- VMware's booth is getting bigger and bigger

My takeaway from last week is that I have to do a lot of reading and learning, way more than usual, but this is what makes IT interesting.

Sunday, September 29, 2019

Troubleshoot WordPress Connectivity Error to Database

I am an irregular user of Linux. For a demo scenario, I needed to install WordPress. All good and pretty straightforward. I deployed two CentOS VMs, installed MySQL on one (the DB layer) and Apache and PHP on the other (the front end). I configured MySQL, created the DB, added users. I also tested connectivity from the front end to the DB layer. I then installed WordPress, but when connecting to the wp-admin site, I got a database connection error.


Weird, since I could actually connect to the DB using the mysql client and the same credentials.
So it was not a network problem or a problem with the DB itself.

Since the mysql client works from the front end, I tested PHP connectivity. For this I put together the following small script in a file called conn.php and placed the file in /var/www/html/wordpress/

<?php
try {
    $dbh = new PDO('mysql:host=IP_ADDR;port=3306;dbname=WP_DB_NAME', 'USER', 'PASS');
    print "Connected successfully";
}
catch (PDOException $e) {
    print "Error!: " . $e->getMessage() . "<br/>";
    die();
}
?>

Next I opened the browser and executed the PHP script, which resulted in the expected connection error:

At this point I ran the same script, but this time interactively from the PHP command line:

And I got a rather unexpected "connected successfully" message. By default SELinux is enabled and it will not allow httpd to talk to the mysql service. A quick test is to temporarily disable SELinux:
setenforce 0

and running the script again will connect successfully:


So the main culprit is SELinux. Going forward there are two options: disable SELinux permanently (which is acceptable in demo labs) or allow Apache to connect to other services by changing the SELinux policy (the httpd_can_network_connect boolean - which will be another post).





Thursday, September 19, 2019

My First AWS Exam

I recently took the AWS Solutions Architect Associate exam. It was one of my oldest dreams and somewhat of a personal challenge. At one point in my career I was involved with integrating vRealize Automation and AWS, so I got a bit of experience working with EC2 instances, VPCs and ELBs. But then I changed project and position and did not touch AWS for a while. Recently I started working with AWS again, this time more with S3, IAM and Lambda functions. So I decided to give it a try and get certified.

I prepared for the exam by taking the acloud.guru training course. The course is well structured and educational, and I enjoyed the hands-on labs (especially the Alexa skill building one). Practice tests come in handy, but the course feels a bit light on those, so if you choose to go through more practice tests, whizlabs can be one option. It's not a brain dump, and the questions will help you learn more about AWS services as well as get used to the scenario-based type of questions. Another step is to read the FAQs for the services in the exam blueprint. And lastly, you need hands-on experience - so get your hands dirty.

The exam was tougher than I expected. At some point I was thinking I would not pass (yes, I did pass). Previous IT experience helps in choosing the right answer when you actually don't know it. You also need to read the questions very carefully, which, for scenario-based questions, can be harder for non-native English speakers like me. That's why I prefer to read them out loud (it helps being alone in the exam room).

All in all, it was a good experience and I am interested in trying my hand at another AWS exam.


Tuesday, August 27, 2019

Create vCenter Server Roles Using PowerCLI - Applied to Veeam Backup & Replication

Security is important and having a minimal set of permissions is a requirement, not an option. Having this in mind (and having been asked a few times by customers), I put together a short script that creates the vCenter Server roles required by the Veeam Backup & Replication service account and the Veeam ONE service account. The two accounts have different requirements, Veeam ONE being the most restrictive as it mostly needs read-only privileges.

The script itself is pretty straightforward; the more time-consuming part is getting the privilege lists. So here you are:


And now for the actual scripting part:


$role = "Veeam Backup Server role"
$rolePrivilegesFile = "veeam_vc_privileges.txt"
$vCenterServer = "your-vcenter-server-FQDN"
Connect-VIServer -server $vCenterServer
$roleIds = @()
Get-Content $rolePrivilegesFile | Foreach-Object{
    $roleIds += $_
}
New-VIRole -name $role -Privilege (Get-VIPrivilege -Server $vCenterServer -id $roleIds) -Server $vCenterServer

The script will create a new vCenter Server role assigning it privileges from the file given as input.

If you ever need to retrieve the privileges of an existing role from vCenter Server, the next piece of code will help (thanks to the VMware communities):

$role = "VBR Role"
Get-VIPrivilege -Role $role | Select @{N="Privilege Name";E={$_.Name}},@{N="Privilege ID";E={$_.ID}}

You will use the privilege ID format when creating the new role.
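
Creating the role is only half the job - it still has to be assigned to a service account on an inventory object. A hedged sketch (the account name is hypothetical and the scope here is the vCenter root folder; adjust both to your environment):

# hypothetical service account - replace with your own
$principal = "DOMAIN\svc-veeam"
# scope: the root folder of the vCenter Server inventory
$rootFolder = Get-Folder -NoRecursion -Server $vCenterServer
New-VIPermission -Entity $rootFolder -Principal $principal -Role (Get-VIRole -Name $role) -Propagate:$true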

Saturday, July 27, 2019

Veeam Replication - Automatically Fix Detected Invalid Snapshot Configuration

This one comes directly from the field. It is about that moment when something goes wrong at a client's site and you need to fix it. In this case, replicas went wrong. The link between the two sites was not reliable at all, and some other networking issues at the DR site (yes, it's always the network's fault) left replica VMs in a corrupted state. If you want to keep the replica VMs that are already at the DR site, the fix is pretty simple: remove the snapshots and remap any vmdk's still pointing to delta files. Doing this manually is cumbersome, especially since some VMs can have multiple disks and we may be talking about tens or hundreds of VMs. Luckily, some things can be scripted.

We will present some of the modules of the script, since the whole script is published here on GitHub. For readability, we removed some of the try-catch blocks and logging messages from the original script and comment here only on the logical part.

The script takes as input parameters the vCenter Server where the VM replicas reside, the backup server hostname, the replica job status, the replica job fail message and the replica suffix. The replica suffix is important since the script uses it to find the replica VMs:

$vbrServer = "vbr1"
$vcServer = "vc1"
$status = "Failed"
$reason = "Detected an invalid snapshot configuration."
$replicaSuffix = "_replica"

The script needs to be executed from a place where both vCenter Server and the Veeam backup server are reachable, and where the PowerCLI module and the Veeam PowerShell snap-in are available:

Add-PSSnapIn -Name VeeamPSSnapin
Connect-VBRServer -Server $vbrServer
Connect-VIServer -Server $vcServer

Next, the script gets all VMs for which the replication job has failed ($status) with the given reason ($reason). For this, we use the Get-VBRJob cmdlet and the FindLastSession() and GetTaskSessions() methods. Once a VM in a replica job matches the chosen criteria, it is added to an array ($vmList):

$vmList = @()
# get failed replica VM names 
$jobs = Get-VBRJob  | Where {$_.JobType -eq "Replica"}
foreach($job in $jobs)
{
 $session = $job.FindLastSession()
 if(!$session){continue;}
 $tasks = $session.GetTaskSessions() 
 $tasks | foreach { 
        if (($_.Status -eq $status) -and ($_.Info.Reason -match $reason)) {
            $vmList += $_.Name
            }
        }
}

Once we have the list of VMs whose replica failed, it's time to get dirty. The VM name we are looking for is made of the original VM name (from the array) and the replica suffix:

$replicaName = $_ + $replicaSuffix

First, delete the snapshots. For this we use the PowerCLI cmdlets Get-VM, Get-Snapshot and Remove-Snapshot:


$replica = Get-VM -Name $replicaName -ea Stop
$replica | Get-Snapshot -ea Stop | Sort-Object -Property Created | Select -First 1 | Remove-Snapshot -RemoveChildren -Confirm:$false -ea Stop

Next, remap the disks if they are still pointing to delta files. To do that, we get all the disks of the replica VM (Get-HardDisk) and check whether a disk name contains the delta file marker ("-0000"). This is how we determine if it's a delta disk or a normal disk. The delta disk name is then parsed to generate the source disk name (removing the delta characters from the vmdk name). Once this is done, it's just a matter of reattaching the source disk to the VM (Remove-HardDisk, New-HardDisk):

# get disks for a replica VM
$disk = $replica |  Get-HardDisk -ea Stop
# process each disk 
$disk | foreach {
    $diskPath = $_.Filename
    # check if disk is delta file
    if ($diskPath -Match "-0000") {
        $diskPath = $_.Filename
        # for each delta file parse the original vmdk name
        $sourceDisk = $diskPath.substring(0,$diskPath.length-12) + ".vmdk"
        # get the datastore where the delta is 
        $datastore = Get-Datastore -Id $_.ExtensionData.Backing.Datastore
        # check the original vmdk still exists on that datastore
        if (Get-HardDisk -Datastore $datastore.Name -DatastorePath $sourceDisk) {
            # remove delta
            Remove-HardDisk -HardDisk $_ -Confirm:$false -ea stop
            # attach original disk
            $newDisk = New-HardDisk -VM $replica -DiskPath $sourceDisk -ea Stop
        } else {
            Write-Host "WARN Could not find $($sourceDisk) on $($datastore.Name) "
            Add-Content -Path  $logFile -Value "WARN: Could not find $($sourceDisk) on $($datastore.Name) "
        }
    }
}

The last thing to do is consolidate the disks. We run Get-VM and check if consolidation is needed. If it is, we just run the ConsolidateVMDisks_Task() method:


$vm = Get-VM -Name $replicaName -ea Stop
if ($vm.Extensiondata.Runtime.ConsolidationNeeded) {
    $vm.ExtensionData.ConsolidateVMDisks_Task()
}

Now the replica VMs are re-usable. There is still some manual work to be done, though: you need to re-map the replica VM in the replication job and run the job.

Sunday, July 21, 2019

A Few Thoughts on VMUG Romania Meeting

I am one of the VMware User Group (VMUG) leaders in Romania (a co-leader). Basically, I do event organization: getting speakers, sponsors, location, catering and promotion on social media. Sometimes I present at the meetings, but luckily, for the past year our small and energetic community has had other presenters from the community willing to spend their time and share their knowledge. Luckily for me, since I got more time to organize the event.

The last VMUG (July 18th) was a special one for me (hopefully for the community too). We got to organize our biggest event to date, with maybe the best speaker lineup so far - both from sponsors and from the community. We had the chance to meet Joe Baguley, VP and CTO EMEA at VMware, and have him as keynote speaker. A nice surprise was to meet and listen to Aylin Sali, CTO at Runecast and one of the co-founders (the Romanian one). As for the other guys, since we had met before, there were no surprises (at least not during business hours :-) ), just the same inspiring technical leaders that I knew. It was a fun and thrilling event with lots of ideas, networking and knowledge sharing. For this I thank you all: speakers, sponsors, co-leaders. A special thank you goes to the four community members who flew in from Timisoara and Cluj (the Cluj guys for the second time already) to be with us on that day.

What's next? The bar has been raised a bit (blame the organizers), but this is a great opportunity to do better. Autumn will come with a pre-VMworld event and more great speakers. We have also been planning for a while to do an event outside Bucharest. Lastly, my dream is to organize the Balkanic UserCon. As you can see, there is a lot of work ahead and a lot of room to grow.

Until next VMUG, I will leave you with the pictures (thanks Titi).


Wednesday, June 19, 2019

NSX-T Part 2 - Going Through Changes

In my mind, the pop reference in the title came from the world of metal ballads, but as I found out, it can just as easily be hip-hop. However, this is where any reference to the musical world stops, as the following article focuses on some of the major changes brought in by NSX-T.

NSX-T is definitely a game changer. The latest release (version 2.4) brings a lot of new functionality, getting close to feature parity with its vSphere-bound counterpart, NSX-V. Coming from an NSX-V background, I am interested in the main differentiators that NSX-T brings into the game.

1. Cross platform support and independence from vCenter Server. 

The two come together, since supporting multiple platforms required decoupling NSX Manager from vCenter Server. Besides vSphere, NSX-T also supports non-ESXi hypervisors such as KVM, and I expect to see more in the near future. Lastly, NSX-T supports bare-metal workloads and native public clouds (see integrating AWS EC2 instances - youtube video here).

To this I will add the NSX-T Container Plugin (NCP), which provides integration with container orchestration platforms (Kubernetes) and container-based platform-as-a-service offerings (OpenShift, Pivotal Cloud Foundry).

That's a big step away from vSphere-only environments.

2. Encapsulation protocol

At the core of any network virtualization technology is an encapsulation protocol that allows sending L2 frames over L3. NSX-V uses VXLAN; Geneve is the protocol used by NSX-T. At the time of writing, Generic Network Virtualization Encapsulation (Geneve) is still an IETF draft, although on the standards track. According to people much wiser than me who were involved in the specification design for Geneve, the new protocol brings to the table the best of other protocols such as VXLAN, NVGRE and STT.

One of the main advantages of Geneve is that it uses a variable-length header, which allows including as many options as necessary without being limited to the fixed headers of VXLAN and NVGRE (with their 24-bit identifiers) or having to use a large STT-style header all the time.

3. N-VDS

NSX-T virtual distributed switches are called N-VDS and are independent of vCenter Server. For this reason they come in two flavors (actually three), depending on the platform:
- ESXi - NSX's version of vswitch which is implemented as an opaque switch in vSphere
- KVM - VMware's version of OpenvSwitch
- cloud and bare metal - NSX agent

An opaque switch is a network created and managed by a separate entity outside of vSphere - in our case, logical networks created and managed by NSX-T. They appear in vCenter Server as opaque networks and can be used as backing for VMs. Although not a new thing, they are different from the NSX-V vswitches. This means that installing NSX-T even in a vSphere-only environment will bring in opaque networks instead of the NSX-V logical switches.
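
As a side note, you can list these opaque networks from PowerCLI, since they are exposed through the vSphere API as OpaqueNetwork objects - a small hedged sketch:

# list NSX-T created networks as seen by vCenter Server
Get-View -ViewType OpaqueNetwork |
    Select-Object Name, @{N="Type";E={$_.Summary.OpaqueNetworkType}}, @{N="Id";E={$_.Summary.OpaqueNetworkId}}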

4. Routing 

Differences are also introduced at the routing level, where NSX-T brings in a two-tiered model. A very interesting blog article is here, and I will briefly use a picture from it:

In a very short explanation:

  • Tier-0  logical router provides a gateway service to the physical world 
  • Tier-1 logical router provides services for tenants (multi-tenancy) and cloud management platforms 
Both tiers are not always needed - depending on the services required, a single tier can be implemented.


In a way, NSX-V could provide the same tiered model using Edge Services Gateways (ESGs), but in certain designs the routing paths were not optimal. NSX-T delivers optimized routing, simplicity and more architectural flexibility. Also, with the new model, the Distributed Logical Router (DLR) control VM has been removed.


I will let you decide which of the changes briefly presented above is most important; I will just list them below:
  • standalone (vCenter independent) 
  • multi hypervisor support
  • container integration
  • GENEVE
  • NVDS and OpenvSwitches
  • multi tiered optimized routing
However, if you are moving around the network virtualization world and haven't picked up NSX-T yet, maybe it's time to start.

Tuesday, June 11, 2019

Docker Containers and Backup - A Practical Example Using vSphere Storage for Docker

A few months ago I started looking into containers, trying to understand both the technology and how it relates to a question I started hearing recently: "Can you back up containers?"

I do not want to discourage anyone from reading this post (it is interesting), but going further, a basic understanding of containers and Docker technology is required. This post will not explain what containers are. It focuses on one aspect of Docker containers - persistent storage. Contrary to popular belief, containers can have, and may need, persistent storage. Docker volumes and volume plugins are the technologies for it.

Docker volumes are used to persist data outside the container's writable layer - by default on the file system of the Docker host. Volume plugins extend the capabilities of Docker volumes across different hosts and different environments: for example, instead of writing container data to the host's filesystem, the container can write data to an AWS EBS volume, Azure Blob storage or a vSphere VMFS datastore.

Let's take a step down from the abstract world. We have a dockerized application: a voting application. It uses a PostgreSQL database to keep the results of the votes, and the PostgreSQL DB needs a place to keep its data. We want that place to be outside the Docker host's storage, and since we are running in a vSphere environment, we'll use vSphere Storage for Docker. Put into a picture, it looks like this (for simplicity, only the PostgreSQL container is represented):


We'll start with the Docker host (the VM running on top of vSphere). Docker Engine is installed on the VM; it runs containers and creates volumes. The DB runs in the container and needs some storage. Let's take a two-step approach:

First, create the volume. Docker Engine, using the vSphere Storage for Docker plugin (the vDVS plugin and vDVS vib), creates a virtual disk (vmdk) on the ESXi host's datastore and maps it back to the Docker volume. Now we have a permanent storage space that we can use.

Second step: the same Docker engine presents the volume to the container and mounts it as a file system mount point in the container. 

This makes it possible for the DB running inside the container to write to the vmdk on the vSphere datastore (of course, without knowing it does so). Pretty cool.

The vmdk is automatically attached to the Docker host (the VM). Moreover, when the vmdk is created from the Docker command line, it can be given attributes that apply to any vmdk. This means it can be created as:
  • independent persistent or dependent (very important since this affects the ability to snapshot the vmdk or not)
  • thick (eager or lazy zeroed) or thin
  • read only 
It can also be assigned a VSAN policy. The vmdk will persist data for the container and across container lifetime. The container can be destroyed, but the vmdk will keep existing on the datastore. 

Let's recap: we are using a Docker volume plugin to present vSphere datastore storage space to an application running within a Docker container. Or, shorter: the PostgreSQL DB running within the container writes data to a vmdk.

Going back to the question - can we back up the container? Since the container itself is actually a runtime instance of a Docker image (a template), it does not contain any persistent data. The only data that we need is actually written to the vmdk. In this case, the answer is yes. We can back it up the same way we back up any vSphere VM - we will actually back up the vmdk attached to the Docker host itself.

Probably the hardest question to answer when talking about containers is what data to protect, as the container itself is just a runtime instance. By design, containers are ephemeral and immutable. The writable space of a container can be either directly in memory (tmpfs) or on a Docker volume. If we need data to persist across container lifecycles, we need to use volumes. Volumes can be implemented by a multitude of storage technologies, and this complicates the backup process. Container images represent the template from which containers are launched; they are also persistent data that could be a source for backup.


Steps for installing the vSphere Docker Volume Service and testing volumes
  • prerequisites: 
    • Docker host already exists and it has access to Docker Hub
    • download the VIB file (got mine from here)
  • logon to ESXi host, transfer and install the vib

esxcli software vib install -v /tmp/VMWare_bootbank_esx-vmdkops-service_0.21.2.8b7dc30-0.0.1.vib


  • restart hostd and check the module has been loaded
/etc/init.d/hostd restart
ps -c | grep vmdk


  • logon to the Docker host and install the plugin
sudo  docker plugin install --grant-all-permissions --alias vsphere vmware/vsphere-storage-for-docker:latest

  • create a volume and inspect it - by default the vmdk is created as independent persistent, which does not allow snapshots to be taken; add the option (-o attach-as=persistent) for dependent vmdks

sudo docker volume create --driver=vsphere --name=dockerVol -o size=1gb
sudo docker volume inspect dockerVol
sudo docker volume create --driver=vsphere --name=dockerVolPersistent -o size=1gb -o attach-as=persistent



  • go to the vSphere client, to the datastore where the Docker host is configured, and check for a new folder dockvols and for the VMDK of the volume created earlier


  • since the volumes are not used by any container yet, they are not attached to the Docker host VM. Create a container and attach the dependent volume to it

sudo docker container run --rm -d --name devtest --mount source=dockerVolPersistent,target=/vmdk alpine sleep 1d

Lastly, create a backup job with the Docker host as source, exclude the other disks and run it.

Thursday, March 7, 2019

vCenter Server Restore with Veeam Backup & Replication

Recently I went through the process of testing a vCenter Server appliance restore in the most unfortunate case, when the actual vCenter Server is not there. Since the tests were being done for a prod appliance, it was decided to restore it without connectivity to the network. Let's see how it went.

Test scenario
  • distributed switches only
  • VCSA
  • Simple restore test: put VCSA back in production using a standalone host connected to VBR
Since vCenter is "gone", first thing to do is to directly attach a standalone ESXi host to VBR. The host will be used for restores (this is a good argument for network team's "why do you need connectivity to ESXi, you have vCenter Server"). The process is simple, open VBR console go to Backup Infrastructure and add ESXi host. 

You will need to type in the hostname or IP and the root account. Since vCenter Server was not actually gone in our test, we had to use the IP instead of the FQDN, as the host was already registered with its FQDN through the vCenter Server connection.
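
The same step can be scripted with Veeam's PowerShell snap-in - a minimal hedged sketch (hostname/IP and credentials are placeholders):

Add-PSSnapin -Name VeeamPSSnapin
Connect-VBRServer -Server "vbr1"
# add the standalone ESXi host by IP with the root account
Add-VBRESXi -Name "192.168.10.11" -User "root" -Password "RootPassword"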

Next, start an entire VM restore:


During the restore wizard, select the point in time (by default the last one), then select Restore to a different location, or with different settings:


Make sure to select the standalone host:

Leave the default Resource Pool and datastores, but check that the selected datastore has sufficient space. Leave the default folder; however, if you still have the source VM, change the restored VM's name:

Select the network to connect to. Actually, disconnect the network of the restored VM - that was the scenario, right? Since the purpose of this article is not to make you go through the same experience we had, let's not disconnect it. You will see why immediately:

Keep defaults for the next screens and start the restore (without automatically powering on the VM after restore). 


A few minutes later the VM is restored and connected to the distributed port group. 

We started out to test a disconnected restored VM, but throughout the article we didn't disconnect it. Here is why: when we initially disconnected the network of the restored VM, we got an error right after the VM was registered with the host, and the restore failed.


The same error was received when trying to connect to a distributed portgroup configured with ephemeral binding. The logs show that the restore process actually tries to modify the network configuration of an existing VM, and that is what fails when VBR is connected directly to the host. When the portgroup is not changed for the restored VM, the restore process skips updating the network configuration. Of course, updating works with a standard switch portgroup.


In short, the following restore scenarios will work when restoring directly through a standalone host:
  • restore VCSA to the same distributed port group to which the source VM is connected
  • restore VCSA to a standard portgroup



Tuesday, February 19, 2019

Running Veeam PowerShell Scripts in Non-Interactive Mode - Credentials

This time we get back to some PowerShell basics and how to run scripts in non-interactive mode.

Veeam Backup & Replication and scripting go hand in hand really well. And the first thing you do when running a script is to connect to backup server

Connect-VBRServer -Server $ServerName -User $User -Password $Pass

Placing the user and password in clear text in a script may not be the best option. We could use a PSCredential object to hold the username and password:

$PSCredential = Get-Credential
Connect-VBRServer -Server $ServerName -Credential $PSCredential

However, getting the credentials into the object implies using the Get-Credential cmdlet, which is interactive and will prompt you to type the user and password. This makes it hard to run the script in non-interactive mode.

To make it non-interactive we need the password saved somewhere. To save an encrypted password we can use the ConvertFrom-SecureString cmdlet and pipe the output to a file:

(Get-Credential).Password | ConvertFrom-SecureString | Set-Content $encPassFileName

Since the username can be stored in the script itself, we retrieve only the secure string for the password, pipe it to the encrypting cmdlet and then output it to a file. Opening the output file will show something similar to:

000d08c9ddf0115d1118c7a00c04fc297eb01000

Now we need to create the PSCredential object:

$password = Get-Content $encPassFileName | ConvertTo-SecureString 
$psCredential = New-Object System.Management.Automation.PsCredential($username,$password)

First, we load the content of the file storing the encrypted password and convert it to a secure string (PSCredential objects use secure strings). Then we create the object with the username and the password. The resulting object can be used to connect to the VBR server:

Connect-VBRServer -Server $ServerName -Credential $PSCredential

By default, ConvertFrom-SecureString encrypts using the Windows Data Protection API (DPAPI). For this reason the file cannot be moved to another computer and used from there; the password must be encrypted separately on each computer the script runs from.
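
If the encrypted password does need to travel between machines, ConvertFrom-SecureString also accepts an explicit AES key instead of DPAPI - a hedged sketch for Windows PowerShell 5.x (protect the key file at least as well as the password file):

# generate and store a 256-bit AES key (do this once)
$key = New-Object byte[] 32
[Security.Cryptography.RNGCryptoServiceProvider]::Create().GetBytes($key)
$key | Set-Content -Path "aes.key" -Encoding Byte
# encrypt the password with the key instead of DPAPI
(Get-Credential).Password | ConvertFrom-SecureString -Key $key | Set-Content $encPassFileName
# decrypt on any machine that has the key file
$key = Get-Content -Path "aes.key" -Encoding Byte
$password = Get-Content $encPassFileName | ConvertTo-SecureString -Key $key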

Wednesday, January 30, 2019

NSX-T Part 1 - Architectural Overview

This is the first in a series of blog posts about NSX-T installation and configuration. Although similar in concepts, NSX-T brings a series of differences from NSX-V. One of them is that it is independent of vCenter Server. And this brings a lot of changes from what I was used to - some architectural, others related to functionality and user interface.

The blog series will start with an architectural overview, then move to deployment of NSX-T Manager and configuration of controllers, followed by basic configuration of the NSX-T environment. Once we get an NSX-T overlay network working between two VMs, we will continue by exploring routers and edges.

At a high level, the architecture is made of the same 3 components as in NSX-V:
  • management plane - single API entry point, user configuration, handles user queries, and performs operational tasks on all management, control, and data plane nodes in the system
  • control plane - computes all ephemeral runtime state based on configuration from the management plane, disseminates topology information reported by the data plane elements, and pushes stateless configuration to forwarding engines
  • data plane - stateless forwarding/transformation of packets based on tables populated by the control plane and reports topology information to the control plane

The Management Plane Agent (MP Agent) is a component that runs on controllers and transport nodes (hypervisors) and is responsible for executing tasks requested by the management plane:

  • desired state of the system
  • message communication to management plane (configuration, statistics, status, real time data)
The control plane is made of the Central Control Plane (CCP), running on the controller cluster, and the Local Control Plane (LCP), running on transport nodes. The CCP provides configuration to other controllers and to the LCP, and it is detached from the data plane (a failure in the CCP will not impact the data plane). The LCP is responsible for most of the ephemeral runtime state computation and for pushing stateless configuration to the forwarding engine in the data plane. The LCP is linked to the transport node hosting it.

The data plane's forwarding engine performs stateless forwarding of packets based on tables populated by the control plane. The data plane maintains state for features like TCP termination, but this state concerns payload manipulation and is different from the state maintained at control plane level, which is about how to forward packets (see MAC:IP tables).


N-VDS (NSX Managed Virtual Distributed Switch or KVM Open vSwitch)
N-VDS provides traffic forwarding. It is presented as a hidden distributed switch (it will not appear in vCenter Server, for example). In vSphere environments it is always created and configured by NSX Manager; in KVM it can be predefined. This is a major change from NSX-V. It also implies that you will need at least one unused vmnic on the ESXi host for the N-VDS.
Another change is that the encapsulation protocol used is now Geneve (latest IETF draft here).

Virtual Tunnel Endpoint (VTEP)
VTEP is the connection point at which the encapsulation and decapsulation takes place.

Transport Node
A node capable of participating in an NSX-T Data Center overlay or NSX-T Data Center VLAN networking. 

Transport Zone
Collection of transport nodes that defines the maximum span for logical switches. 

Uplink Profile
Defines settings for the links from hypervisor hosts to NSX-T Data Center logical switches or from NSX Edge nodes to top-of-rack switches (active/standby links, VLAN ID, MTU).

Logical switches - provide Layer 2 connectivity across multiple hosts 

Logical router - provides North-South connectivity, enabling access to external networks, and East-West connectivity for networks within the same tenant. A logical router consists of two optional parts:
  • a distributed router (DR) - one-hop distributed routing between logical switches and/or logical routers connected to it
  • one or more service routers (SR) - delivers services that are not currently implemented in a distributed router
A logical router always has a DR, and it has an SR if any of the following is true:
  • it is a Tier-0 router, even if stateful services are not configured
  • it is a Tier-1 router linked to a Tier-0 router and has stateful services configured (such as NAT)
NSX Edge - provides routing services and connectivity to external networks as well as non-distributed services. 
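
Most of these objects can also be inspected through the NSX-T REST API. A hedged sketch listing transport zones (the manager address and credentials are placeholders; a self-signed certificate may need extra handling):

$nsxManager = "nsxtmgr.rio.lab"   # hypothetical NSX-T Manager FQDN
$cred = Get-Credential
$pair = "{0}:{1}" -f $cred.UserName, $cred.GetNetworkCredential().Password
$headers = @{ Authorization = "Basic " + [Convert]::ToBase64String([Text.Encoding]::ASCII.GetBytes($pair)) }
# GET /api/v1/transport-zones returns a results array
$tz = Invoke-RestMethod -Uri "https://$nsxManager/api/v1/transport-zones" -Headers $headers
$tz.results | Select-Object display_name, transport_type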

In the next post we will see how to install NSX-T Manager and how all the concepts mentioned above come together when configuring it.

Wednesday, January 23, 2019

Role Based Access for VMware in Veeam Backup & Replication 9.5 Update 4 - Advanced Backup Features and Recovery

In the previous post we looked at how to configure the integration between Veeam and vCenter Server and how to create a simple backup job in the self-service portal.

In this post we'll go further and look at the advanced configuration options available for backup jobs, as well as at the recovery features.

1. VM restore
The user can restore the VMs he is in charge of. Log in to the self-service portal (https://em-address:9443/backup) and go to the VMs tab. Select the VM you want to restore and press Restore. Two choices are given - overwrite and keep the original VM.

Once the restore point has been selected, along with whether or not to power on the VM, the wizard starts the restore process to the production environment. Choosing "keep" will result in a new VM being created with the name originalVMname_restoredDateTime.
In this case, pay attention not to power it on after restore, as it is connected to the same network as the original VM. If you choose "overwrite", powering it on may be a good idea.

2. Guest file system indexing and file restore

When enabled, VBR will create an index of the files and folders on the VM guest OS during backup. Guest file indexing allows you to search for VM guest OS files inside VM backups and perform 1-click restores from the self-service portal. Initially the indexes are created on the VBR server, but they are later moved to the Enterprise Manager server.

Let's take the job we created initially and add guest indexing to it. Log on to the self-service portal, go to the Jobs tab, edit the job, and on the Guest Processing tab tick the box "Enable guest file system indexing".

Add user credentials; if different credentials are needed for some VMs in the job, press Customize Credentials. Lastly, define what to index. By default everything is indexed except system folders (Program Files, Temp, Windows). You can choose to index only specific folders or to index all folders.

Save the configuration and run the backup job. Once the job completes successfully, go to the Files tab in the self-service portal and type in the name of the file you want to find. In my case, the indexed VM is running IIS, so I will search for the iisstart.htm file.

After finding the file, we can restore it directly to the production VM, either overwriting the original file or keeping it. The file can also be downloaded to the local computer in an archive named FLR_DATE_HOUR.

If we have more files to restore, we can add each file to a restore list and execute the restore on multiple files at a time:

File-level restore is possible without indexing too, but then VBR needs to mount the backup and let you manually browse the file system for the file.

3. Application item restore

While using the Veeam Explorers allows for very good granularity and flexibility regarding supported applications, the self-service portal is restricted to MSSQL and Oracle. On the Items tab of the self-service portal, select the database, the location where to restore (original or alternate) and the point in time.


We'll restore the DB to the same server, but with a different name. For this reason, choose restore to another location and press Restore. Specify the server name and the credentials for connecting to the admin share and the MSSQL DB.

Specify the MDF and log file names - change them if the same location as the original DB is being used:

Press Finish and watch the progress in the status bar:

When it has finished, you can connect to the MSSQL server and check that the DB has been created:


The new integration with vCenter Server offers a powerful tool for delegating job management to users other than backup administrators.