Saturday, July 27, 2019

Veeam Replication - Automatically Fix Detected Invalid Snapshot Configuration

This one comes directly from the field. It is about that moment when something goes wrong at a client's site and you need to fix it. In this case, replicas went wrong. An unreliable link between the two sites, together with some other networking issues at the DR site (yes, it's always the network's fault), caused the replica VMs to end up in a corrupted state. If you want to keep the replica VMs that are already at the DR site, the fix is pretty simple: remove the snapshots and remap the VMDKs that are still pointing to delta files. Doing this manually is cumbersome, especially since some VMs have multiple disks and we are talking about tens or hundreds of VMs. Luckily, some things can be scripted.

We will present only some of the modules in the script, since the whole script is published here on GitHub. For readability, we have removed some of the try-catch blocks and logging messages from the original script and comment here only on the logic.

The script takes the following input parameters: the vCenter Server where the VM replicas reside, the backup server hostname, the replica job status, the replica job failure message and the replica suffix. The replica suffix is important because the script uses it to find the replica VMs.

$vbrServer = "vbr1"
$vcServer = "vc1"
$status = "Failed"
$reason = "Detected an invalid snapshot configuration."
$replicaSuffix = "_replica"

The script needs to be executed from a machine that can reach both the vCenter Server and the Veeam backup server, and where the PowerCLI module and the Veeam PowerShell snap-in are available.

Add-PSSnapIn -Name VeeamPSSnapin
Connect-VBRServer -Server $vbrServer
Connect-VIServer -Server $vcServer

Next, the script gets all VMs for which the replication job has failed ($status) with the given reason ($reason). For this, we use the Get-VBRJob cmdlet and the FindLastSession() and GetTaskSessions() methods. Once a VM in a replica job matches the chosen criteria, it is added to an array ($vmList).

$vmList = @()
# get failed replica VM names 
$jobs = Get-VBRJob  | Where {$_.JobType -eq "Replica"}
foreach($job in $jobs)
{
 $session = $job.FindLastSession()
 if(!$session){continue;}
 $tasks = $session.GetTaskSessions() 
 $tasks | foreach { 
        if (($_.Status -eq $status) -and ($_.Info.Reason -match $reason)) {
            $vmList += $_.Name
            }
        }
}
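
One detail worth keeping in mind: -match treats its right-hand side as a regular expression, so the trailing dot in $reason matches any character rather than a literal period. The comparison above still works for this message, but if a literal comparison is preferred, the pattern can be escaped first. A minimal sketch, assuming a hypothetical helper named Test-FailureReason that is not part of the published script:

```powershell
# Hypothetical helper (not in the original script): literal, case-insensitive
# substring check built on -match by escaping regex metacharacters first.
function Test-FailureReason {
    param(
        [string]$Text,     # e.g. the value of $_.Info.Reason
        [string]$Reason    # the message fragment to look for
    )
    return ($Text -match [regex]::Escape($Reason))
}

$reason = "Detected an invalid snapshot configuration."
Test-FailureReason -Text "Error: Detected an invalid snapshot configuration." -Reason $reason
Test-FailureReason -Text "Detected an invalid snapshot configurationX" -Reason $reason
```

The first call returns True and the second returns False; without the escaping, both would match.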

Once we have the list of VMs whose replica failed, it's time to get dirty. The VM name we are looking for is made of the original VM name (from the array) and the replica suffix.

$replicaName = $_ + $replicaSuffix
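
In the line above, $_ is the current VM name while iterating over $vmList. As a rough sketch of that iteration, the snippet below builds all replica names in one pass. The sample VM names are made up, and the de-duplication step is an assumption added here in case the same VM was collected more than once; it is not taken from the original script.

```powershell
# Sample values for illustration only; in the script $vmList is filled
# from the failed task sessions.
$vmList = @("app01", "db01", "app01")
$replicaSuffix = "_replica"

# Build the replica names, dropping duplicates in case a VM was
# collected more than once (an assumption, not in the original script).
$replicaNames = @($vmList | Sort-Object -Unique | ForEach-Object { $_ + $replicaSuffix })

$replicaNames
```

With the sample input this yields app01_replica and db01_replica.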

First, delete the snapshots. For this we use the PowerCLI cmdlets Get-VM, Get-Snapshot and Remove-Snapshot.


$replica = Get-VM -Name $replicaName -ea Stop
$replica | Get-Snapshot -ea Stop | Sort-Object -Property Created | Select -First 1 | Remove-Snapshot -RemoveChildren -Confirm:$false -ea Stop

Next, remap the disks if they are still pointing to a delta file. To do that, we get all the disks of the replica VM (Get-HardDisk) and check whether the disk file name contains the characters specific to delta files ("-0000"). This is how we determine whether it's a delta disk or a normal disk. The delta disk name is then parsed to generate the source disk name (by removing the delta characters from the vmdk name). Once this is done, it's just a matter of reattaching the source disk to the VM (Remove-HardDisk, New-HardDisk).

# get disks for a replica VM
$disk = $replica |  Get-HardDisk -ea Stop
# process each disk 
$disk | foreach {
    $diskPath = $_.Filename
    # check if disk is delta file
    if ($diskPath -Match "-0000") {
        # for each delta file parse the original vmdk name
        $sourceDisk = $diskPath.substring(0,$diskPath.length-12) + ".vmdk"
        # get the datastore where the delta is 
        $datastore = Get-Datastore -Id $_.ExtensionData.Backing.Datastore
        # check the original vmdk still exists on that datastore
        if (Get-HardDisk -Datastore $datastore.Name -DatastorePath $sourceDisk) {
            # remove delta
            Remove-HardDisk -HardDisk $_ -Confirm:$false -ea stop
            # attach original disk
            $newDisk = New-HardDisk -VM $replica -DiskPath $sourceDisk -ea Stop
        } else {
            Write-Host "WARN Could not find $($sourceDisk) on $($datastore.Name) "
            Add-Content -Path  $logFile -Value "WARN: Could not find $($sourceDisk) on $($datastore.Name) "
        }
    }
}
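
The "-0000" check and the fixed 12-character trim both rely on the delta file name ending in a dash, six digits and ".vmdk" (for example "-000001.vmdk"). A slightly stricter way to express the same parsing is an anchored regular expression. The sketch below assumes a hypothetical helper named Get-SourceDiskPath, which is not part of the published script, and uses made-up datastore paths:

```powershell
# Hypothetical helper (not in the original script): returns the base vmdk
# path for a delta disk path, or $null when the path is not a delta disk.
function Get-SourceDiskPath {
    param([Parameter(Mandatory)][string]$DiskPath)

    # a delta file name ends in "-" + six digits + ".vmdk"
    $pattern = '-\d{6}\.vmdk$'
    if ($DiskPath -match $pattern) {
        # strip the "-NNNNNN" part, keep the ".vmdk" extension
        return ($DiskPath -replace $pattern, '.vmdk')
    }
    return $null
}

# Made-up sample paths for illustration
Get-SourceDiskPath -DiskPath "[ds1] app01_replica/app01_replica-000002.vmdk"
Get-SourceDiskPath -DiskPath "[ds1] app01_replica/app01_replica.vmdk"
```

The first call returns "[ds1] app01_replica/app01_replica.vmdk" and the second returns $null. Anchoring the pattern at the end of the name also avoids trimming a disk whose name merely contains "-0000" somewhere in the middle.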

The last thing to do is to consolidate the disks. We run Get-VM and check whether consolidation is needed. If it is, we just run the ConsolidateVMDisks_Task() method.


$vm = Get-VM -Name $replicaName -ea Stop
if ($vm.ExtensionData.Runtime.ConsolidationNeeded) {
    $vm.ExtensionData.ConsolidateVMDisks_Task()
}

Now the replica VMs are reusable. There is still some manual work to be done, though: you need to map each replica VM in the replication job and run the job.
