Recently one of my customers had issues in his VMware vSphere environment, where the partial outage of a single ESXi host took down quite a large number of his VMs. After fixing the issue in the production infrastructure and bringing the environment back online, everything seemed to be working again. Some days later we noticed errors in some of his Veeam backup jobs. Strangely, not all VMs were affected, and the affected ones had no obvious common denominator (e.g. same ESXi host, same datastore, same network, etc.).
The error message shown in the Veeam Backup & Replication console was one I had not seen before:
Error: DiskLib error: .A file error was encountered -- Failed to read the file
Error Failed to retrieve next FILE_PUT message. File path: [[<DATASTORE>] <VMFOLDER>/<VM>.vmx]. File pointer: . File size: .
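The message suggests the backup proxy could no longer read the VM's .vmx file. As a quick sanity check (datastore, folder and VM names below are placeholders), you can verify that an ESXi host can still read the file directly from its shell:

```shell
# SSH to an ESXi host that can see the datastore; names are placeholders.
# Confirm the .vmx file exists in the VM folder:
ls -la /vmfs/volumes/<DATASTORE>/<VMFOLDER>/

# Try to read the descriptor; an I/O error here points at the storage
# layer rather than at Veeam itself:
cat /vmfs/volumes/<DATASTORE>/<VMFOLDER>/<VM>.vmx > /dev/null && echo "readable"
```

If the file cannot be read here either, the backup failure is only a symptom of the underlying datastore problem.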
Some time ago I had one of those rare days when I struggled to restore a VM using Veeam Backup & Replication. Some would claim that this has nothing to do with Veeam Backup & Replication but is solely due to the choice of hypervisor. At this point I would neither agree nor disagree with that claim, but there is definitely a reason why VMware is ahead in the field of server virtualization.
Actually, the whole thing should have been a fairly simple undertaking: restore a complete VM to a specific point in time. At first everything looked fine, but once the virtual disks were restored and the VM was supposed to start, the fun began.
Every year before Christmas, the update window comes up for some of my customers. One of these customers was due for minor VMware vSphere updates today. Not really a big deal, even though the customer has not enabled vSphere Distributed Resource Scheduler (DRS) on his cluster due to missing licenses. As in previous years, the task was to manually evacuate the ESXi hosts one by one and then remediate them via the vSphere Update Manager (VUM). Everything ran without issues until I wanted to evacuate the vCenter Server Appliance (VCSA) as the last VM on the host. For whatever reason, the Migrate function was grayed out in the VM's context menu.
So far, vMotion had never caused any problems in this cluster. So it was once again time for a little round of troubleshooting.
Today’s homelab session dealt with creating a short customer demo of the Veeam Backup & Replication feature SureBackup. As I had already implemented several SureBackup jobs for other customers, I was confident that I could quickly finish configuring the environment. For those who have not worked with SureBackup before, Veeam provides an excellent guide in their Help Center; you can find it here. Unfortunately, the whole thing did not work out as expected. Right at the beginning I made a crucial mistake that turned the creation of the demo into a nerve-wracking adventure. More on that in a moment. First, for those of you who have no idea how the creation of a SureBackup job works, I would like to give a short outline.
To get some more flexibility in my homelab, I added another domain controller (Active Directory, DNS and DHCP). Unlike my first domain controller, which runs directly on the physical ESXi host (details can be found here), I installed the second domain controller inside the nested vSAN cluster. After configuring all services, I wanted to use the new domain controller as an additional DNS server in my VMware vSphere environment. So I quickly adjusted the network and NTP settings of the vCenter Server Appliance and the ESXi hosts, expecting that to be the end of it. So far, so good. But shortly after I had added the additional DNS server everywhere, a warning message appeared in my vSphere cluster.
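For the ESXi hosts, adding a DNS server can also be done from the shell. A minimal sketch, assuming a placeholder address for the new domain controller:

```shell
# On each ESXi host (address of the new DC/DNS server is a placeholder):
esxcli network ip dns server add --server=192.168.1.11

# Verify the resulting server list and order:
esxcli network ip dns server list

# On recent ESXi releases the NTP configuration can be checked with:
esxcli system ntp get
```

On older releases, NTP is configured via the host's Time Configuration settings in the vSphere Client rather than through esxcli.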
After my “little” homelab outage last year and the delivery of a new SSD, I found time to redeploy the nested cluster quite a while ago. While preparing for my VCAP-DCV Deploy exam, I had deployed a second VCSA (vCenter Server Appliance) on my old Intel NUC and joined both appliances to a single SSO domain to learn and try different things in a linked mode setup. That is why I received the error “Could not connect to one or more vCenter Server Systems: https://<vcsaFQDN>:443/sdk” every time I logged in to the second VCSA. Because I planned to redeploy the nested environment using the same IPs/FQDNs, I wanted to make sure the orphaned VCSA was cleanly removed from the SSO configuration. This week, one of my customers asked me for help with the same problem. A quick search and I found the following VMware KB article (again): Using the cmsso command to unregister vCenter Server from Single Sign-On (2106736). This time I decided to write a short blog post on the topic.
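The cleanup described in KB 2106736 essentially boils down to a single command, run from the shell of a VCSA that remains in the SSO domain (the FQDN below is a placeholder for the orphaned node):

```shell
# Run on a surviving VCSA in the SSO domain; the orphaned node's FQDN
# and the SSO administrator password are placeholders:
cmsso-util unregister --node-pnid orphaned-vcsa.lab.local \
    --username administrator@vsphere.local --passwd '<password>'
```

After the command completes, the stale entry should disappear from the login warning and from the linked mode inventory.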
Yesterday I had a scheduled update of the Veeam Backup & Replication installation at one of my customers. We planned to go from version 9.5 Update 4b (9.5.4.2866) to version 10 GA (10.0.0.4461).
As usual, I created an encrypted configuration backup before the update, just to be safe. How this works, and why you should encrypt the configuration backup, you can read here and here. I prefer to be a little more cautious at this point rather than regret it afterwards. In the end, however, I did not need the configuration backup: the update went smoothly and without problems.
Since I carried out the update during the day, it was not possible, as agreed with the customer, to perform a complete backup run directly after the update. Therefore, I did a short functional test using the Quick Backup capability of Veeam Backup & Replication. There were no problems here either.
Today the customer called and reported failed backup jobs. So I looked into it:
In one of its recent updates to macOS Catalina, Apple introduced new requirements for the acceptance of SSL certificates. The changes are documented here: https://support.apple.com/en-us/HT210176. As a result, pages without a compliant certificate are no longer accessible in Google Chrome. Unfortunately, the default vCenter Server certificate is among those affected. Unlike other certificate warnings, this error cannot simply be bypassed via the “Advanced” option.
In the following I would like to show you how you can temporarily work around this issue.
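One possible workaround (a sketch, assuming a placeholder vCenter FQDN) is to fetch the vCenter certificate and mark it as trusted in the macOS login keychain:

```shell
# Fetch the vCenter Server certificate (FQDN is a placeholder):
echo | openssl s_client -connect vcenter.lab.local:443 2>/dev/null \
    | openssl x509 -out vcenter.pem

# Add it to the login keychain as a trusted root
# (macOS will prompt for confirmation):
security add-trusted-cert -r trustRoot \
    -k ~/Library/Keychains/login.keychain-db vcenter.pem
```

Note that this only trusts the current certificate; once the vCenter certificate is regenerated or replaced, the step has to be repeated.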
As every year, some of my customers use the weeks after Christmas to update their environments. Nearly all of them run their ESXi hosts with vendor-specific custom images that provide additional drivers or agents on top of the standard VMware image. Unfortunately, there are almost always problems with conflicting VIBs during updates. Of course, it was the same this time. In my case, it was a custom image from Fujitsu.
To perform the update successfully, the problematic VIB must be removed. I would like to outline the necessary steps below.
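In short, the removal happens from the ESXi shell; the grep pattern and VIB name below are placeholders for whatever the scan reports as conflicting:

```shell
# Put the host into maintenance mode first, then from an ESXi shell:
# Find the conflicting VIB (filter pattern is a placeholder):
esxcli software vib list | grep -i fujitsu

# Remove it by name (placeholder):
esxcli software vib remove -n <vib-name>

# Reboot the host before retrying the remediation:
reboot
```

After the reboot, the remediation via VUM should run through without the conflict error.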
Last Friday I struggled with some of our Dell EMC Data Domain (DD) systems. I tried to establish an MTree replication between two identical systems. The first few steps worked without issue, so I was confident I would get into the weekend early. But then things changed.
But first things first. Let’s start with the step-by-step process of how I set up the replication.
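Roughly, the DD OS CLI side of an MTree replication pair looks like the sketch below (hostnames and MTree names are placeholders; consult the DD OS administration guide for the exact syntax of your release):

```shell
# On BOTH Data Domain systems, define the replication context
# (hostnames and MTree path are placeholders):
replication add source mtree://dd-src.lab.local/data/col1/backup \
    destination mtree://dd-dst.lab.local/data/col1/backup

# Then, on the source system only, start the initial synchronization:
replication initialize mtree://dd-dst.lab.local/data/col1/backup

# Monitor progress on either system:
replication status
```

The context must match exactly on both systems, otherwise the initialize step will fail.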