VMware

vSphere 7 Update 2 NFS Array Snapshots Offload Support

The vSphere 7.0 U2 release provides the ability to use native snapshots when NFS is used as the access protocol. As described on the VMware blog:

NFS Improvements

NFS required a clone to be created first for a newly created VM and the subsequent ones could be offloaded to the array. With the release of vSphere 7.0 U2, we have enabled NFS array snapshots of full, non-cloned VMs to not use redo logs but instead use the snapshot technology of the NFS array in order to provide better snapshot performance. The improvement here will remove the requirement/limitation of creating a clone and enables the first snapshot also to be offloaded to the array.

What’s New in vSphere 7 Update 2 Core Storage

In this blog I explain the configuration needed to test this new feature. To start, we should validate the prerequisites for implementing this solution. According to the VMware documentation portal, the prerequisites are as follows:

  • Verify that the NAS array supports the fast file clone operation with the VAAI NAS program.
  • On your ESXi host, install vendor-specific NAS plug-in that supports the fast file cloning with VAAI.
  • Follow the recommendations of your NAS storage vendor to configure any required settings on both the NAS array and ESXi.

The NFS configuration will be done in NetApp ONTAP using the “NetApp NFS Plug-in for VMware VAAI”, which recently added native NFS snapshot offload support.

Note: The plug-in can be downloaded from the NetApp support portal at the following link “NetApp Support”.


Once in the NetApp support portal, we must download version 2.0 of the plugin, as shown in the following image. To install the plugin, unzip the downloaded file and rename the .vib file found inside the vib20 folder to NetAppNasPlugin.vib.
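If you prefer to do the unzip and rename from PowerShell, here is a minimal sketch; the downloaded file name is an assumption, so adjust it to match the version you downloaded:

# Assumed download name; adjust to the actual file from the NetApp support portal
Expand-Archive -Path .\NetAppNasPlugin.zip -DestinationPath .\NetAppNasPlugin

# Rename the .vib found under the vib20 folder to NetAppNasPlugin.vib
Get-ChildItem -Path .\NetAppNasPlugin\vib20 -Recurse -Filter *.vib |
    Rename-Item -NewName 'NetAppNasPlugin.vib'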


In the next step I used the NetApp ONTAP Tools to install the plugin, but it can also be installed using VMware Lifecycle Manager.

To install the plugin, go to [ONTAP tools => Settings => NFS VAAI tools] and, in the “Existing version:” section, press “BROWSE” to select where the “NetAppNasPlugin.vib” file is stored. Once the file is located, press “UPLOAD” to load and install the plugin.

In this step we can see how to install the plugin on the ESXi servers by pressing the “INSTALL” button.

The following image shows that the installation of the plugin was successful. An advantage of the new version of the plugin is that no reboot of the ESXi server is required.
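If you want to double-check from PowerCLI that the VIB is actually present on a host, a quick sketch like the following (using one of the lab hosts) lists it through the esxcli interface:

# List the NetApp VAAI plug-in VIB installed on one of the lab hosts
$esxcli = Get-EsxCli -VMHost (Get-VMHost -Name 'comp-01a.zenprsolutions.local') -V2
$esxcli.software.vib.list.Invoke() | Where-Object { $_.Name -like 'NetAppNasPlugin*' } |
    Select-Object Name, Version, Vendor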

After installing the plugin, we will validate that the ONTAP storage supports the VMware vStorage APIs for Array Integration (VAAI) features in the NFS environment. This can be verified with the command <vserver nfs show -fields vstorage>. As you can see, the vStorage function is currently disabled on the SVM named NFS. To enable it, use the <vserver nfs modify -vstorage enabled> command.

OnPrem-HQ::> vserver nfs show -fields vstorage 
vserver vstorage 
------- -------- 
NFS     disabled  

OnPrem-HQ::> vserver nfs modify -vstorage enabled -vserver NFS 

OnPrem-HQ::> vserver nfs show -fields vstorage                 
vserver vstorage 
------- -------- 
NFS     enabled  

OnPrem-HQ::> 
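You can also verify from the ESXi side that the NFS datastore now reports hardware acceleration as supported. A small PowerCLI sketch using the esxcli interface (check the Hardware Acceleration field in the output):

# Show the NFS datastores mounted on the host, including their hardware acceleration status
$esxcli = Get-EsxCli -VMHost (Get-VMHost -Name 'comp-01a.zenprsolutions.local') -V2
$esxcli.storage.nfs.list.Invoke()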

The next requirement for using native snapshot offload is adding an advanced setting called snapshot.alwaysAllowNative to the VM configuration. To add this value, go to the VM properties, then to [VM Options => Advanced => EDIT CONFIGURATION].

The following image shows the value of the <snapshot.alwaysAllowNative> variable, which according to the VMware documentation must be set to “TRUE”. You can use the following link as a reference: “VMware Documentation”.
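The same advanced setting can also be added from PowerCLI; a minimal sketch for the RocaWeb VM used later in this lab would be:

# Add the advanced setting that allows native (array offloaded) snapshots for this VM
New-AdvancedSetting -Entity (Get-VM -Name 'RocaWeb') -Name 'snapshot.alwaysAllowNative' -Value 'TRUE' -Confirm:$false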

Now I start testing to validate that native snapshots work in ONTAP. First I will create a snapshot with the <snapshot.alwaysAllowNative> option set to FALSE. Then I will make changes to the VM so that I can measure how quickly the snapshot changes are applied to the base disk when the snapshot is deleted. In the example shown below, the PowerCLI command <New-Snapshot> was used to create a snapshot of the VM named RocaWeb.

PS /home/rebelinux> get-vm -Name RocaWeb | New-Snapshot -Name PRE_Native_Array_Snapshot | Format-Table -Wrap -AutoSize  
Name                      Description PowerState
----                      ----------- ----------
PRE_Native_Array_Snapshot             PoweredOff
PS /home/rebelinux> 

In this step a 10GB file was copied to the VM to grow the snapshot, so that I can measure how fast the changes are applied to the base disk when the snapshot is deleted. In this example the file “RocaWeb_2-000001-delta.vmdk” is the delta where the snapshot changes are saved. This represents a traditional VMware snapshot.

[root@comp-01a:/vmfs/volumes/55ab62ec-2abeb31b/RocaWeb] ls -alh
total 35180596
drwxr-xr-x    2 root     root        4.0K May 31 23:40 .
drwxr-xr-x    7 root     root        4.0K May 31 19:02 ..
-rw-------    1 root     root      276.0K May 31 23:40 RocaWeb-Snapshot15.vmsn
-rw-------    1 root     root        4.0G May 31 23:40 RocaWeb-a03f2017.vswp
-rw-------    1 root     root      264.5K May 31 23:40 RocaWeb.nvram
-rw-------    1 root     root         394 May 31 23:40 RocaWeb.vmsd
-rwxr-xr-x    1 root     root        3.4K May 31 23:40 RocaWeb.vmx
-rw-------    1 root     root       10.0G May 31 23:51 RocaWeb_2-000001-delta.vmdk #Delta (VMFS Based Snapshot)
-rw-------    1 root     root         301 May 31 23:40 RocaWeb_2-000001.vmdk
-rw-------    1 root     root      500.0G May 31 23:37 RocaWeb_2-flat.vmdk
-rw-------    1 root     root         631 May 31 23:37 RocaWeb_2.vmdk
[root@comp-01a:/vmfs/volumes/55ab62ec-2abeb31b/RocaWeb]

The following image shows the time it took to apply the snapshot changes to the base disk when the snapshot was removed. In summary, the operation took 9 minutes in total using a traditional VMware snapshot.

Note: The ONTAP simulator was used for this lab.
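If you prefer to capture this timing from PowerCLI instead of the vSphere Client task pane, a simple sketch is to wrap the snapshot removal in <Measure-Command>:

# Time how long it takes to remove the snapshot and consolidate the changes into the base disk
Measure-Command {
    Get-VM -Name 'RocaWeb' | Get-Snapshot -Name 'PRE_Native_Array_Snapshot' |
        Remove-Snapshot -Confirm:$false
} | Select-Object TotalMinutes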

In this last example the <New-Snapshot> command was also used to create the snapshot, but with the <snapshot.alwaysAllowNative> option set to “TRUE”, so that I can test native snapshot offload on NFS. Here again, a 10GB file was copied to the VM to grow the snapshot, so I can measure how quickly changes are applied to the base disk when the snapshot is deleted.

PS /home/rebelinux> get-vm -Name RocaWeb | New-Snapshot -Name POST_Native_Array_Snapshot | Format-Table -Wrap -AutoSize
Name                       Description PowerState
----                       ----------- ----------
POST_Native_Array_Snapshot             PoweredOff
PS /home/rebelinux> 

Here we can see that there is no “-delta.vmdk” file; instead there is a file named “RocaWeb_2-000001-flat.vmdk” with the same 500GB size as the “RocaWeb_2-flat.vmdk” file. This confirms that the NFS native snapshot offload feature is working in ONTAP.

[root@comp-01a:/vmfs/volumes/55ab62ec-2abeb31b/RocaWeb] ls -alh
total 49419672
drwxr-xr-x    2 root     root        4.0K Jun  1 00:07 .
drwxr-xr-x    7 root     root        4.0K May 31 19:02 ..
-rw-------    1 root     root      276.0K Jun  1 00:07 RocaWeb-Snapshot16.vmsn
-rw-------    1 root     root        4.0G Jun  1 00:07 RocaWeb-a03f2017.vswp
-rw-------    1 root     root      264.5K Jun  1 00:07 RocaWeb.nvram
-rw-------    1 root     root         393 Jun  1 00:07 RocaWeb.vmsd
-rwxr-xr-x    1 root     root        3.4K Jun  1 00:07 RocaWeb.vmx
-rw-------    1 root     root      500.0G Jun  1 00:09 RocaWeb_2-000001-flat.vmdk #No Delta (Array Based Snapshot OffLoad)
-rw-------    1 root     root         650 Jun  1 00:07 RocaWeb_2-000001.vmdk
-rw-------    1 root     root      500.0G Jun  1 00:03 RocaWeb_2-flat.vmdk
-rw-------    1 root     root         631 Jun  1 00:07 RocaWeb_2.vmdk
[root@comp-01a:/vmfs/volumes/55ab62ec-2abeb31b/RocaWeb] 

The following image shows the time it took to apply the snapshot changes to the base disk when the snapshot was removed using NFS native snapshot offload. In summary, applying the snapshot changes to the base disk completed almost instantly.

Summary

NFS native snapshot offload operations are so fast because ONTAP references metadata when it creates a Snapshot copy rather than copying data blocks, which is why Snapshot copies are so efficient. Doing so eliminates the seek time that other systems incur in locating the blocks to copy, as well as the cost of making the copy itself.

HomeLab – How to enable VMware TPS for greater virtual machine consolidation

In this blog, I will be talking about how you can optimize RAM utilization in your “HomeLab”. The main objective of this tutorial is to help you achieve higher levels of consolidation while running your labs.

“Transparent Page Sharing” (TPS) is a method by which duplicate copies of memory pages are consolidated. In other words, the concept of TPS is somewhat similar to deduplication. This helps the ESXi server to free up repeated memory blocks of a virtual machine allowing for increased levels of consolidation.

Memory Deduplication

If you want to know a little more about TPS, its benefits and risks, you can access the following link “Transparent Page Sharing (TPS) in hardware MMU systems”. Although it is known that the use of TPS can be a security risk, it is my understanding that it may not pose a significant risk in a test environment such as ours. I provide a reference for this information:

In a nutshell, independent research indicates that TPS can be abused to gain unauthorized access to data under certain highly controlled conditions. In line with its “secure by default” security posture, VMware has opted to change the default behavior of TPS and provide customers with a configurable option for selectively and more securely enabling TPS in their environment. 

Disabling TPS in vSphere – Impact on Critical Applications

Note: I show you how to change this value using PowerShell because it allows you to make the change on multiple servers at the same time.

To begin, we must verify what value is currently configured on the ESXi servers. To accomplish this task I use the <Get-VMHost> command to retrieve the hosts connected to the vCenter. The result is then piped to the <Get-AdvancedSetting -Name Mem.ShareForceSalting> command, which extracts the value configured in the “Mem.ShareForceSalting” variable.

PS /home/blabla> Get-VMHost | Get-AdvancedSetting -Name Mem.ShareForceSalting | Select-Object Entity,Name,Value,Type | Format-Table -Wrap -AutoSize

Entity                          Name                  Value   Type
------                          ----                  -----   ----
esxsvr-00f.zenprsolutions.local Mem.ShareForceSalting     2 VMHost
comp-02a.zenprsolutions.local   Mem.ShareForceSalting     2 VMHost
comp-01a.zenprsolutions.local   Mem.ShareForceSalting     2 VMHost

PS /home/blabla> 

In this particular example the ESXi servers are configured with the default value of 2. Using the VMware documentation as a reference, this value indicates that “Inter-VM” TPS is disabled.

To activate the “Inter-VM” TPS function you can use the <Set-AdvancedSetting> command with a value of 0. It is worth mentioning that this change can be applied with the VMs powered on and without placing the server in maintenance mode.

PS /home/blabla> Get-VMHost -Name esxsvr-00f.zenprsolutions.local | Get-AdvancedSetting -Name Mem.ShareForceSalting | Set-AdvancedSetting -Value 0        

Perform operation?
Modifying advanced setting 'Mem.ShareForceSalting'.
[Y] Yes  [A] Yes to All  [N] No  [L] No to All  [S] Suspend  [?] Help (default is "Y"): Y

Name                 Value                Type                 Description
----                 -----                ----                 -----------
Mem.ShareForceSalting 0                    VMHost               

PS /home/blabla> 

Once again you can validate with the <Get-AdvancedSetting> command that the configured value is the one I specified previously.

PS /home/blabla> Get-VMHost | Get-AdvancedSetting -Name Mem.ShareForceSalting | Select-Object Entity,Name,Value,Type | Format-Table -Wrap -AutoSize

Entity                          Name                  Value   Type
------                          ----                  -----   ----
esxsvr-00f.zenprsolutions.local Mem.ShareForceSalting     0 VMHost
comp-02a.zenprsolutions.local   Mem.ShareForceSalting     2 VMHost
comp-01a.zenprsolutions.local   Mem.ShareForceSalting     2 VMHost

PS /home/blabla> 
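As mentioned in the note above, the advantage of PowerShell is that the change can be pushed to every host at once; a sketch for all the hosts connected to the vCenter would be:

# Enable Inter-VM TPS (value 0) on every host connected to the vCenter
Get-VMHost | Get-AdvancedSetting -Name Mem.ShareForceSalting | Set-AdvancedSetting -Value 0 -Confirm:$false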

For this test I powered on 23 (Windows) virtual machines to put the server under memory contention so that I could see what benefit TPS provides. The result shown below represents the memory statistics of the ESXi server obtained with the <esxtop> command. Here you can see the statistics before enabling “Inter-VM” TPS, while the value was still set to 2. The “PSHARE/MB” line shows the shared memory the server currently has, i.e. TPS is only being used in “Intra-VM” mode. The saving amounts to 1400 MB.

2:58:11pm up 50 min, 722 worlds, 23 VMs, 45 vCPUs; MEM overcommit avg: 1.10, 1.10, 0.99
PMEM  /MB: 65398   total: 2139     vmk,60782 other, 2358 free
VMKMEM/MB: 65073 managed:  1265 minfree,  7501 rsvd,  57572 ursvd,  high state
PSHARE/MB:    1648  shared,     248  common:    1400 saving

Now I move on to validate the benefit of having “Inter-VM” TPS enabled. As you can see in the following <esxtop> output, there was a substantial memory saving (32097 MB). This allowed us to increase the consolidation ratios of our “HomeLab”.

3:36:46pm up  1:29, 1024 worlds, 23 VMs, 45 vCPUs; MEM overcommit avg: 1.05, 1.05, 0.95
PMEM  /MB: 65398   total: 2262     vmk,60078 other, 3057 free
VMKMEM/MB: 65073 managed:  1265 minfree,  9092 rsvd,  55980 ursvd, clear state
PSHARE/MB:   33038  shared,     941  common:   32097 saving
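Besides <esxtop>, the shared memory figures can also be pulled from PowerCLI with <Get-Stat>; this is only a sketch, assuming the mem.shared.average and mem.sharedcommon.average realtime counters are available at the host level:

# Values are reported in KB; the saving is roughly shared minus common
$vmhost = Get-VMHost -Name 'esxsvr-00f.zenprsolutions.local'
Get-Stat -Entity $vmhost -Stat mem.shared.average, mem.sharedcommon.average -Realtime -MaxSamples 1 |
    Select-Object MetricId, Value, Unit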

Hasta Luego Amigos!

HomeLab – vSphere 7 update 2 QuickBoot with Nested ESXi

In this blog, I will be validating whether the “ESXi QuickBoot” feature works in nested mode, i.e. running ESXi in a VM. First of all, you must know exactly how QuickBoot works and what its requirements are. To do this I will use the VMware documentation portal as a reference:

Quick Boot is a vSphere feature that speeds up the upgrade process of an ESXi server.  A regular reboot involves a full power cycle that requires firmware and device initialization.  Quick Boot optimizes the reboot path to avoid this, saving considerable time from the upgrade process.

Understanding ESXi Quick Boot Compatibility
© 2021 VMware

Requirements:

  • The manufacturer’s platform must be compatible
  • All device drivers must be supported

Limitations in vSphere 7.0:

  • TPM is disabled

Limitations in vSphere 6.7:

  • TPM is disabled
  • There are no VMs with passthrough devices configured.
  • No vmklinux drivers loaded on ESXi.

The VMware documentation includes a list of server manufacturers that support this technology; for reference, see the link in the VMware documentation “Knowledge Base”.

For ESXi 7.0 or newer versions, you can check the “hardware” compatibility here:

In this lab I will test QuickBoot using nested virtualization. It is important to clarify that this is a test/dev scenario. To use this technology in production environments, you need to activate the QuickBoot option from “VMware Update Manager”, now renamed “Lifecycle Manager”. If you are interested, I leave a video here: “Updates Installation with vSphere ESXi QuickBoot”.

[root@comp-01a:~] esxcli hardware platform get
Platform Information
   UUID: 0x27 0xa8 0x30 0x42 0xa3 0xf9 0x54 0xcf 0xe2 0xa3 0x10 0x1d 0xfd 0xb8 0x21 0xd0
   Product Name: VMware7,1
   Vendor Name: VMware, Inc.
   Serial Number: VMware-42 30 a8 27 f9 a3 cf 54-e2 a3 10 1d fd b8 21 d0
   Enclosure Serial Number: None
   BIOS Asset Tag: No Asset Tag
   IPMI Supported: false
[root@comp-01a:~]

The first step to activate QuickBoot is to validate the server compatibility. For this, it is mandatory to connect to the ESXi server via SSH and execute the <loadESXCheckCompat.py> command. In this case the command confirmed that my platform is compatible.

[root@comp-01a:~] /usr/lib/vmware/loadesx/bin/loadESXCheckCompat.py
This system is compatible with Quick Boot.
[root@comp-01a:~]

Once the server is validated as compatible, we can activate the QuickBoot function with the command <loadESXEnable -e>.

[root@comp-01a:~] /bin/loadESXEnable -e
INFO: LoadESX Enabled
INFO: Precheck options:
INFO:   All prechecks are enabled.
[root@comp-01a:~]

The final step to activate the QuickBoot function is to load the “QuickLaunch” configuration by using the <loadESX.py> command.

[root@comp-01a:~] /usr/lib/vmware/loadesx/bin/loadESX.py
DEBUG: LoadESX scripts are up to date.
INFO: Enabling QuickLaunch kernel preload
DEBUG: Using a ramdisk at /tmp/loadESX for intermediate storage
DEBUG: SYSTEM STORAGE IS ENABLED
INFO: Target version: 7.0.2-0.0.17867351
DEBUG: Install boot module "vmware_e.v00" with size 0x2ec40
DEBUG: Install boot module "vmware_f.v00" with size 0x1e3582f
DEBUG: Install boot module "vsan.v00" with size 0x25f08a8
DEBUG: Install boot module "vsanheal.v00" with size 0x80ebe0
DEBUG: Install boot module "vsanmgmt.v00" with size 0x18957cc
DEBUG: Install boot module "xorg.v00" with size 0x358c40
DEBUG: Install boot module "gc.v00" with size 0xe077
DEBUG: Install boot module "imgdb.tgz" with size 0x192800
DEBUG: Install boot module "state.tgz" with size 0x21a00
INFO: loadESX is ready ...
INFO: Performing QuickLaunch kernel preload...
[root@comp-01a:~] 

I am including two videos so you can see the speed of the QuickBoot technology when the server is rebooted.

VMware vSphere Normal Boot

VMware vSphere QuickBoot

VMware vSphere Native Key Provider

This is one of my favorite features in vSphere 7 Update 2. VMware now provides the capability to use a new Native Key Provider for encryption, allowing us to use vSAN encryption, VM encryption and vTPM natively without the requirement to deploy an external key provider. In the past this capability could only be provided by using third-party solutions like HyTrust KeyControl. In this post I will explain how easy it is to configure and deploy this awesome new feature.

Go to [Configure > Key Providers] to add the local key provider.

Select [ADD > Add Native Key Provider].

Provide a Name and press [ADD KEY PROVIDER].

Back up the master keys.

Save the Native Key Provider backup in a secure location. Optionally, protect the key file with a strong password.

Verify that the ESXi server Host Encryption Mode is [Enabled].

Test the configuration by encrypting an existing VM.

Change the default “VM Storage Policy” to [VM Encryption Policy].
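The policy change can also be done from PowerCLI using the SPBM cmdlets. This is only a sketch under a few assumptions: the VM name (RocaWeb) is just an example, the policy is the built-in “VM Encryption Policy”, and the VM must be powered off before encrypting its home and disks:

# Apply the built-in encryption storage policy to the VM home and its hard disks (example VM name)
$vm     = Get-VM -Name 'RocaWeb'
$policy = Get-SpbmStoragePolicy -Name 'VM Encryption Policy'

Get-SpbmEntityConfiguration -VM $vm | Set-SpbmEntityConfiguration -StoragePolicy $policy
Get-HardDisk -VM $vm | Get-SpbmEntityConfiguration | Set-SpbmEntityConfiguration -StoragePolicy $policy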

Now the VM is encrypted with the Native Key Provider. Really Awesome Feature.

HomeLab – How to disable vSphere Cluster Services (vCLS)

In vSphere 7 Update 1 VMware added a new capability to the Distributed Resource Scheduler (DRS) technology consisting of three agent VMs. The agent VMs form the quorum state of the cluster and have the ability to self-heal, so if you power off or delete the VMs called vCLS, the vCenter server will power them back on or re-create them. For HomeLab purposes this new feature consumes CPU, memory and disk space; although minimal, it is not worth it for a configuration that adds nothing to a test and development environment.

In this blog I will be showing how to remove this feature, but it is important to emphasize that you should not implement this change in production environments. The following image shows the minimum number of vCLS VMs used in vCenter 7 U1.

Note: vSphere DRS depends on the status of vCLS services as of vSphere 7.0 Update 1.

As you can see in my case the RegionA01-COMP cluster is composed of two ESXi servers where there are three vCLS VMs.

The first thing you need to do is identify the cluster domain ID, which is required to add an advanced setting to the vCenter configuration. This ID can be identified in two ways: from vCenter or using PowerShell with PowerCLI.

vCenter: In this part you can get the cluster ID by navigating to the [Hosts and Clusters] tab and selecting the cluster you are going to edit; the ID appears in the URL address of the browser.

PowerCLI: Using PowerShell with the VMware.PowerCLI module you can obtain the cluster ID by invoking the <Get-Cluster> command. As shown in the result of the command, you can see the Id value <domain-c81> for the cluster named RegionA01-COMP.

PS /home/rebelinux> Get-Cluster RegionA01-COMP | FL

ParentId                        : Folder-group-h23
ParentFolder                    : host
HAEnabled                       : True
HAAdmissionControlEnabled       : True
HAFailoverLevel                 : 1
HARestartPriority               : Medium
HAIsolationResponse             : DoNothing
VMSwapfilePolicy                : WithVM
DrsEnabled                      : True
DrsMode                         : FullyAutomated
DrsAutomationLevel              : FullyAutomated
CryptoMode                      : OnDemand
CollectiveHostManagementEnabled : False
Name                            : RegionA01-COMP
ExtensionData                   : VMware.Vim.ClusterComputeResource
Id                              : ClusterComputeResource-domain-c81

PS /home/rebelinux>

Once we have the cluster ID which in my case is <domain-c81> we proceed to add the configuration in vCenter by navigating to [Advanced Settings => Configure => Edit Settings].

In this screen, add the <config.vcls.clusters.domain-cID.enabled> setting, which in my case would be <config.vcls.clusters.domain-c81.enabled>, with “false” in the “Value” field.
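If you prefer PowerCLI over the vSphere Client, a sketch that creates the same setting directly against the vCenter connection (using my cluster ID) would be:

# Create the same setting on the vCenter Server connection itself
New-AdvancedSetting -Entity $global:DefaultVIServer -Name 'config.vcls.clusters.domain-c81.enabled' -Value 'false' -Confirm:$false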

Once the value is added, the vCLS VMs will be removed from the cluster; as you can see in the image, the VMs no longer exist.
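You can also confirm from PowerCLI that no vCLS VMs remain in the inventory:

# Should return nothing once the vCLS VMs have been removed
Get-VM -Name 'vCLS*' -ErrorAction SilentlyContinue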

In the following image you can see the tasks performed by vCenter to remove the vCLS VMs.

Hasta luego!!!