Ubuntu 18.04: getting VMware guest customization to work

Ubuntu 18.x ships with cloud-init and netplan by default, and this combination breaks guest customization when a VM is deployed with a vCenter customization spec. In this post, I’ll show you how to get customization working with vCenter.

On a fresh install of Ubuntu 18.04, create a bash script with the contents below (my VM was set up using DHCP).

cleanup.sh

sudo cloud-init clean --logs
sudo touch /etc/cloud/cloud-init.disabled
sudo rm -rf /etc/netplan/50-cloud-init.yaml
sudo apt purge cloud-init -y
sudo apt autoremove -y


# Don't clear /tmp
sudo sed -i 's/D \/tmp 1777 root root -/#D \/tmp 1777 root root -/g' /usr/lib/tmpfiles.d/tmp.conf

# Remove cloud-init and rely on dbus for open-vm-tools
sudo sed -i 's/Before=cloud-init-local.service/After=dbus.service/g' /lib/systemd/system/open-vm-tools.service



# cleanup current ssh keys so templated VMs get fresh key
# sudo rm -f /etc/ssh/ssh_host_*

# add check for ssh keys on reboot...regenerate if necessary
sudo tee /etc/rc.local >/dev/null <<EOL
#!/bin/sh -e
#
# rc.local
#
# This script is executed at the end of each multiuser runlevel.
# Make sure that the script will "exit 0" on success or any other
# value on error.
#
# In order to enable or disable this script just change the execution
# bits.
#

# By default this script does nothing.
# test -f /etc/ssh/ssh_host_dsa_key || dpkg-reconfigure openssh-server
# exit 0
EOL

# make the script executable
sudo chmod +x /etc/rc.local

# cleanup apt
sudo apt clean

# reset the machine-id (DHCP leases in 18.04 are generated based on this... not MAC...)
echo "" | sudo tee /etc/machine-id >/dev/null

# disable swap for K8s
sudo swapoff --all
sudo sed -ri '/\sswap\s/s/^#?/#/' /etc/fstab

# cleanup shell history and shutdown for templating
history -c
history -w
sudo shutdown -h now

Note: copy-paste can sometimes mangle special characters. If that happens, please use this link to download the file:

Once the script has run, the VM should power off automatically. Convert the VM to a template, then test by deploying from it with a guest customization spec.

Note: do not run the commands directly from PuTTY or an interactive shell. In some cases I’ve noticed the VM's network configuration goes blank while netplan is being removed, which takes the VM off the network mid-session.

Always invoke the commands via the bash script, local to the guest OS.
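After deploying a VM from the template, a quick check can confirm that the clone picked up its own machine-id and that swap stayed disabled. The following is only a sketch: `machine_id_ok` and `swap_disabled_in` are hypothetical helper names made up for this example, and the checks assume the file layouts that the script above edits.

```shell
#!/bin/sh
# Sketch: post-deployment checks for a VM cloned from the template.
# Both function names are hypothetical helpers, not part of any VMware tooling.

# Succeeds if the file holds a 32-character lowercase hex machine-id.
machine_id_ok() {
    [ -r "$1" ] && grep -Eq '^[0-9a-f]{32}$' "$1"
}

# Succeeds if the fstab-style file has no active (uncommented) swap entry.
swap_disabled_in() {
    ! grep -Eq '^[^#].*[[:space:]]swap[[:space:]]' "$1"
}

if machine_id_ok /etc/machine-id; then
    echo "machine-id: populated"
else
    echo "machine-id: empty or missing"
fi
```

On a freshly deployed clone the machine-id should be populated and should differ from the ids of other clones; `swap_disabled_in /etc/fstab` can be run the same way.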

Any host/VM task performed on vCenter fails with “A general system error occurred: Connection refused: The remote service is not running, OR is overloaded, OR a firewall is rejecting connections”

Logs

vpxd log

2019-08-28T16:19:51.247-07:00 trivia vpxd[05386] [Originator@6876 sub=PropertyProvider opID=27cffcd2] [UpdateValuesInt] Updating stored value for property at index 2
2019-08-28T16:19:51.247-07:00 trivia vpxd[05386] [Originator@6876 sub=PropertyProvider opID=27cffcd2] RecordOpInt called for info.cancelable.
2019-08-28T16:19:51.247-07:00 trivia vpxd[05386] [Originator@6876 sub=PropertyProvider opID=27cffcd2] RecordOpInt called for info.error.
2019-08-28T16:19:51.247-07:00 trivia vpxd[05386] [Originator@6876 sub=PropertyProvider opID=27cffcd2] RecordOpInt called for info.state.
2019-08-28T16:19:51.247-07:00 info vpxd[05386] [Originator@6876 sub=vpxLro opID=27cffcd2] [VpxLRO] -- FINISH task-101413
2019-08-28T16:19:51.247-07:00 info vpxd[05386] [Originator@6876 sub=Default opID=27cffcd2] [VpxLRO] -- ERROR task-101413 -- vm-1889 -- vim.VirtualMachine.powerOn: vmodl.fault.SystemError:
--> Result:
--> (vmodl.fault.SystemError) {
-->    faultCause = (vmodl.MethodFault) null,
-->    faultMessage = <unset>,
-->    reason = "Connection refused: The remote service is not running, OR is overloaded, OR a firewall is rejecting connections."
-->    msg = ""
--> }
--> Args:
-->
--> Arg host:
-->
---------
---------
---------
2019-08-28T16:19:51.238-07:00 verbose vpxd[05386] [Originator@6876 sub=Vmacore::Xml::Security opID=27cffcd2-01] Verification of signature Reference URI: `#_cae765cb-f129-42d3-9387-423e307ed6f2' ; is-valid: true
2019-08-28T16:19:51.238-07:00 verbose vpxd[05386] [Originator@6876 sub=Vmacore::Xml::Security opID=27cffcd2-01] Missing reference count: 0
2019-08-28T16:19:51.239-07:00 verbose vpxd[05386] [Originator@6876 sub=Vmacore::Xml::Security opID=27cffcd2-01] Verification of signature SignedInfo: is-valid: true
2019-08-28T16:19:51.239-07:00 trivia vpxd[05386] [Originator@6876 sub=SsoClient opID=27cffcd2-01] Successfully acquired token: SamlToken [subject={Name: vpxd-5b47a55c-75af-455c-979f-83eb915e7a61; Domain:vsphere.local}, groups=[{Name: Users; Domain:vsphere.local}, {Name: SolutionUsers; Domain:vsphere.local}, {Name: SystemConfiguration.Administrators; Domain:vsphere.local}, {Name: ComponentManager.Administrators; Domain:vsphere.local}, {Name: LicenseService.Administrators; Domain:vsphere.local}, {Name: Everyone; Domain:vsphere.local}], delegationChain=[], startTime=2019-08-28 23:19:51.204, expirationTime=2019-08-29 07:19:51.204, renewable=false, delegable=false, isSolution=true,confirmationType=1]
2019-08-28T16:19:51.239-07:00 trivia vpxd[05386] [Originator@6876 sub=HttpConnectionPool-000001 opID=27cffcd2-01] [PopPendingConnection] No pending connections to <cs p:00007ff888079eb0, SsoCustomConnectionSpec:vcenter-hp.vsphere.local:443>
2019-08-28T16:19:51.239-07:00 trivia vpxd[05386] [Originator@6876 sub=SsoClient opID=27cffcd2-01] END operation SecurityTokenServiceImpl::AcquireTokenByCertificate
2019-08-28T16:19:51.239-07:00 trivia vpxd[05386] [Originator@6876 sub=[SSO][SsoWrapperImpl] opID=27cffcd2-01] [AcquireToken] Token acquired successfully.
2019-08-28T16:19:51.239-07:00 trivia vpxd[05386] [Originator@6876 sub=HttpConnectionPool-000211 opID=27cffcd2-01] [IncConnectionCount] Number of connections to <cs p:00007ff8442f8a10, TCP:localhost:8190> incremented to 1
2019-08-28T16:19:51.239-07:00 warning vpxd[05398] [Originator@6876 sub=Default] Failed to connect socket; <io_obj p:0x00007ff828296790, h:86, <TCP '127.0.0.1 : 55994'>, <TCP '127.0.0.1 : 8190'>>, e: 111(Connection refused)
2019-08-28T16:19:51.239-07:00 trivia vpxd[05398] [Originator@6876 sub=Default] Setting error in state 1 : N7Vmacore15SystemExceptionE(Connection refused: The remote service is not running, OR is overloaded, OR a firewall is rejecting connections.)
--> [context]zKq7AVECAAAAAEk92wAOdnB4ZAAA4AArbGlidm1hY29yZS5zbwAAWCUbAP6dGADHOCMADN0lAFTlJQDqASYAsQsmADmiIwBxbyMAOnIjAJ1WKwHUcwBsaWJwdGhyZWFkLnNvLjAAAt2ODmxpYmMuc28uNgA=[/context]
2019-08-28T16:19:51.240-07:00 trivia vpxd[05398] [Originator@6876 sub=HttpConnectionPool-000211] [DecConnectionCount] Number of connections to <cs p:00007ff8442f8a10, TCP:localhost:8190> dec to 0
2019-08-28T16:19:51.240-07:00 error vpxd[05386] [Originator@6876 sub=pbm opID=27cffcd2-01] [ConnectLocked] Failed to login to service: N7Vmacore15SystemExceptionE(Connection refused: The remote service is not running, OR is overloaded, OR a firewall is rejecting connections.)
--> [context]zKq7AVECAAAAAEk92wAOdnB4ZAAA4AArbGlidm1hY29yZS5zbwAAWCUbAP6dGADHOCMADN0lAFTlJQDqASYAsQsmADmiIwBxbyMAOnIjAJ1WKwHUcwBsaWJwdGhyZWFkLnNvLjAAAt2ODmxpYmMuc28uNgA=[/context]
2019-08-28T16:19:51.240-07:00 error vpxd[05386] [Originator@6876 sub=VmProv opID=27cffcd2-01] Get exception while executing action vpx.vmprov.CheckCompatibility: N7Vmacore9ExceptionE(Connection refused: The remote service is not running, OR is overloaded, OR a firewall is rejecting connections.)
--> [context]zKq7AVECAAAAAEk92wAOdnB4ZAAA4AArbGlidm1hY29yZS5zbwAAWCUbAP6dGADHOCMADN0lAFTlJQDqASYAsQsmADmiIwBxbyMAOnIjAJ1WKwHUcwBsaWJwdGhyZWFkLnNvLjAAAt2ODmxpYmMuc28uNgA=[/context]
2019-08-28T16:19:51.241-07:00 info vpxd[05386] [Originator@6876 sub=VmProv opID=27cffcd2-01] Workflow context:
--> (vpx.vmprov.MigrateContext) {
-->    cbData = (vmodl.KeyAnyValue) [
-->       (vmodl.KeyAnyValue) {
-->          key = "workflow.startTime",
-->          value = 5013023961
-->       },
-->       (vmodl.KeyAnyValue) {
-->          key = "pbmPreCheckSkipped",
-->          value = true
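When the vpxd log is long, the refused endpoint can be pulled out mechanically rather than by eye. The helper below is a rough sketch, not a VMware tool: `refused_endpoints` is a hypothetical name, and it assumes the warning lines have the same "Failed to connect socket" shape as the one in the log above.

```shell
#!/bin/sh
# Sketch: list the distinct destination endpoints that refused a connection.
# Reads vpxd log text on stdin; assumes the line format seen in the log above.
refused_endpoints() {
    # Capture the second <TCP '...'> endpoint (the destination), drop the
    # spaces around the colon, and de-duplicate.
    sed -n "s/.*Failed to connect socket;.*<TCP '\([^']*\)'>>, e: 111(Connection refused).*/\1/p" \
        | tr -d ' ' \
        | sort -u
}

# Usage (the log path is an assumption, adjust it to your appliance):
#   refused_endpoints < /var/log/vmware/vpxd/vpxd.log
```

For the warning line in the log above, this prints 127.0.0.1:8190.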

From the vpxd log above, it appears that connections to port 8190 on the vCenter Server were being refused. As per the VMware docs, port 8190 is used by the Profile-Driven Storage service, so we take a look at its log:


sps.log

2019-08-28T16:25:31.402-07:00 [main] INFO  opId=sps-Main-34727-852 com.vmware.vim.storage.common.util.PropertiesWrapper - Ignoring missing property file sps-ext.properties
2019-08-28T16:25:31.402-07:00 [main] ERROR opId=sps-Main-34727-852 com.vmware.sps.util.SpsConfiguration - Error reading the configuration file: java.lang.NumberFormatException: null

At this stage, the service refused to start, pointing to an invalid entry in the configuration file. I took a look at sps.properties and it appeared to contain only 2 lines, far fewer than the file on a working setup.

To resolve the service startup issue, I copied sps.properties from a working box (no changes made). I have listed the contents of this file below:

sps.properties

# IMPORTANT: To edit an entry in this file, create sps-ext.properties and specify the required key/value details.
#
# sps server port configuration
#
sps.http.port = 21000
sps.https.port = 21100
# sps server instance GUID
sps.serverGuid = ##SPS_SERVER_GUID##
# Service extension key registered with VC
sps.extensionKey = com.vmware.vim.sps
# Re-connect config to VC
# If true, SPS will retry connection to VC until success
sps.vcConnection.infiniteAttempt = false
# If infiniteAttempt is false, SPS will try to connect to VC until the number specified by attemptNumber
sps.vcConnection.attemptNumber = 10
# Wait time for next retry connection, the unit is seconds
sps.vcConnection.sleepInterval = 60
# Re-connect config to QS
# If true, SPS will retry connection to QS until success
sps.qsConnection.infiniteAttempt = true
# If infiniteAttempt is false, SPS will try to connect to QS until the number specified by attemptNumber
sps.qsConnection.attemptNumber = 10
# Wait time for next retry connection, the unit is seconds
sps.qsConnection.sleepInterval = 60
sps.queryFile = sps-xqueries.xml
sps.overWriteQsData = false
# Time in seconds to wait for the internal compliance tasks.
sps.compliance.complianceTaskWaitTime = 300
# Time in milliseconds to check for task completion for each policy blob.
sps.compliance.complianceTaskCheckInterval = 100
# VC Server GUID
vpxd.vcGuid = C89B6A4D-489E-435E-97C6-847E892F254F
# number of retries when connecting to kv service (Set -1 for infinite attempts)
sps.connectionRetryAttempts = -1
# retry intervals when connecting to kv service in seconds
sps.connectionRetryInterval = 10
# Time in seconds to wait before retrying sync policy.
sps.syncPolicy.retryWaitTime = 60
# Thread pool queue size for all sps tasks
spbm.threadpool.queueSize = 100
# Thread pool keepAlive timeout in seconds for all sps tasks
spbm.threadpool.keepAlive = 10
# Thread pool config for profile
spbm.profile.threadpool.corePoolSize = 5
spbm.profile.threadpool.maxPoolSize = 32
# Thread pool config for policy blob
spbm.policyBlob.threadpool.corePoolSize = 10
spbm.policyBlob.threadpool.maxPoolSize = 32
# Thread pool config for vendor provider
spbm.vendorProvider.threadpool.corePoolSize = 10
spbm.vendorProvider.threadpool.maxPoolSize = 32
# Thread pool config for vcquery related tasks
spbm.vcquery.threadpool.corePoolSize = 10
spbm.vcquery.threadpool.maxPoolSize = 32
# Thread pool config for VLSI thread pool
# There are two modes, auto which is computed and assigned during runtime
# and manual which can be assigned manually by setting in sps-ext.properties
spbm.vlsi.threadpool.config = auto
spbm.vlsi.threadpool.corePoolSize.manual = 10
spbm.vlsi.threadpool.corePoolSize.auto = 10
spbm.vlsi.threadpool.maxPoolSize = 50 
spbm.vlsi.threadpool.queueSize = 50
# Thread pool config for generic SPS
spbm.generic.threadpool.corePoolSize = 5
spbm.generic.threadpool.maxPoolSize = 32 
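A NumberFormatException on null suggests that a numeric property was missing from the damaged file. As a rough sketch (the function name `check_numeric_props` is hypothetical, and which keys SPS actually requires is an assumption), the following reports every listed key that is absent or does not hold an integer:

```shell
#!/bin/sh
# Sketch: print each listed key that is missing or non-numeric in a
# Java-style properties file. Usage: check_numeric_props FILE KEY...
check_numeric_props() {
    file=$1
    shift
    for key in "$@"; do
        # Take the first "key = value" line and strip blanks from the value.
        value=$(awk -F= -v k="$key" '
            { name = $1; gsub(/[ \t]/, "", name) }
            name == k { v = $2; gsub(/[ \t\r]/, "", v); print v; exit }
        ' "$file")
        case $value in
            ''|*[!0-9-]*) echo "$key" ;;
        esac
    done
}

# Usage (the path is a placeholder):
#   check_numeric_props path/to/sps.properties sps.http.port sps.https.port
```

An empty result means every key that was checked is present and numeric.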

Enable TFTP on VCSA

Start TFTP service

service atftpd start

Allow TFTP port on the VCSA firewall

iptables -A port_filter -m state --state NEW -i eth0 -p udp --dport 69 -j ACCEPT

Confirm if the port is allowed on the firewall

iptables -nL | grep 69


Make the firewall rules persistent:

Export the iptables rules

iptables-save > /etc/iptables.rules

Create a startup script at path: /etc/init.d/startftp.sh with the below contents:

#! /bin/sh
#
# TFTP Start/Stop the TFTP service and allow port 69
#
# chkconfig: 345 80 05
# description: atftpd

### BEGIN INIT INFO
# Provides: atftpd
# Required-Start: $local_fs $remote_fs $network
# Required-Stop:
# Default-Start: 3 5
# Default-Stop: 0 1 2 6
# Description: TFTP
### END INIT INFO

service atftpd start
iptables-restore -c < /etc/iptables.rules

Change the permissions of the script

chmod +x /etc/init.d/startftp.sh

Set the script to run during startup:

chkconfig --add /etc/init.d/startftp.sh

Copy the TFTP contents from autodeploy_zip to /var/lib/tftpboot
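After a reboot it is worth confirming that the saved rules file still carries the TFTP rule. The sketch below uses a hypothetical helper, `has_udp_accept_rule`, and assumes the usual iptables-save line layout (`-p udp ... --dport PORT ... -j ACCEPT`), which may differ between iptables versions.

```shell
#!/bin/sh
# Sketch: succeed if a saved iptables rules file contains a UDP ACCEPT rule
# for the given destination port. Usage: has_udp_accept_rule FILE PORT
has_udp_accept_rule() {
    # "--" stops option parsing because the pattern itself starts with "-".
    grep -Eq -- "-p udp .*--dport $2 .*-j ACCEPT" "$1"
}

# Usage:
#   has_udp_accept_rule /etc/iptables.rules 69 && echo "TFTP rule saved"
```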

ESXi: inodes full

Use the commands below to list and then delete the stale inodes (dangling symlinks under /var/run/vmware)

find /var/run/vmware -type l | while read -r f; do if [ ! -e "$f" ]; then echo "$f"; fi; done > /tmp/suspect

find /var/run/vmware -type l | while read -r f; do if [ ! -e "$f" ]; then rm -f "$f"; fi; done
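The same dangling-symlink test can be wrapped in a small function and tried on a scratch directory before it is pointed at /var/run/vmware. This is only a sketch; `list_dangling` is a hypothetical name, and it only lists, it does not delete.

```shell
#!/bin/sh
# Sketch: print every symlink under a directory whose target no longer exists.
# Usage: list_dangling DIR
list_dangling() {
    find "$1" -type l | while IFS= read -r link; do
        # [ -e ] follows the link, so it fails when the target is gone.
        [ -e "$link" ] || printf '%s\n' "$link"
    done
}

# Usage:
#   list_dangling /var/run/vmware > /tmp/suspect
```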

PowerCLI: remove orphaned VMs from the vCenter inventory

For instructions on how to connect to VMware PowerCLI, follow the post here:

Run the below to get the list of orphaned VMs

$allVMs=Get-VM
foreach ($vm in $allVMs) {
 if ($vm.ExtensionData.Runtime.ConnectionState -eq "orphaned") {$vm.name}
}

Run the below to remove the orphaned VMs

foreach ($vm in $allVMs) {
 if ($vm.ExtensionData.Runtime.ConnectionState -eq "orphaned") {$vm | Remove-VM}
}

PowerCLI: script to reconfigure the default alarm email address on the vCenter Server

Using the GUI to set up email alerts for the default alarms can be time-consuming. In this post I will show you how to use VMware PowerCLI to automate reconfiguring the existing default alarms with the notification email address.

You will need VMware PowerCLI to run through this. If you don't have it installed already, follow the instructions found here.

Use the script below to set the email action on the default alarms to the address specified in the $newEmail variable.

$newEmail = 'ntitta@ikigo.net'
foreach ($alarm in Get-AlarmDefinition){
    $action = Get-AlarmAction -AlarmDefinition $alarm
    $mail = $action | where {$_.ActionType -eq 'SendEmail'}
    New-AlarmAction -AlarmDefinition $alarm -Email -To $newEmail -Subject $mail.Subject -Confirm:$false
}

Add a user to VCSA

add user

adduser username
usermod -aG sudo username

Allow the user to SSH to the appliance

Edit /etc/ssh/sshd_config and add the user account there.

Change the default shell to bash for SSH to work.

chsh -s /usr/local/bin/bash username  

Installing a Realtek NIC on ESXi (ESXi whitebox)

Desktop hardware normally includes a Realtek NIC, which does not work on a base install of ESXi. This post walks you through the steps to get a Realtek NIC working.

Determine the nic hardware by running the below command:

[root@Ryzen:~] lspci -v | grep "Class 0200" -B 1
0000:03:00.0 Network controller Ethernet controller: Realtek Semiconductor Co., Ltd. Onboard Ethernet
         Class 0200: 10ec:8168
--
0000:07:00.0 Network controller Ethernet controller: QLogic Corporation QLogic NetXtreme II BCM5709 1000Base-T [vmnic2]
         Class 0200: 14e4:1639
--
0000:07:00.1 Network controller Ethernet controller: QLogic Corporation QLogic NetXtreme II BCM5709 1000Base-T [vmnic3]
         Class 0200: 14e4:1639
--
0000:08:00.0 Network controller Ethernet controller: QLogic Corporation QLogic NetXtreme II BCM5709 1000Base-T [vmnic0]
         Class 0200: 14e4:1639
--
0000:08:00.1 Network controller Ethernet controller: QLogic Corporation QLogic NetXtreme II BCM5709 1000Base-T [vmnic1]
         Class 0200: 14e4:1639

Run the below command to switch the acceptance level to CommunitySupported (the VIB can only be installed at this acceptance level)

[root@Ryzen:~] esxcli software acceptance set --level=CommunitySupported
Host acceptance level changed to 'CommunitySupported'.

Allow http traffic from the shell by making changes to the firewall

[root@Ryzen:~] esxcli network firewall ruleset set -e true -r httpClient


Use the below command to download and install the VIB

[root@Ryzen:~] esxcli software vib install -d https://vibsdepot.v-front.de -n net55-r8168
Installation Result
   Message: The update completed successfully, but the system needs to be rebooted for the changes to be effective.
   Reboot Required: true
   VIBs Installed: Realtek_bootbank_net55-r8168_8.045a-napi
   VIBs Removed:
   VIBs Skipped:

Reboot the host and you should now have a working Realtek NIC!
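To check which Realtek chip a host has before picking a driver, the vendor:device IDs can be filtered out of the lspci output shown earlier. This is a sketch with a hypothetical helper name, `realtek_nic_ids`; it relies on 10ec being Realtek's PCI vendor ID and assumes the "Class 0200:" line format from the output above.

```shell
#!/bin/sh
# Sketch: print the vendor:device ID of each Realtek Ethernet controller.
# Reads `lspci -v` output on stdin; 10ec is Realtek's PCI vendor ID.
realtek_nic_ids() {
    sed -n 's/.*Class 0200: \(10ec:[0-9a-f]*\).*/\1/p'
}

# Usage on the host:
#   lspci -v | realtek_nic_ids
```

For the host above this prints 10ec:8168, which matches the chip the net55-r8168 VIB is named after.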

Cron jobs on VCSA 6.7

root@is-dhcp40-236 [ /etc/cron.d ]# cat nuke_logs.cron
*/1 * * * *   root . /usr/sbin/nukedns.sh >/dev/null 2>&1

root@is-dhcp40-236 [ /etc/cron.d ]# cat /usr/sbin/nukedns.sh
echo  0 > /var/log/vmware/dnsmasq.log
echo  0 > /var/log/vmware/other_logs_that_that_needs_to_be_nulled


Change the 1 in */1 to the desired interval in minutes.

Permission for the cron file must be 666 or 700.

an example can be found in the attachment for
https://kb.vmware.com/s/article/54526 (use WinRAR to extract the attachment, the file shows up as corrupt otherwise)
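The script above overwrites each log with a single 0. A variation, sketched here on the assumption that an empty file is just as acceptable, truncates the logs to zero bytes and skips paths that do not exist. `truncate_logs` is a hypothetical name.

```shell
#!/bin/sh
# Sketch: truncate each existing log file to zero bytes.
# Usage: truncate_logs FILE...
truncate_logs() {
    for log in "$@"; do
        if [ -f "$log" ]; then
            # ":" is a no-op; the redirection alone empties the file.
            : > "$log"
        fi
    done
}

# Usage inside the cron-invoked script:
#   truncate_logs /var/log/vmware/dnsmasq.log
```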

vCenter Web Client logon screen glitches after upgrade

After a vCenter upgrade, the logon screen is improperly formatted and displays raw markup in place of the branding. The text reads like the below:


<img id=\'topSplash\' src=\'..\/..\/resources\/img\/AppBgPattern.png\'><img id=\'brand\' src=\'..\/..\/resources\/img\/vmwareLogoBigger.png\'><span>VMware<sup>®<\/sup> vCloud Automation Center<sup>™<\/sup><\/span><style type=\'text\/css\'>body { background: #3075ab; \/* Old browsers *\/ background: -moz-linear-gradient(top, #3a8dc8 0%, #183a62 100%); \/* FF3.6+ *\/ background: -webkit-gradient(linear, left top, left bottom, color-stop(0%, #3a8dc8), color-stop(100%, #183a62)); \/* Chrome,Safari4+ *\/ background: -webkit-linear-gradient(top, #3a8dc8 0%, #183a62 100%); \/* Chrome10+,Safari5.1+ *\/ background: -o-linear-gradient(top, #3a8dc8 0%, #183a62 100%); \/* Opera 11.10+ *\/ background: -ms-linear-gradient(top, #3a8dc8 0%, #183a62 100%); \/* IE10+ *\/ background: linear-gradient(to bottom, #3a8dc8 0%, #183a62 100%); \/* W3C *\/ filter: progid:DXImageTransform.Microsoft.gradient( startColorstr=\'#3a8dc8\', endColorstr=\'#183a62\', GradientType=0); \/* IE6-9 *\/ background-repeat: no-repeat; margin : 0; font-size : 12px; font-family : Arial, Helvetica, sans-serif; color: #87ceff; margin: 0; font-size: 12px; font-family: Arial, Helvetica, sans-serif;}#topSplash { position: absolute; top: 0; left: 0; z-index: 1;}#brand { position: absolute; top: 55px; left: 44px; z-index: 2;}#tenantBrand { top: 0; left: 0; margin: 0; padding: 0; width: 100%;}#tenantBrand span { position: absolute; top: 345px; left: 424px; color: #FFF; font-size: 21px;}#tenantBrand sup { font-size: 11px;}#loginForm { background-image: url(..\/..\/resources\/img\/divider.png);}.loginLabel { color: #FFFFFF;}#productName { top: 365px;}#response { color: #87CEFF;}#footer { background-color: 090B0D; color: #838689;}<\/style> 

or

 var tenant_brandname="<img id=\'topSplash\' src=\'..\/..\/resources\/img\/AppBgPattern.png\'><img id=\'brand\' src=\'..\/..\/resources\/img\/vmwareLogoBigger.png\'><span>VMware<sup>®<\/sup> vRealize<sup>™<\/sup> Automation<\/span><style type=\'text\/css\'>body {    background: #3075ab; \/* Old browsers *\/    background: -moz-linear-gradient(top, #3a8dc8 0%, #183a62 100%);    \/* FF3.6+ *\/    background: -webkit-gradient(linear, left top, left bottom, color-stop(0%, #3a8dc8),        color-stop(100%, #183a62)); \/* Chrome,Safari4+ *\/    background: -webkit-linear-gradient(top, #3a8dc8 0%, #183a62 100%);    \/* Chrome10+,Safari5.1+ *\/    background: -o-linear-gradient(top, #3a8dc8 0%, #183a62 100%);    \/* Opera 11.10+ *\/    background: -ms-linear-gradient(top, #3a8dc8 0%, #183a62 100%);    \/* IE10+ *\/    background: linear-gradient(to bottom, #3a8dc8 0%, #183a62 100%);    \/* W3C *\/    filter: progid:DXImageTransform.Microsoft.gradient( startColorstr=\'#3a8dc8\',        endColorstr=\'#183a62\', GradientType=0); \/* IE6-9 *\/    background-repeat: no-repeat; margin : 0; font-size : 12px; font-family    : Arial, Helvetica, sans-serif;    color: #87ceff;    margin: 0;    font-size: 12px;    font-family: Arial, Helvetica, sans-serif;}#topSplash {    position: absolute;    top: 0;    left: 0;    z-index: 1;}#brand {    position: absolute;    top: 55px;    left: 44px;    z-index: 2;}#tenantBrand {    top: 0;    left: 0;    margin: 0;    padding: 0;    width: 100%;}#tenantBrand span {    position: absolute;    top: 345px;    left: 499px;    color: #FFF;    font-size: 21px;}#tenantBrand sup {    font-size: 11px;}#loginForm {    background-image: url(..\/..\/resources\/img\/divider.png);}.loginLabel {    color: #FFFFFF;}#productName {    top: 365px;}#response {    color: #87CEFF;}#footer {    background-color: 090B0D;    color: #838689;}<\/style>";

This happens because the STS branding attribute holds inappropriate data. To fix this, download JXplorer and use it to connect to the SSO directory: https://kb.vmware.com/s/article/2077170

Note: take a snapshot of the PSC or back up the vmdird database (/storage/db/vmware-vmdir/*mdb) before proceeding; deleting the wrong object can break the PSC/vCenter.

Using JXplorer, delete the value of the attribute ‘vmwSTSBrandName’ under the object DN ‘cn=vsphere.local,cn=Tenants,cn=IdentityManager,cn=Services,dc=vsphere,dc=local’.