Deploying vRA: setting up the MSSQL server errors out with “MS DTC on the IaaS database server is not configured correctly for network access”

IAAS.ikigo.net (SQL Server: db.ikigo.net)
(10032) The MS DTC on the IaaS database server is not configured correctly for network access. If the servers that host MSDTC have been cloned their CID might not be unique (located in HKEY_CLASSES_ROOT/CID) thus causing this issue. To regenerate its CID values the MSDTC must be reinstalled. Details: The partner transaction manager has disabled its support for remote/network transactions. (Exception from HRESULT: 0x8004D025) Details: The partner transaction manager has disabled its support for remote/network transactions. (Exception from HRESULT: 0x8004D025)
IAAS.ikigo.net\WEB
(10032) The MS DTC on the IaaS database server is not configured correctly for network access. If the servers that host MSDTC have been cloned their CID might not be unique (located in HKEY_CLASSES_ROOT/CID) thus causing this issue. To regenerate its CID values the MSDTC must be reinstalled. Details: The partner transaction manager has disabled its support for remote/network transactions. (Exception from HRESULT: 0x8004D025) Details: The partner transaction manager has disabled its support for remote/network transactions. (Exception from HRESULT: 0x8004D025)
IAAS.ikigo.net\Manager Service
(10032) The MS DTC on the IaaS database server is not configured correctly for network access. If the servers that host MSDTC have been cloned their CID might not be unique (located in HKEY_CLASSES_ROOT/CID) thus causing this issue. To regenerate its CID values the MSDTC must be reinstalled. Details: The partner transaction manager has disabled its support for remote/network transactions. (Exception from HRESULT: 0x8004D025) Details: The partner transaction manager has disabled its support for remote/network transactions. (Exception from HRESULT: 0x8004D025)

Reinstalling MSDTC by following the instructions in KB https://kb.vmware.com/s/article/59422 did not help.

To resolve this, I checked the Component Services settings on both the IaaS server and the DB server.

Go to Start and search for Component Services.

Component Services > Computers > My Computer > Distributed Transaction Coordinator

Right-click Local DTC > Properties > Security tab.

In my case, Allow Inbound was disabled on the SQL server. Enable Allow Inbound and click Apply (MSDTC is restarted to apply the change), and then re-run the wizard.

Note: All the options as seen below must be enabled in order for the workflow to complete successfully

Re-run the validation; it was successful.
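If several servers need checking, the same settings can be read from the registry instead of clicking through Component Services on each box. Below is a minimal Python sketch, assuming the Security tab checkboxes are stored as DWORD values under HKLM\SOFTWARE\Microsoft\MSDTC\Security. The value names and the choice of which ones count as "required" are my assumptions, not something taken from the KB above, and `disabled_settings`/`read_dtc_security` are hypothetical helper names.

```python
import sys

# Assumed registry values behind the Local DTC > Security tab checkboxes.
# This subset is an assumption; add others if the vRA validation still fails.
REQUIRED_DTC_VALUES = (
    "NetworkDtcAccess",              # "Network DTC Access"
    "NetworkDtcAccessTransactions",  # network transactions (assumed mapping)
    "NetworkDtcAccessInbound",       # "Allow Inbound"
    "NetworkDtcAccessOutbound",      # "Allow Outbound"
)


def disabled_settings(values):
    """Return the required settings that are missing or not set to 1."""
    return [name for name in REQUIRED_DTC_VALUES if values.get(name) != 1]


def read_dtc_security():
    """Read the assumed MSDTC security values from the local registry (Windows only)."""
    import winreg  # only available on Windows

    values = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE,
                        r"SOFTWARE\Microsoft\MSDTC\Security") as key:
        for name in REQUIRED_DTC_VALUES:
            try:
                values[name], _ = winreg.QueryValueEx(key, name)
            except FileNotFoundError:  # value not present at all
                values[name] = None
    return values


if __name__ == "__main__" and sys.platform == "win32":
    off = disabled_settings(read_dtc_security())
    print("MSDTC settings OK" if not off else "Disabled: " + ", ".join(off))
```

Run it on both the IaaS server and the SQL server; any name it prints corresponds to a checkbox worth re-checking in the GUI.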

Content library hack (DB)

VMware Content Library is a convenient way to make VM templates and ISOs available across multiple vCenter Servers. However, it does not handle datacenter segregation very well when the contents are stored on NFS/VMFS.

A subscribed library would need dedicated storage space and would be pointless in my setup, as I had presented the NFS volume to all hosts.

So what seems to be the problem?

I have an NFS volume presented to all hosts across several datacenters. Although the NFS UUID and the datastore name (ISO) are the same, the vCenter database stores each instance with a different datastore ID and datacenter ID, because datastores are segregated by the datacenter object.


  id  |      name      | datacenter_id
 -----+----------------+---------------
  157 | Template       |            77
  158 | SlowBro_400    |            77
  159 | ISO            |            77
  160 | SharedLUN      |            77
  161 | 10.154         |            77
  156 | template       |            77
   13 | SlowBro_legasy |             2
   14 | 10.128         |             2
   91 | ISO            |             2
   16 | Template       |             2
   92 | is-tse-d129_1  |             2
   12 | SlowBro_400    |             2

VCDB=# select * from vpx_entity where id=77;
  id |     name     | type_id | parent_id
 ----+--------------+---------+-----------
  77 | BLR          |       8 |         1
 (1 row)


VCDB=# select * from vpx_entity where id=2;
  id |    name    | type_id | parent_id
 ----+------------+---------+-----------
   2 | HYD        |       8 |         1
 (1 row)

With this configuration, when I created a content library, the DB showed it referencing only one site's datastore, i.e. we only see ISO (id 159) from the BLR (id 77) datacenter.

This means the content library items can only be deployed to one of the datacenters rather than across all datacenters on the vCenter.

VCDB=# select * from cl_storage;
                  id                  |                          storageuri                          |   type
--------------------------------------+--------------------------------------------------------------+-----------
 a0018db8-f630-4c04-b0f1-30c900ad691c | Datastore:datastore-159:481639ff-d88d-4622-8872-ec6856e6b157  | Datastore
 bf8e8dcb-5b28-4b03-863e-89308bc8c501 | Datastore:datastore-157:481639ff-d88d-4622-8872-ec6856e6b157 | Datastore


The above table is referenced by cl_library_storage:
VCDB=# select * from cl_library_storage;
              library_id              |              storage_id
--------------------------------------+--------------------------------------
 3336e2ad-8166-4e6a-850d-a9d81c41ba01 | a0018db8-f630-4c04-b0f1-30c900ad691c
 946cea65-5cd0-41e0-83ab-17259f690ce1 | bf8e8dcb-5b28-4b03-863e-89308bc8c501

So, logically, if I add the other datastore ID for ISO to cl_storage and create a referencing record in cl_library_storage, we should be able to use the same content library across all datacenters.

The new ID must be unique and must match across the two tables. I added the records below (I created the new ID by incrementing an existing ID from the correlating table, c501 → c502).

After the change:
VCDB=# select * from cl_storage;
                  id                  |                          storageuri                          |   type
--------------------------------------+--------------------------------------------------------------+-----------
 a0018db8-f630-4c04-b0f1-30c900ad691c | Datastore:datastore-159:481639ff-d88d-4622-8872-ec6856e6b157  | Datastore
 bf8e8dcb-5b28-4b03-863e-89308bc8c501 | Datastore:datastore-157:481639ff-d88d-4622-8872-ec6856e6b157 | Datastore
 bf8e8dcb-5b28-4b03-863e-89308bc8c502 | Datastore:datastore-11:481639ff-d88d-4622-8872-ec6856e6b157  | Datastore

VCDB=# select * from cl_library_storage;
              library_id              |              storage_id
--------------------------------------+--------------------------------------
 3336e2ad-8166-4e6a-850d-a9d81c41ba01 | a0018db8-f630-4c04-b0f1-30c900ad691c
 946cea65-5cd0-41e0-83ab-17259f690ce1 | bf8e8dcb-5b28-4b03-863e-89308bc8c501
 946cea65-5cd0-41e0-83ab-17259f690ce1 | bf8e8dcb-5b28-4b03-863e-89308bc8c502

After adding the above records, I am now able to deploy VMs from the content library across datacenters.

The content library DB schema can be found here:

/usr/lib/vmware-content-library/support/scripts/db/PostgreSQL/cls_unified/cls60.sql
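The manual edit boils down to two inserts that must agree on the new storage ID. The sketch below replays them against an in-memory SQLite copy of the two tables, assuming only the columns visible in the psql output above (the real VCDB is PostgreSQL and may carry more constraints). It generates the new ID with `uuid4` instead of incrementing an existing one, which avoids accidental collisions, and wraps both inserts in one transaction so a failure cannot leave an orphaned cl_storage row. `add_datastore_to_library` is a hypothetical helper; the IDs and URIs are copied from the tables above.

```python
import sqlite3
import uuid

# Simplified stand-ins for the two VCDB tables (assumption: only the columns
# shown in the psql output; sqlite3 is used only to make the sketch runnable).
SCHEMA = """
CREATE TABLE cl_storage (id TEXT PRIMARY KEY, storageuri TEXT, type TEXT);
CREATE TABLE cl_library_storage (library_id TEXT, storage_id TEXT);
"""

LIBRARY_ID = "946cea65-5cd0-41e0-83ab-17259f690ce1"
EXISTING_STORAGE_ID = "bf8e8dcb-5b28-4b03-863e-89308bc8c501"
EXISTING_URI = "Datastore:datastore-157:481639ff-d88d-4622-8872-ec6856e6b157"
NEW_URI = "Datastore:datastore-11:481639ff-d88d-4622-8872-ec6856e6b157"


def add_datastore_to_library(conn, library_id, storage_uri):
    """Insert a cl_storage row and its cl_library_storage link atomically."""
    storage_id = str(uuid.uuid4())  # unique id instead of hand-incrementing one
    with conn:  # commits both inserts together, or rolls both back on error
        conn.execute(
            "INSERT INTO cl_storage (id, storageuri, type) VALUES (?, ?, 'Datastore')",
            (storage_id, storage_uri),
        )
        conn.execute(
            "INSERT INTO cl_library_storage (library_id, storage_id) VALUES (?, ?)",
            (library_id, storage_id),
        )
    return storage_id


# Recreate the "before" state: the library only knows about datastore-157.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
with conn:
    conn.execute("INSERT INTO cl_storage VALUES (?, ?, 'Datastore')",
                 (EXISTING_STORAGE_ID, EXISTING_URI))
    conn.execute("INSERT INTO cl_library_storage VALUES (?, ?)",
                 (LIBRARY_ID, EXISTING_STORAGE_ID))

new_id = add_datastore_to_library(conn, LIBRARY_ID, NEW_URI)

# List every datastore the library now references.
for (uri,) in conn.execute(
        "SELECT s.storageuri FROM cl_library_storage ls "
        "JOIN cl_storage s ON s.id = ls.storage_id WHERE ls.library_id = ?",
        (LIBRARY_ID,)):
    print(uri)
```

On the appliance, the equivalent would be the same two INSERT statements run inside a single BEGIN/COMMIT in psql against VCDB.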

Ubuntu 18.04: getting VMware guest customization to work

Ubuntu 18.x ships by default with cloud-init/netplan, which breaks when the VM is customized using a vCenter customization spec. In this blog, I’ll show you how to get the customization to work with vCenter.

On a fresh install of Ubuntu 18.04, create a bash script with the contents below (mine was set up using DHCP):

cleanup.sh

#!/bin/bash
# Remove cloud-init and its netplan config so the vCenter customization spec can configure networking
sudo cloud-init clean --logs
sudo touch /etc/cloud/cloud-init.disabled
sudo rm -rf /etc/netplan/50-cloud-init.yaml
sudo apt purge cloud-init -y
sudo apt autoremove -y

# Don't clear /tmp
sudo sed -i 's/D \/tmp 1777 root root -/#D \/tmp 1777 root root -/g' /usr/lib/tmpfiles.d/tmp.conf

# cloud-init is removed, so start open-vm-tools after dbus instead of before cloud-init
sudo sed -i 's/Before=cloud-init-local.service/After=dbus.service/g' /lib/systemd/system/open-vm-tools.service

# Clean up current SSH keys so templated VMs get fresh keys
# sudo rm -f /etc/ssh/ssh_host_*

# Add a check for SSH keys on reboot and regenerate them if necessary
sudo tee /etc/rc.local >/dev/null <<EOL
#!/bin/sh -e
#
# rc.local
#
# This script is executed at the end of each multiuser runlevel.
# Make sure that the script will "exit 0" on success or any other
# value on error.
#
# In order to enable or disable this script just change the execution
# bits.
#

# By default this script does nothing.
# test -f /etc/ssh/ssh_host_dsa_key || dpkg-reconfigure openssh-server
# exit 0
EOL

# Make the script executable
sudo chmod +x /etc/rc.local

# Clean up apt
sudo apt clean

# Reset the machine-id (DHCP leases in 18.04 are based on this, not on the MAC address)
echo "" | sudo tee /etc/machine-id >/dev/null

# Disable swap for K8s
sudo swapoff --all
sudo sed -ri '/\sswap\s/s/^#?/#/' /etc/fstab

# Clean up shell history and shut down for templating
history -c
history -w
sudo shutdown -h now

Note: copy-pasting can sometimes change special characters. If that happens, please use this link to download the file:

Once the script runs, the VM should power off automatically. Convert the VM to a template, and then test by deploying from it with a guest customization spec.

Note: do not paste the commands directly into a PuTTY/SSH session. In some cases, I’ve noticed the VM’s networking drops when netplan is removed, taking the VM off the network.

Always run the steps above through the bash script stored locally on the guest OS.

Any host/VM task performed on vCenter fails with “A general system error occurred: Connection refused: The remote service is not running, OR is overloaded, OR a firewall is rejecting connections”

“A general system error occurred: Connection refused: The remote service is not running, OR is overloaded, OR a firewall is rejecting connections”

logs

Vpxd logs

2019-08-28T16:19:51.247-07:00 trivia vpxd[05386] [Originator@6876 sub=PropertyProvider opID=27cffcd2] [UpdateValuesInt] Updating stored value for property at index 2
2019-08-28T16:19:51.247-07:00 trivia vpxd[05386] [Originator@6876 sub=PropertyProvider opID=27cffcd2] RecordOpInt called for info.cancelable.
2019-08-28T16:19:51.247-07:00 trivia vpxd[05386] [Originator@6876 sub=PropertyProvider opID=27cffcd2] RecordOpInt called for info.error.
2019-08-28T16:19:51.247-07:00 trivia vpxd[05386] [Originator@6876 sub=PropertyProvider opID=27cffcd2] RecordOpInt called for info.state.
2019-08-28T16:19:51.247-07:00 info vpxd[05386] [Originator@6876 sub=vpxLro opID=27cffcd2] [VpxLRO] -- FINISH task-101413
2019-08-28T16:19:51.247-07:00 info vpxd[05386] [Originator@6876 sub=Default opID=27cffcd2] [VpxLRO] -- ERROR task-101413 -- vm-1889 -- vim.VirtualMachine.powerOn: vmodl.fault.SystemError:
--> Result:
--> (vmodl.fault.SystemError) {
-->    faultCause = (vmodl.MethodFault) null,
-->    faultMessage = <unset>,
-->    reason = "Connection refused: The remote service is not running, OR is overloaded, OR a firewall is rejecting connections."
-->    msg = ""
--> }
--> Args:
-->
--> Arg host:
-->
---------
---------
---------
2019-08-28T16:19:51.238-07:00 verbose vpxd[05386] [Originator@6876 sub=Vmacore::Xml::Security opID=27cffcd2-01] Verification of signature Reference URI: `#_cae765cb-f129-42d3-9387-423e307ed6f2' ; is-valid: true
2019-08-28T16:19:51.238-07:00 verbose vpxd[05386] [Originator@6876 sub=Vmacore::Xml::Security opID=27cffcd2-01] Missing reference count: 0
2019-08-28T16:19:51.239-07:00 verbose vpxd[05386] [Originator@6876 sub=Vmacore::Xml::Security opID=27cffcd2-01] Verification of signature SignedInfo: is-valid: true
2019-08-28T16:19:51.239-07:00 trivia vpxd[05386] [Originator@6876 sub=SsoClient opID=27cffcd2-01] Successfully acquired token: SamlToken [subject={Name: vpxd-5b47a55c-75af-455c-979f-83eb915e7a61; Domain:vsphere.local}, groups=[{Name: Users; Domain:vsphere.local}, {Name: SolutionUsers; Domain:vsphere.local}, {Name: SystemConfiguration.Administrators; Domain:vsphere.local}, {Name: ComponentManager.Administrators; Domain:vsphere.local}, {Name: LicenseService.Administrators; Domain:vsphere.local}, {Name: Everyone; Domain:vsphere.local}], delegationChain=[], startTime=2019-08-28 23:19:51.204, expirationTime=2019-08-29 07:19:51.204, renewable=false, delegable=false, isSolution=true,confirmationType=1]
2019-08-28T16:19:51.239-07:00 trivia vpxd[05386] [Originator@6876 sub=HttpConnectionPool-000001 opID=27cffcd2-01] [PopPendingConnection] No pending connections to <cs p:00007ff888079eb0, SsoCustomConnectionSpec:vcenter-hp.vsphere.local:443>
2019-08-28T16:19:51.239-07:00 trivia vpxd[05386] [Originator@6876 sub=SsoClient opID=27cffcd2-01] END operation SecurityTokenServiceImpl::AcquireTokenByCertificate
2019-08-28T16:19:51.239-07:00 trivia vpxd[05386] [Originator@6876 sub=[SSO][SsoWrapperImpl] opID=27cffcd2-01] [AcquireToken] Token acquired successfully.
2019-08-28T16:19:51.239-07:00 trivia vpxd[05386] [Originator@6876 sub=HttpConnectionPool-000211 opID=27cffcd2-01] [IncConnectionCount] Number of connections to <cs p:00007ff8442f8a10, TCP:localhost:8190> incremented to 1
2019-08-28T16:19:51.239-07:00 warning vpxd[05398] [Originator@6876 sub=Default] Failed to connect socket; <io_obj p:0x00007ff828296790, h:86, <TCP '127.0.0.1 : 55994'>, <TCP '127.0.0.1 : 8190'>>, e: 111(Connection refused)
2019-08-28T16:19:51.239-07:00 trivia vpxd[05398] [Originator@6876 sub=Default] Setting error in state 1 : N7Vmacore15SystemExceptionE(Connection refused: The remote service is not running, OR is overloaded, OR a firewall is rejecting connections.)
--> [context]zKq7AVECAAAAAEk92wAOdnB4ZAAA4AArbGlidm1hY29yZS5zbwAAWCUbAP6dGADHOCMADN0lAFTlJQDqASYAsQsmADmiIwBxbyMAOnIjAJ1WKwHUcwBsaWJwdGhyZWFkLnNvLjAAAt2ODmxpYmMuc28uNgA=[/context]
2019-08-28T16:19:51.240-07:00 trivia vpxd[05398] [Originator@6876 sub=HttpConnectionPool-000211] [DecConnectionCount] Number of connections to <cs p:00007ff8442f8a10, TCP:localhost:8190> dec to 0
2019-08-28T16:19:51.240-07:00 error vpxd[05386] [Originator@6876 sub=pbm opID=27cffcd2-01] [ConnectLocked] Failed to login to service: N7Vmacore15SystemExceptionE(Connection refused: The remote service is not running, OR is overloaded, OR a firewall is rejecting connections.)
--> [context]zKq7AVECAAAAAEk92wAOdnB4ZAAA4AArbGlidm1hY29yZS5zbwAAWCUbAP6dGADHOCMADN0lAFTlJQDqASYAsQsmADmiIwBxbyMAOnIjAJ1WKwHUcwBsaWJwdGhyZWFkLnNvLjAAAt2ODmxpYmMuc28uNgA=[/context]
2019-08-28T16:19:51.240-07:00 error vpxd[05386] [Originator@6876 sub=VmProv opID=27cffcd2-01] Get exception while executing action vpx.vmprov.CheckCompatibility: N7Vmacore9ExceptionE(Connection refused: The remote service is not running, OR is overloaded, OR a firewall is rejecting connections.)
--> [context]zKq7AVECAAAAAEk92wAOdnB4ZAAA4AArbGlidm1hY29yZS5zbwAAWCUbAP6dGADHOCMADN0lAFTlJQDqASYAsQsmADmiIwBxbyMAOnIjAJ1WKwHUcwBsaWJwdGhyZWFkLnNvLjAAAt2ODmxpYmMuc28uNgA=[/context]
2019-08-28T16:19:51.241-07:00 info vpxd[05386] [Originator@6876 sub=VmProv opID=27cffcd2-01] Workflow context:
--> (vpx.vmprov.MigrateContext) {
-->    cbData = (vmodl.KeyAnyValue) [
-->       (vmodl.KeyAnyValue) {
-->          key = "workflow.startTime",
-->          value = 5013023961
-->       },
-->       (vmodl.KeyAnyValue) {
-->          key = "pbmPreCheckSkipped",
-->          value = true

From the above snippet, it appears the connection from vpxd to port 8190 on the vCenter (localhost) was being refused. As per the VMware docs, port 8190 is used by Profile-Driven Storage, so let's take a look at the profile-driven storage log:
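A quick way to confirm that nothing is listening on the port vpxd is trying to reach is a plain TCP connect, which fails the same way vpxd's did (errno 111, connection refused) when the service is down. Minimal Python sketch, assuming a Python 3 interpreter is available on the appliance (the log shows vpxd connecting to localhost:8190, so the check has to run locally); `port_is_listening` is a hypothetical helper name.

```python
import socket


def port_is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds, False otherwise."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # ConnectionRefusedError (errno 111), timeouts, unreachable host
        return False


if __name__ == "__main__":
    # 8190 is the profile-driven storage port vpxd failed to reach in the log above
    print("localhost:8190 listening:", port_is_listening("127.0.0.1", 8190))
```

If this prints False while vpxd is running, the problem is on the profile-driven storage side rather than in vpxd.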


Sps.log

2019-08-28T16:25:31.402-07:00 [main] INFO  opId=sps-Main-34727-852 com.vmware.vim.storage.common.util.PropertiesWrapper - Ignoring missing property file sps-ext.properties
2019-08-28T16:25:31.402-07:00 [main] ERROR opId=sps-Main-34727-852 com.vmware.sps.util.SpsConfiguration - Error reading the configuration file: java.lang.NumberFormatException: null

At this stage, the service refused to start, with the error pointing to an invalid entry in the configuration file. I took a look at sps.properties, and it appeared to have only 2 lines compared with the file on a working setup.

To resolve the service startup issue, I copied sps.properties from a working box (no changes made). I have listed the contents of this file below:

sps.properties

# IMPORTANT: To edit an entry in this file, create sps-ext.properties and specify the required key/value details.
#
# sps server port configuration
#
sps.http.port = 21000
sps.https.port = 21100
# sps server instance GUID
sps.serverGuid = ##SPS_SERVER_GUID##
# Service extension key registered with VC
sps.extensionKey = com.vmware.vim.sps
# Re-connect config to VC
# If true, SPS will retry connection to VC until success
sps.vcConnection.infiniteAttempt = false
# If infiniteAttempt is false, SPS will try to connect to VC until the number specified by attemptNumber
sps.vcConnection.attemptNumber = 10
# Wait time for next retry connection, the unit is seconds
sps.vcConnection.sleepInterval = 60
# Re-connect config to QS
# If true, SPS will retry connection to QS until success
sps.qsConnection.infiniteAttempt = true
# If infiniteAttempt is false, SPS will try to connect to QS until the number specified by attemptNumber
sps.qsConnection.attemptNumber = 10
# Wait time for next retry connection, the unit is seconds
sps.qsConnection.sleepInterval = 60
sps.queryFile = sps-xqueries.xml
sps.overWriteQsData = false
# Time in seconds to wait for the internal compliance tasks.
sps.compliance.complianceTaskWaitTime = 300
# Time in milliseconds to check for task completion for each policy blob.
sps.compliance.complianceTaskCheckInterval = 100
# VC Server GUID
vpxd.vcGuid = C89B6A4D-489E-435E-97C6-847E892F254F
# number of retries when connecting to kv service (Set -1 for infinite attempts)
sps.connectionRetryAttempts = -1
# retry intervals when connecting to kv service in seconds
sps.connectionRetryInterval = 10
# Time in seconds to wait before retrying sync policy.
sps.syncPolicy.retryWaitTime = 60
# Thread pool queue size for all sps tasks
spbm.threadpool.queueSize = 100
# Thread pool keepAlive timeout in seconds for all sps tasks
spbm.threadpool.keepAlive = 10
# Thread pool config for profile
spbm.profile.threadpool.corePoolSize = 5
spbm.profile.threadpool.maxPoolSize = 32
# Thread pool config for policy blob
spbm.policyBlob.threadpool.corePoolSize = 10
spbm.policyBlob.threadpool.maxPoolSize = 32
# Thread pool config for vendor provider
spbm.vendorProvider.threadpool.corePoolSize = 10
spbm.vendorProvider.threadpool.maxPoolSize = 32
# Thread pool config for vcquery related tasks
spbm.vcquery.threadpool.corePoolSize = 10
spbm.vcquery.threadpool.maxPoolSize = 32
# Thread pool config for VLSI thread pool
# There are two modes, auto which is computed and assigned during runtime
# and manual which can be assigned manually by setting in sps-ext.properties
spbm.vlsi.threadpool.config = auto
spbm.vlsi.threadpool.corePoolSize.manual = 10
spbm.vlsi.threadpool.corePoolSize.auto = 10
spbm.vlsi.threadpool.maxPoolSize = 50 
spbm.vlsi.threadpool.queueSize = 50
# Thread pool config for generic SPS
spbm.generic.threadpool.corePoolSize = 5
spbm.generic.threadpool.maxPoolSize = 32 
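`NumberFormatException: null` is what Java typically reports when it tries to parse a number from a value that is absent, so a truncated file fits the symptom (an inference from the exception text, not from the SPS source). Before copying a whole file over, comparing the broken copy with a known-good one shows exactly which keys are missing or no longer numeric. Minimal Python sketch; the parser handles only the simple `key = value` lines and `#` comments used in the file above (no `:` separators or line continuations), and the function names are hypothetical.

```python
import sys


def parse_properties(text):
    """Parse 'key = value' lines; a simplified subset of the Java .properties format."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # blank line or comment
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def _is_int(value):
    return value.lstrip("-").isdigit()


def compare_properties(broken_text, good_text):
    """Return (keys missing from the broken file, keys whose numeric value is now non-numeric)."""
    broken = parse_properties(broken_text)
    good = parse_properties(good_text)
    missing = sorted(key for key in good if key not in broken)
    not_numeric = sorted(
        key for key, value in broken.items()
        if key in good and _is_int(good[key]) and not _is_int(value)
    )
    return missing, not_numeric


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python3 compare_props.py <broken sps.properties> <known-good sps.properties>
    with open(sys.argv[1]) as broken_file, open(sys.argv[2]) as good_file:
        missing, not_numeric = compare_properties(broken_file.read(), good_file.read())
    print("missing keys:", ", ".join(missing) or "none")
    print("non-numeric values:", ", ".join(not_numeric) or "none")
```

Instance-specific entries such as vpxd.vcGuid are only compared for presence, since their values are not numeric.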

Enable TFTP on VCSA

Start TFTP service

service atftpd start

Allow TFTP port on the VCSA firewall

iptables -A port_filter -m state --state NEW -i eth0 -p udp --dport 69 -j ACCEPT

Confirm that the port is allowed on the firewall:

iptables -nL | grep 69


Make the firewall rules persistent:

Export the iptables rules:

iptables-save > /etc/iptables.rules

Create a startup script at path: /etc/init.d/startftp.sh with the below contents:

#! /bin/sh
#
# TFTP Start/Stop the TFTP service and allow port 69
#
# chkconfig: 345 80 05
# description: atftpd

### BEGIN INIT INFO
# Provides: atftpd
# Required-Start: $local_fs $remote_fs $network
# Required-Stop:
# Default-Start: 3 5
# Default-Stop: 0 1 2 6
# Description: TFTP
### END INIT INFO

service atftpd start
iptables-restore -c < /etc/iptables.rules

Make the script executable:

chmod +x /etc/init.d/startftp.sh

Set the script to run during startup:

chkconfig --add /etc/init.d/startftp.sh

Copy the contents of the TFTP boot zip (autodeploy_zip) to /var/lib/tftpboot.
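To confirm from another machine that atftpd is reachable through the new firewall rule, it is enough to send one TFTP read request and see whether anything answers. Minimal Python sketch of the RFC 1350 read request (opcode 1, filename, NUL, mode, NUL). The file name undionly.kpxe.vmw-hardwired is the BIOS boot file Auto Deploy normally uses; verify it against your zip. Even an ERROR reply (e.g. file not found) proves the server and port 69 are open. `build_rrq` and `tftp_reply_opcode` are hypothetical helper names.

```python
import socket
import struct
import sys

OPCODE_RRQ = 1    # read request
OPCODE_DATA = 3   # data block
OPCODE_ERROR = 5  # error packet (e.g. file not found)


def build_rrq(filename, mode="octet"):
    """Build a TFTP read request per RFC 1350: opcode, filename, NUL, mode, NUL."""
    return (struct.pack("!H", OPCODE_RRQ) + filename.encode("ascii") + b"\0"
            + mode.encode("ascii") + b"\0")


def tftp_reply_opcode(host, filename, port=69, timeout=3.0):
    """Send one RRQ and return the opcode of the first reply, or None if nothing answers."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_rrq(filename), (host, port))
        try:
            # The server replies from a new port (its transfer ID), so use recvfrom
            reply, _ = sock.recvfrom(516)  # 4-byte header + up to 512 data bytes
        except OSError:  # timeout, or an ICMP error reported by the OS
            return None
    if len(reply) < 2:
        return None
    return struct.unpack("!H", reply[:2])[0]


if __name__ == "__main__" and len(sys.argv) == 2:
    # usage: python3 tftp_probe.py <VCSA IP>
    opcode = tftp_reply_opcode(sys.argv[1], "undionly.kpxe.vmw-hardwired")
    print({OPCODE_DATA: "DATA: file is being served",
           OPCODE_ERROR: "ERROR reply: server is up, check the file name"}.get(
               opcode, "no reply: check atftpd and the iptables rule"))
```

The probe abandons the transfer after the first packet, so the server will retransmit a few times and then time out on its own.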

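ESXi: inode full

Use the commands below to find and delete the stale (dangling) symlinks under /var/run/vmware that use up inodes. The first command writes the list to /tmp/suspect so you can review it; the second deletes them:

find /var/run/vmware -type l | while read f; do if [ ! -e "$f" ]; then echo "$f"; fi; done > /tmp/suspect

find /var/run/vmware -type l | while read f; do if [ ! -e "$f" ]; then rm -f "$f"; fi; done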
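The same check can be expressed in Python, which makes a dry run before deleting the default. Minimal sketch, assuming a Python 3 interpreter is available where you run it; `find_broken_symlinks` and `remove_broken_symlinks` are hypothetical helper names. Like `find -type l` followed by `[ ! -e ]`, it flags links whose target no longer exists and removes only the links, never their targets.

```python
import os
import sys


def find_broken_symlinks(root):
    """Yield symlinks under root whose target does not exist (like find -type l + [ ! -e ])."""
    for dirpath, dirnames, filenames in os.walk(root):  # does not follow symlinks
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) and not os.path.exists(path):
                yield path


def remove_broken_symlinks(root, dry_run=True):
    """Return the dangling symlinks under root, deleting them unless dry_run is True."""
    found = list(find_broken_symlinks(root))  # collect first, then delete
    if not dry_run:
        for path in found:
            os.remove(path)  # removes the link itself
    return found


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "/var/run/vmware"
    for path in remove_broken_symlinks(target):  # dry run: list only
        print(path)
```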

PowerCLI: remove orphaned VMs from the vCenter inventory

For instructions on how to connect with VMware PowerCLI, follow the post here:

Run the following to get the list of orphaned VMs:

$allVMs=Get-VM
foreach ($vm in $allVMs) {
 if ($vm.ExtensionData.Runtime.ConnectionState -eq "orphaned") {$vm.name}
}

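Run the following to remove the orphaned VMs from the inventory (this reuses $allVMs from the previous step; without -DeletePermanently, Remove-VM only removes the VM from the inventory and does not delete its files):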

foreach ($vm in $allVMs) {
 if ($vm.ExtensionData.Runtime.ConnectionState -eq "orphaned") {$vm | Remove-VM}
}

PowerCLI: script to reconfigure the default alarm email address on the vCenter Server

Using the GUI to set up email alerts for the default alarms can be time-consuming. In this post, I will show you how to use VMware PowerCLI to automate re-configuring the existing default alarms with a notification email address.

You will need VMware PowerCLI to run through this. If you don't have it installed already, follow the instructions found here.

Use the script below to set the email action of the default alarms to the address specified in $newEmail:

$newEmail = 'ntitta@ikigo.net'
foreach ($alarm in Get-AlarmDefinition) {
    # Find the alarm's existing email action (if any) so its subject can be reused
    $action = Get-AlarmAction -AlarmDefinition $alarm
    $mail = $action | Where-Object { $_.ActionType -eq 'SendEmail' }
    # Add an email action that sends to $newEmail
    New-AlarmAction -AlarmDefinition $alarm -Email -To $newEmail -Subject $mail.Subject -Confirm:$false
}

Add a user to VCSA

Add the user:

adduser username
usermod -aG sudo username

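Allow the user to SSH to the appliance:

Edit /etc/ssh/sshd_config and add the user account there (for example, to an AllowUsers or AllowGroups line if one is present), then restart sshd.

Change the default shell to bash for SSH to work: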

chsh -s /usr/local/bin/bash username