Tip: Duplicate Node ID
This short article describes how to change the node ID in /etc/ct_node_id when two LPARs share the same one.
1. Duplicate Node ID after LPAR Cloning
When you clone an LPAR by splitting off a copy of the rootvg of an existing LPAR at the storage level, both the original and the cloned rootvg have the same /etc/ct_node_id. This confuses the HMC, and communication between the LPAR and the HMC will only partly work. DLPAR, for instance, does not work if two LPARs have the same node ID in /etc/ct_node_id.
The node ID can be seen with the command
# /usr/sbin/rsct/bin/lsnodeid
4c1db4ed80de82ef
It is stored in the file /etc/ct_node_id:
# cat /etc/ct_node_id
4c1db4ed80de82ef
# The first line of this file contains the RSCT node id of this
# machine. Please do not delete or modify it.
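To check whether a clone still shares its node ID with the original, the first line of /etc/ct_node_id from both LPARs can be compared. The following is a minimal sketch, assuming both files have been copied to one host; the function names node_id_from_file and same_node_id are hypothetical helpers for illustration and not part of RSCT.

```shell
#!/bin/sh
# Print the node ID stored in a ct_node_id-style file (its first line),
# with any surrounding whitespace removed.
node_id_from_file() {
    head -n 1 "$1" | tr -d '[:space:]'
}

# Return 0 if both files carry the same, non-empty node ID; 1 otherwise.
same_node_id() {
    id_a=$(node_id_from_file "$1")
    id_b=$(node_id_from_file "$2")
    [ -n "$id_a" ] && [ "$id_a" = "$id_b" ]
}
```

A possible use, with hypothetical file names: same_node_id /tmp/orig.ct_node_id /tmp/clone.ct_node_id && echo "duplicate node ID"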
2. How to Change the Node ID
The phenomenon of a duplicate node ID is not new with CAA (Cluster Aware AIX); it was described 9 years ago at unixwerk (see LPARclone.html). However, the solution provided there no longer works with recent AIX releases, so here is the new procedure.
Any PowerHA 7 cluster should be stopped before running the commands below.
First we stop rsct_rm
CLONE# stopsrc -g rsct_rm
0513-044 The IBM.HostRM Subsystem was requested to stop.
0513-044 The IBM.DRM Subsystem was requested to stop.
0513-044 The IBM.MgmtDomainRM Subsystem was requested to stop.
0513-044 The IBM.ServiceRM Subsystem was requested to stop.
0513-044 The IBM.ERRM Subsystem was requested to stop.
0513-044 The IBM.AuditRM Subsystem was requested to stop.
and rsct¹ subsystems
CLONE# /usr/sbin/rsct/bin/rmcctrl -k
0513-044 The ctrmc Subsystem was requested to stop.
CLONE# /usr/sbin/rsct/bin/rmcctrl -z
CLONE# /usr/sbin/rsct/bin/rmcctrl -A
CLONE# /usr/sbin/rsct/bin/rmcctrl -p
CLONE# stopsrc -g rsct
0513-044 The ctrmc Subsystem was requested to stop.
Now we are ready to reset the node ID. Starting with CAA, the node ID is stored in the ODM:
CLONE# lsattr -El cluster0 -a node_uuid
node_uuid e4c0aac8-31dd-11e5-a8dd-1e25b103930a OS image identifier True
This is why the old method no longer works. As long as the node ID is set in the ODM, the recfgct command will always write the same node ID back to /etc/ct_node_id. So we need to reset the node ID in the ODM:
CLONE# chdev -l cluster0 -a node_uuid=00000000-0000-0000-0000-000000000000
cluster0 changed
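Before generating a new node ID, it may be worth confirming that the ODM value really has been reset. The following is a small sketch that parses the lsattr output format shown earlier; the function names node_uuid_from_lsattr and is_reset_uuid are hypothetical helpers, and the parsing assumes the value is the second whitespace-separated field of the node_uuid line.

```shell
#!/bin/sh
# Read "lsattr -El cluster0 -a node_uuid" output on stdin and print the
# value column (second field) of the node_uuid line.
node_uuid_from_lsattr() {
    awk '$1 == "node_uuid" { print $2; exit }'
}

# Return 0 if the given UUID is the all-zero value set by the chdev
# command above, 1 otherwise.
is_reset_uuid() {
    [ "$1" = "00000000-0000-0000-0000-000000000000" ]
}
```

On the clone this could be used as: uuid=$(lsattr -El cluster0 -a node_uuid | node_uuid_from_lsattr); is_reset_uuid "$uuid" && echo "node_uuid has been reset"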
Now we can create a new node ID and restart the rsct_rm and rsct daemons:
CLONE# /usr/sbin/rsct/bin/mknodeid -f
CLONE# /usr/sbin/rsct/install/bin/recfgct
0513-071 The ctcas Subsystem has been added.
0513-071 The ctrmc Subsystem has been added.
0513-059 The ctrmc Subsystem has been started. Subsystem PID is 589920.
We check:
CLONE# lsattr -El cluster0 -a node_uuid
node_uuid b6e59ba2-008d-11e6-8430-e687f6863811 OS image identifier True
and
CLONE# /usr/sbin/rsct/bin/lsnodeid
32d57d25f60b29f7
and
CLONE# cat /etc/ct_node_id
32d57d25f60b29f7
# The first line of this file contains the RSCT node id of this
# machine. Please do not delete or modify it.
and see that the ID has indeed changed.
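The commands from this section can be collected into one script. The following is only a sketch, not a tested tool: the DRY_RUN variable and the run and reset_node_id functions are hypothetical, the commands and their order are taken unchanged from the steps above, and stopping at the first failing command is an assumption about how errors should be handled. By default the script only prints the commands; any PowerHA 7 cluster should still be stopped beforehand.

```shell
#!/bin/sh
# DRY_RUN (hypothetical) defaults to 1: commands are printed, not executed.
# Set DRY_RUN=0 to actually run them on the cloned LPAR.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Stop the RSCT subsystems, reset node_uuid in the ODM, then create a new
# node ID and reconfigure RSCT. The && chain aborts on the first failure
# (an assumption, see the lead-in).
reset_node_id() {
    run stopsrc -g rsct_rm &&
    run /usr/sbin/rsct/bin/rmcctrl -k &&
    run /usr/sbin/rsct/bin/rmcctrl -z &&
    run /usr/sbin/rsct/bin/rmcctrl -A &&
    run /usr/sbin/rsct/bin/rmcctrl -p &&
    run stopsrc -g rsct &&
    run chdev -l cluster0 -a node_uuid=00000000-0000-0000-0000-000000000000 &&
    run /usr/sbin/rsct/bin/mknodeid -f &&
    run /usr/sbin/rsct/install/bin/recfgct
}
```

Calling reset_node_id with DRY_RUN left at its default lists the nine commands in order, which gives a chance to review them before running anything on the clone.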