April 21, 2016

Tip: Duplicate Node ID

Duplicate Node ID after LPAR Cloning
How to Change the Node ID

This short article describes how to change the ct_node_id when two LPARs have the same ID.

1. Duplicate Node ID after LPAR Cloning

When you clone an LPAR by splitting off a copy of the rootvg of an existing LPAR at the storage level, both the original and the cloned rootvg contain the same /etc/ct_node_id. This confuses the HMC, and communication between LPAR and HMC only partly works. DLPAR, for instance, does not work if two LPARs have the same node ID in /etc/ct_node_id.

The node ID can be displayed with the command:

# /usr/sbin/rsct/bin/lsnodeid
4c1db4ed80de82ef

It is stored in the file /etc/ct_node_id:

# cat /etc/ct_node_id
4c1db4ed80de82ef

# The first line of this file contains the RSCT node id of this
# machine.  Please do not delete or modify it.
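
To spot clones that share an ID, the first line of each LPAR's /etc/ct_node_id can be compared. The following is only a minimal sketch, assuming copies of the files have already been collected in one place; the function names read_node_id and find_duplicate_ids are made up for this illustration and are not part of RSCT.

```shell
# Print the node ID stored in a ct_node_id-style file (its first line).
read_node_id() {
    head -n 1 "$1" | tr -d '[:space:]'
}

# Given several such files, print every node ID that occurs more than once.
find_duplicate_ids() {
    for f in "$@"; do
        read_node_id "$f"
        echo
    done | sort | uniq -d
}
```

Called as find_duplicate_ids followed by the collected file names, it prints nothing when all IDs are unique.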

2. How to Change the Node ID

The phenomenon of a duplicate node ID is not new with CAA (Cluster Aware AIX); it was already described 9 years ago at unixwerk (see LPARclone.html). However, the solution provided there no longer works with recent AIX releases, so here is the new procedure.

Any PowerHA 7 cluster should be stopped before running the commands below.

First we stop the rsct_rm subsystem group:

CLONE# stopsrc -g rsct_rm
0513-044 The IBM.HostRM Subsystem was requested to stop.
0513-044 The IBM.DRM Subsystem was requested to stop.
0513-044 The IBM.MgmtDomainRM Subsystem was requested to stop.
0513-044 The IBM.ServiceRM Subsystem was requested to stop.
0513-044 The IBM.ERRM Subsystem was requested to stop.
0513-044 The IBM.AuditRM Subsystem was requested to stop.

and then the rsct¹ subsystems:

CLONE# /usr/sbin/rsct/bin/rmcctrl -k
0513-044 The ctrmc Subsystem was requested to stop.
CLONE# /usr/sbin/rsct/bin/rmcctrl -z
CLONE# /usr/sbin/rsct/bin/rmcctrl -A
CLONE# /usr/sbin/rsct/bin/rmcctrl -p
CLONE# stopsrc -g rsct
0513-044 The ctrmc Subsystem was requested to stop.

Now we are ready to reset the node ID. Starting with CAA, the node ID is stored in the ODM:

CLONE# lsattr -El cluster0 -a node_uuid
node_uuid e4c0aac8-31dd-11e5-a8dd-1e25b103930a OS image identifier                True

This is the reason why the old method does not work anymore. As long as the node ID is set in the ODM, the recfgct command will always restore the same node ID to /etc/ct_node_id. So we need to reset the node ID in the ODM:

CLONE# chdev -l cluster0 -a node_uuid=00000000-0000-0000-0000-000000000000
cluster0 changed
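
When scripting this step, the UUID can be pulled out of the lsattr output, where it is the second whitespace-separated field. A minimal sketch follows; the function names node_uuid_from_lsattr and is_zero_uuid are hypothetical helpers that only process text and do not call lsattr or chdev themselves.

```shell
# Read "lsattr -El cluster0 -a node_uuid" output on stdin and
# print the value of the node_uuid attribute (second field).
node_uuid_from_lsattr() {
    awk '$1 == "node_uuid" { print $2; exit }'
}

# Succeed if the given UUID is the all-zero placeholder used with chdev above.
is_zero_uuid() {
    [ "$1" = "00000000-0000-0000-0000-000000000000" ]
}
```

A possible use is to pipe the lsattr output into node_uuid_from_lsattr and save the result before the reset, so it can be compared with the value after recfgct has run.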

Now we can create a new node ID and restart the rsct_rm and rsct daemons:

CLONE# /usr/sbin/rsct/bin/mknodeid -f
CLONE# /usr/sbin/rsct/install/bin/recfgct
0513-071 The ctcas Subsystem has been added.
0513-071 The ctrmc Subsystem has been added.
0513-059 The ctrmc Subsystem has been started. Subsystem PID is 589920.
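
The steps so far can be collected into one script. The sketch below simply repeats the commands from this article in the same order; the DRY_RUN variable and the run and reset_node_id functions are assumptions introduced for illustration. With DRY_RUN=1 (the default here) the commands are only printed, which allows a review before anything is changed. There is no error handling; exit codes of the individual commands are ignored.

```shell
# DRY_RUN=1 prints the commands instead of executing them.
DRY_RUN=${DRY_RUN:-1}

# Print or execute a single command, depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Stop RSCT, reset the node UUID in the ODM, and generate a new node ID.
reset_node_id() {
    run stopsrc -g rsct_rm
    run /usr/sbin/rsct/bin/rmcctrl -k
    run /usr/sbin/rsct/bin/rmcctrl -z
    run /usr/sbin/rsct/bin/rmcctrl -A
    run /usr/sbin/rsct/bin/rmcctrl -p
    run stopsrc -g rsct
    run chdev -l cluster0 -a node_uuid=00000000-0000-0000-0000-000000000000
    run /usr/sbin/rsct/bin/mknodeid -f
    run /usr/sbin/rsct/install/bin/recfgct
}
```

Calling reset_node_id with DRY_RUN=1 lists the nine commands; setting DRY_RUN=0 on the clone, as root and with the cluster stopped, would actually run them.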

We check:

CLONE# lsattr -El cluster0 -a node_uuid
node_uuid b6e59ba2-008d-11e6-8430-e687f6863811 OS image identifier                True

and

CLONE# /usr/sbin/rsct/bin/lsnodeid
32d57d25f60b29f7

and

CLONE# cat /etc/ct_node_id
32d57d25f60b29f7

# The first line of this file contains the RSCT node id of this
# machine.  Please do not delete or modify it.

and see that the ID has indeed changed.
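
In a script, the same check can be done by saving the lsnodeid output before and after the procedure and comparing the two values. A minimal sketch, with node_id_changed being a made-up helper name:

```shell
# Succeed only if both IDs are non-empty and differ from each other.
node_id_changed() {
    [ -n "$1" ] && [ -n "$2" ] && [ "$1" != "$2" ]
}
```

For the values shown in this article, node_id_changed 4c1db4ed80de82ef 32d57d25f60b29f7 succeeds.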

¹ Running either rmcctrl -k or stopsrc -g rsct alone is probably enough here.
