Tuesday 8 September 2020

IBM MQ – AMQ3817E – DRBD Errors

Recently, while rebuilding an MQ estate, I tore down the previous deployment and deployed a new installation and configuration to the same virtual machines. However, when it came to creating the new Queue Manager, an error was thrown.

 

The reason for the rebuild was to upgrade to IBM MQ v9.2, which includes some changes to the way the RDQM Pacemaker and DRBD kernel modules are installed.

 

Command:

crtmqm -rr p -rt s -rl thisIP -ri otherIP -rn otherHostname -rp port -lla -fs 20 -lp 6 -ls 3 qmname

 

Error:

AMQ3817E: Replicated data subsystem call '/usr/sbin/drbdadm new-resource qmname 0 --auto-promote=no' failed with return code '20'. Command 'drbdsetup new-resource qmname 0 --auto-promote=no' did not terminate within 5 seconds

AMQ3812E: Failed to create replicated data queue manager configuration.

 

To debug I did the following:

1.     Checked the MQ error logs

2.     Checked that there were no outstanding processes that could be blocking the crtmqm command

3.     Checked that the correct drbd kernel module was installed on the server (modinfo drbd)

 

There were no errors in the MQ logs, no drbd processes running, and the correct kernel module was installed. Sufficiently stumped, I ran lsmod | grep drbd on the server and compared the output with that of another environment.

 

The output from the environment with the issue showed only the drbd module, with no drbd_transport_tcp module. When there are no RDQM Queue Managers running, lsmod | grep drbd should return nothing, which seems to suggest the drbd module was hanging.
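The check described above can be sketched as a small shell helper. This is only an illustration: the function name drbd_modules is my own (it is not part of MQ or DRBD), and it reads lsmod-style text on stdin so it can be tried out without touching a real server.

```shell
# Print the name of every loaded module whose name starts with "drbd".
# Input: lsmod-style text on stdin (first column is the module name).
# drbd_modules is a hypothetical helper name used for illustration only.
drbd_modules() {
    awk '$1 ~ /^drbd/ { print $1 }'
}

# On a real server this would be used as:
#   lsmod | drbd_modules
# A line reading just "drbd", with no drbd_transport_tcp and no RDQM
# Queue Managers running, matches the symptom described above.
```

Matching on the first column only avoids false hits from other modules that merely list drbd in their "Used by" column.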

 

To resolve this there are two possible actions:

a.     Reboot the machine

b.     Remove the remaining ‘hanging’ drbd module using rmmod drbd
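For option b, a cautious approach might be to confirm that nothing is using the module before removing it. The sketch below is an assumption on my part rather than anything IBM documents: drbd_is_idle is a hypothetical helper that reads lsmod-style text on stdin and succeeds only when drbd is loaded with a use count of 0 (the third column of lsmod output).

```shell
# Succeed (exit 0) only if the lsmod-style input on stdin lists the drbd
# module with a use count of 0; fail if drbd is absent or still in use.
# drbd_is_idle is a hypothetical helper name used for illustration only.
drbd_is_idle() {
    awk '$1 == "drbd" { found = 1; idle = ($3 == 0) }
         END { exit !(found && idle) }'
}

# On a real server, as root, this could guard the removal:
#   lsmod | drbd_is_idle && rmmod drbd
```

If the use count is not 0, rmmod would refuse to remove the module anyway, so a reboot (option a) is the simpler route in that case.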

 

After either action, run lsmod | grep drbd and the result should be empty. Try to create your Queue Manager again and it should succeed.