Friday, July 13, 2018

NIMSH service PORT 3901 preventing SAP application from starting

NIMSH service Reserved PORT 3901 and SAP

Today I am going to discuss an incident that stopped an SAP application from starting on a two-node AIX cluster.
So let's see what happened and how we resolved the issue.

First, let me introduce the setup. It was a two-node AIX 7.1 HACMP cluster hosting an SAP application. We got 3 hours of downtime to reboot both nodes. When the downtime started, the application team stopped the SAP application and told us to proceed with the reboot of the AIX HACMP nodes. We took a configuration backup before rebooting, as per our standard checklist. We normally have a script, executed from a remote server, that takes a backup of the OS configuration and also brings the resource group offline; after that we rebooted the AIX HACMP node.

The node was up and active within 10 minutes. We started cluster services, brought the resource group online on the rebooted node, and did the post-reboot checks. Everything looked fine from the OS side: all file systems were mounted, and the ifconfig -a output was also correct.

We informed the SAP application team to start their application, but after 5-10 minutes they replied that they were not able to start it: it threw an error during startup. The application team tried 2-3 times, but the result was the same. Then somebody from the application team suggested clearing some SAP processes that were occupying shared memory segments. Multiple shared memory segments held by SAP processes were identified and cleared using the ipcs and ipcrm commands, and some processes were killed from the OS side. The SAP application team then retried the application start, but no success :(.

What next .................??

So management decided to move to the second node and try to start the SAP application there. As requested, we switched the resource group to the second node and asked the SAP team to start their application, but again no success. We cleared shared memory for the SAP processes from both the application and OS ends and killed some SAP application processes from the OS end, but still no success............

We were almost 2 hours 30 minutes into the downtime, and it was about to end. In the next 30 minutes the application had to be online and active so that business could start again. So I asked one of the guys from the SAP application team to share the error. He shared some output, but it did not make clear why exactly the application was not starting or who the culprit was. But suddenly, in our chat, one of the SAP experts checked a log file and asked us to "check whether PORT 3901 is free or not".
The finding in the log file was like below ....

Error details: "PORT 3901 in use and because this  socket BIND failed"

So we checked port availability using the following command and found that the nimsh service was listening on port 3901.

#netstat -Aan |grep -i 3901
tcp        0      0  *.3901                 *.*                    LISTEN 
We then found the details of the process listening on this port using the lsof command.
#lsof -i tcp:3901
command  PID    user
nimsh   6115400 root         0t0  TCP *:nimsh (LISTEN) 
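
The netstat check above can also be scripted so that it matches the port exactly instead of grepping for the string 3901 (which would also match, say, 13901). The following is only a minimal sketch, not the check we ran during the incident: the function name port_is_listening is made up for illustration, and it assumes the netstat -an layout shown above, where the state is the last column and the local address is two columns before it.

```shell
#!/bin/sh
# port_is_listening PORT
# Hypothetical helper (not an AIX command). Reads "netstat -an" style
# output on stdin and succeeds if a socket in LISTEN state is bound to
# exactly PORT. Assumes the state is the last column and the local
# address (for example "*.3901") is two columns before it.
port_is_listening() {
    awk -v port="$1" '
        NF >= 3 && $NF == "LISTEN" {
            # The port is the part after the last "." or ":" of the local address
            n = split($(NF - 2), parts, "[.:]")
            if (parts[n] == port) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}
```

It could be used as: netstat -an | port_is_listening 3901 && echo "port 3901 is already in use"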

We found that the "nimsh" process, started by the root user, was listening on port 3901. Until we stopped this process, the SAP application was not going to start.

We stopped this service with #stopsrc -s nimsh and asked the SAP application team to start their application after clearing some shared memory segments for the SAP processes, and guess what, we succeeded in starting the SAP application 😊😊😊.
Finally, we were able to start the SAP application before the downtime ended.

Now the SAP team was asking the OS team for an RCA: why was "nimsh" acquiring port "3901" before the SAP application process??

Our answer to them was that, from our analysis, port 3901 is reserved for the nimsh service on the AIX operating system, and it is not recommended for use by any application or database process. We found the following entries in /etc/services on the AIX node, which indicate that this port is reserved for the nimsh service.

#root : cat /etc/services | grep -i 3901
nimsh                    3901/tcp               # NIM Service Handler
nimsh                    3901/udp               # NIM Service Handler
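
Before assigning a port to an application, the same /etc/services lookup can be done with an exact match on the port/protocol column. This is just a sketch under assumptions: the function name service_for_port is hypothetical, and it assumes the usual services file layout of "name port/protocol" with # starting a comment.

```shell
#!/bin/sh
# service_for_port FILE PORT PROTO
# Hypothetical helper (not an AIX command). Prints the service name(s)
# that a services-style FILE registers for exactly PORT/PROTO.
# Prints nothing if the port is not registered in that file.
service_for_port() {
    awk -v want="$2/$3" '
        $1 ~ /^#/ { next }          # skip comment lines
        $2 == want { print $1 }     # second column looks like "3901/tcp"
    ' "$1"
}
```

For example, service_for_port /etc/services 3901 tcp would print nimsh on the node above. An empty result only means the port is not listed in that file, not that nothing is listening on it.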

We also observed that, in the past, the SAP admin who did the SAP application installation/configuration did not know that 3901 is reserved for the "nimsh" service and used that port to configure the SAP application. So our suggestion to the SAP team was to use a port other than an OS-reserved port (like 3901) for the SAP application process, for the following reasons:

1. In future, if the OS team needs to start the nimsh service, it will definitely require port 3901.

2. Whenever an AIX admin reboots any node of this cluster and the SAP admin wants to start the application on that node after the reboot, the SAP application will again fail to start, because port 3901 will be acquired by the "nimsh" process by default. So to start their application, the AIX admin team will need to stop "nimsh" every time.

We found the following entry in the /etc/inittab file, which starts the nimsh service at system boot time.
nimsh:2:wait:/usr/bin/startsrc -g nimclient >/dev/console 2>&1

#lssrc -g nimclient
Subsystem         Group            PID          Status
 nimsh            nimclient        3670130      active
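
As a post-reboot check, the lssrc output above can be tested in a script to warn the SAP team before they try to start the application. This is only a rough sketch: the function name subsystem_is_active is made up, and it assumes the lssrc layout shown above, with the subsystem name in the first column and the status in the last one.

```shell
#!/bin/sh
# subsystem_is_active NAME
# Hypothetical helper (not an AIX command). Reads lssrc output on stdin
# and succeeds if subsystem NAME is listed with status "active".
# Assumes the name is the first column and the status is the last column.
subsystem_is_active() {
    awk -v name="$1" '
        $1 == name && $NF == "active" { ok = 1 }
        END { exit(ok ? 0 : 1) }
    '
}
```

It could be used as: lssrc -g nimclient | subsystem_is_active nimsh && echo "nimsh is active and will hold port 3901"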

Finally, I strongly recommend that application and database teams do not use OS-reserved ports for their applications, because AIX admins who rely on the NIM server to perform operations on NIM clients will always need "nimsh" active on the AIX client node.

In our case we solved the issue with a workaround, but this workaround is not recommended by IBM either.

Thanks !!!