Thursday, September 13, 2018

NFS "server not responding, still trying" error on Linux

"nfs: server not responding, still trying" ........ reported for the filer (NFS storage).
df -hT in hang state ..........
dsmc q sched not working.........

We received all of the above alerts from our monitoring software. At first we could not tell what exactly was wrong, because several errors were showing up on the Linux server at once. On the affected server we checked /etc/fstab and found about 6 NFS shares mounted. When we ran ls and cd against each share, 2 of them showed the problem.
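The manual ls/cd check can hang the shell itself when a share is stuck. A minimal sketch of the same check with a time limit is shown here; the helper names nfs_mounts and check_path and the 5-second limit are assumptions for illustration, not what we ran at the time.

```shell
#!/bin/sh
# List NFS mount points from an fstab-style file.
# Field 2 is the mount point, field 3 is the filesystem type (nfs, nfs4).
nfs_mounts() {
    awk '$1 !~ /^#/ && $3 ~ /^nfs/ { print $2 }' "${1:-/etc/fstab}"
}

# Probe one path with a hard time limit so a hung mount cannot block us.
# SIGKILL is used because a process stuck on NFS may ignore SIGTERM.
check_path() {
    if timeout -s KILL 5 stat "$1" >/dev/null 2>&1; then
        echo "OK $1"
    else
        echo "HUNG-OR-MISSING $1"
    fi
}

# Check every NFS mount point listed in /etc/fstab, if the file is readable.
if [ -r /etc/fstab ]; then
    for m in $(nfs_mounts /etc/fstab); do
        check_path "$m"
    done
fi
```

A share that prints HUNG-OR-MISSING is either not answering within the limit or not there at all, so it still needs a closer look by hand.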

For those 2 NFS shares, ls and cd did not complete, so we decided to check with the storage admin.
We gave them the affected server's IP and the filer details and asked them to verify that everything was shared correctly from their side. The storage admin replied that all permissions were fine and the filer was correctly exported to the Linux server. We then decided to re-export the same share. After the re-export we could access the NFS shares and df -hT worked again.

But the joy of solving the issue did not last. The same issue came back on this Linux server.
So what's next..........

Now we suspected an issue at the network level. In the Linux server's log file "messages" we also found the following entries:

Sep  6 18:22:04 linuxclient1 kernel: nfs: server not responding, still trying
Sep  6 18:22:42 linuxclient1 kernel: nfs: server not responding, still trying
Sep  6 18:23:22 linuxclient1 kernel: nfs: server OK
Sep  6 18:23:22 linuxclient1 kernel: nfs: server OK

Sep  6 18:23:22 linuxclient1 kernel: nfs: server OK

From the above logs we could see that the filer was intermittently not responding to requests from the Linux server, so one possibility was a firewall blocking communication between the NFS filer and the Linux client.
We provided all the required details to the network team, but after analysis they found no issue on their side either. The remaining team was the VMware team, who had created this VM, but they also said the VM configuration was correct.
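To give the other teams something concrete, it helps to count how often the kernel logged a timeout and a recovery. The following is a rough sketch that summarises the kind of messages quoted earlier; nfs_events is a hypothetical helper name and /var/log/messages is assumed as the default log path.

```shell
#!/bin/sh
# Count NFS timeout and recovery messages in a syslog-style file.
# The patterns also match lines that include the server name,
# e.g. "nfs: server filer1 not responding, still trying".
nfs_events() {
    awk '
        /nfs: server .*not responding/ { down++ }
        /nfs: server .*OK/             { ok++ }
        END { printf "not_responding=%d ok=%d\n", down, ok }
    ' "${1:-/var/log/messages}"
}

# Summarise the system log if it is readable (usually needs root).
if [ -r /var/log/messages ]; then
    nfs_events /var/log/messages
fi
```

Repeated "not responding" lines followed by "OK" point to an intermittent problem rather than a server that is simply down.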

After their answer we did a Google search and found one interesting thing for this type of error: MTU. MTU is the maximum transmission unit, the largest packet an Ethernet interface will send in a single frame. An incorrect MTU configuration can cause performance issues on any Linux/UNIX or Windows server. On the affected Linux server the MTU was 9000. We also checked the MTU on other servers in the same IP range and found that it was 1500 on those servers, while on the affected server it was 9000. We decided to take downtime to change the MTU value. We changed the MTU to 1500 on the primary interface and rebooted the Linux server, and guess what, after the reboot everything was working perfectly. df -hT, dsmc q sched and also ls and cd all worked on these NFS shares.
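The MTU comparison can be sketched as below. This is an illustration and not the exact commands we ran: the helper names are made up, filer1 is a placeholder hostname, and the ping options assume the Linux iputils ping. The payload size is the MTU minus 28 bytes (20 bytes IPv4 header plus 8 bytes ICMP header), so 8972 for MTU 9000 and 1472 for MTU 1500.

```shell
#!/bin/sh
# Largest ICMP payload that fits in one IPv4 packet of the given MTU:
# MTU minus 20 bytes of IPv4 header minus 8 bytes of ICMP header.
icmp_payload() {
    echo $(( $1 - 28 ))
}

# Print each network interface and its configured MTU, read from sysfs.
show_mtus() {
    for f in /sys/class/net/*/mtu; do
        [ -r "$f" ] || continue
        iface=${f%/mtu}
        echo "${iface##*/} $(cat "$f")"
    done
}

# Send 3 pings with the "don't fragment" flag set (-M do).
# If these fail while smaller pings pass, the path cannot carry that MTU.
probe_mtu() {
    ping -M do -c 3 -s "$(icmp_payload "$2")" "$1"
}

show_mtus
# Example usage (filer1 is a placeholder for the NFS server):
#   probe_mtu filer1 9000
#   probe_mtu filer1 1500
```

If probe_mtu fails at 9000 but works at 1500, the interface MTU is larger than what the network path supports. The MTU can also be changed at runtime with ip link set dev eth0 mtu 1500 (eth0 is a placeholder), but that change does not survive a reboot unless it is also made in the interface configuration.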

In the end we can say that the NFS share hang was caused by the incorrect MTU configuration, and this hang is what made df -hT, dsmc q sched, ls and cd fail on the server.

An NFS hang can have multiple causes:

1. The NFS server is hung or down.
2. A firewall is blocking communication between the NFS server and the Linux client.
3. Incorrect MTU configuration on the client or server side.
4. An overloaded NFS server causing client requests to time out.
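For causes 1, 2 and 4, a few quick checks from the client can narrow things down before involving other teams. The sketch below is only an illustration: port_state is a hypothetical helper that relies on bash's /dev/tcp feature, filer1 is a placeholder hostname, and the ports assume a standard setup (2049 for NFS, 111 for rpcbind).

```shell
#!/bin/sh
# Report whether a TCP port on a host accepts connections.
# Uses bash's /dev/tcp pseudo-device; timeout bounds the wait so a
# silently dropped connection (typical for a firewall) does not hang.
port_state() {
    if timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null; then
        echo "open"
    else
        echo "closed-or-filtered"
    fi
}

# Example usage (filer1 is a placeholder for the NFS server):
#   port_state filer1 2049   # NFS
#   port_state filer1 111    # rpcbind / portmapper
#   rpcinfo -p filer1        # RPC services registered on the server
#   showmount -e filer1      # exports offered by the server
#   nfsstat -rc              # client RPC statistics, including retransmissions
```

An open port with rpcinfo answering suggests the server is up and reachable, while a retransmission count in nfsstat -rc that keeps climbing hints at an overloaded server or packet loss on the path.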

Thanks !!!
