The other day, the NFS clients at the pound stopped working correctly. Programs that use an NFS share for caching data or locking files (such as Firefox) stopped working without any explanation. My doggs were also unable to compile any programs, which led to a lot of barking and growling from all of them!
Looking through the logs on the client didn't reveal anything significant, but the logs on the NFS server were filled with these:
kernel: statd: server localhost not responding, timed out
kernel: lockd: cannot monitor client
At first it seemed that the statd daemon was not functioning. After restarting lockd and statd, the problem persisted. Even restarting the server didn’t fix the problem. The next thought was that something was blocking the loopback interface from communicating, since the localhost server wasn’t responding. After running some network tests, checking firewall and tcpwrapper rules, I found nothing that was keeping the server from communicating with itself.
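One more check in the same spirit is to confirm that statd and lockd are actually registered with rpcbind on the server. The sketch below is not part of the original troubleshooting; it assumes the usual `rpcinfo -p` layout, where the last column is the service name (`status` for statd, `nlockmgr` for lockd). The function name `missing_rpc_services` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads `rpcinfo -p` output on stdin and prints the
# name of each required service that is NOT registered.
# Assumption: the service name is the last column of each line.
missing_rpc_services() {
    awk '
        $NF == "status"   { have_status = 1 }
        $NF == "nlockmgr" { have_lockd = 1 }
        END {
            if (!have_status) print "status"
            if (!have_lockd)  print "nlockmgr"
        }
    '
}

# Usage on the NFS server (needs rpcinfo installed):
#   rpcinfo -p localhost | missing_rpc_services
```

No output means both services are registered. In that case the problem lies elsewhere, as it did here.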
After reading through the man page for statd and conversing with some of my doggs, I decided to attempt to remove the statd monitor and notify lists on the NFS server. This was the key! These files had somehow become locked or corrupted. These lists are located in the directories below:
/var/lib/nfs/statd/sm/ - directory containing statd monitor list
/var/lib/nfs/statd/sm.bak/ - directory containing statd notify list
Before removing these files, you should stop the rpcbind, statd, and lockd services. Below is a list of commands to run to fix this issue on an RPM-based distro.
service rpcbind stop
service nfslock stop
rm -rf /var/lib/nfs/statd/sm/*
rm -rf /var/lib/nfs/statd/sm.bak/*
service rpcbind start
service nfslock start
After running these commands, it may be best to restart your NFS server.
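If you would rather not delete the lists outright, a gentler variant of the two `rm -rf` steps is to move the entries aside so they can be inspected later. This is only a sketch: the function name `stash_statd_lists` and the default backup location under /root are assumptions for illustration, not something the original fix used.

```shell
#!/bin/sh
# Hypothetical helper: move the statd monitor and notify entries into a
# timestamped backup directory instead of deleting them.
#   $1 = statd directory        (assumed default: /var/lib/nfs/statd)
#   $2 = backup parent directory (assumed default: /root)
# Prints the path of the backup directory on success.
stash_statd_lists() {
    statd_dir=${1:-/var/lib/nfs/statd}
    backup_dir=${2:-/root}/statd-backup-$(date +%Y%m%d-%H%M%S)
    mkdir -p "$backup_dir/sm" "$backup_dir/sm.bak" || return 1
    for list in sm sm.bak; do
        for entry in "$statd_dir/$list"/*; do
            [ -e "$entry" ] || continue   # glob matched nothing: already empty
            mv "$entry" "$backup_dir/$list/" || return 1
        done
    done
    echo "$backup_dir"
}

# Usage, as root, with rpcbind and nfslock stopped as described above:
#   stash_statd_lists
```

Once the services are back up and the clients behave, the backup directory can be removed.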
Also check the permissions on these files and folders, to make sure that the NFS service can access them. Here are the permissions from my NFS server:
drwx------ 4 rpcuser rpcuser 4.0K Aug 1 15:00 .
drwxr-xr-x 5 root root 4.0K Aug 1 15:00 ..
drwx------ 2 rpcuser rpcuser 4.0K Aug 1 15:00 sm
drwx------ 2 rpcuser rpcuser 4.0K Aug 1 15:00 sm.bak
-rw-r--r-- 1 root root 4 Aug 1 15:00 state
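To compare your own server against that listing without eyeballing `ls` output, something like the following could work. It is a rough sketch that assumes the listing was taken in /var/lib/nfs/statd and that GNU `stat` (with `-c`) is available; the function name `check_perms` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report whether a path has the expected owner,
# group and octal mode. Relies on GNU stat (-c) from coreutils.
#   $1 = path, $2 = expected owner:group, $3 = expected octal mode
# Returns 0 on match, 1 on mismatch, 2 if stat itself fails.
check_perms() {
    actual=$(stat -c '%U:%G %a' "$1") || return 2
    if [ "$actual" = "$2 $3" ]; then
        echo "OK       $1 ($actual)"
    else
        echo "MISMATCH $1 (found $actual, expected $2 $3)"
        return 1
    fi
}

# Usage, mirroring the listing above. rpcuser is the owner on the
# author's server; other distros may use a different account.
#   for d in /var/lib/nfs/statd /var/lib/nfs/statd/sm /var/lib/nfs/statd/sm.bak; do
#       check_perms "$d" rpcuser:rpcuser 700
#   done
```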
An NFS FAQ can also be found here: http://www.sunhelp.org/faq/nfs.html
thanks, it was very helpful. Cheers!
Thanks for your text! I had a problem opening a shared file read/write, even though it had the same permissions as another file that I could open read/write. Since the file was last modified, the IP address of the client had changed, and the old address was cached in the sm directory. Greetings, Enrico
Glad it helped!
A DEEP heartfelt thank you! You saved us from our puzzled state after upgrading a server from openSUSE 11.3 to 12.3, and then facing a ripple of unanticipated consequences of the implicit upgrade of NFS from v3 to v4. The flurry of lockd messages led us to this post (due thanks to Google, too).