Date: Thu, 18 Nov 2010 07:49:41 -0500 (EST)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Oliver Fromme <olli@lurza.secnetix.de>
Cc: freebsd-fs@FreeBSD.ORG
Subject: Re: NFS hangs (7.3)
Message-ID: <230979963.266261.1290084581845.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <201011171705.oAHH5age003849@lurza.secnetix.de>
> I've got a problem on a server farm. Every now and then,
> some NFS mounts hang. This happens after a few days or
> after a few weeks. All processes trying to access files
> from the hanging mount go to state "D" and freeze. The
> only way to resolve the problem is to reboot the server.
>
> "umount -f" also hangs and does not remove the hanging
> mount (even though it disappears from the output of the
> mount(8) command).
>
> Here's one example from an attempt to run df(1) which
> also hangs:
>
> ps -uww:
> USER   PID %CPU %MEM  VSZ  RSS  TT  STAT STARTED    TIME COMMAND
> root 61930  0.0  0.0 5728 1280  p4- D     5:15PM 0:00.01 /bin/df
>
> ps -lww:
> UID   PID PPID CPU PRI NI  VSZ  RSS MWCHAN STAT  TT     TIME COMMAND
>   0 61930    1   0  -4  0 5728 1280 nfs    D     p4- 0:00.01 /bin/df
>

It would appear that the root vnode for the client mount point is
locked for some reason. Here are a couple of possible explanations:

1 - An infrequently executed code path doesn't VOP_UNLOCK()/vput() as
    it should. This seems relatively unlikely, since others are using
    the client without difficulties, but it might be an error case
    that only shows up in your environment.
2 - Another thread is holding the lock while stuck waiting for
    something else. The most obvious "something else" would be an RPC
    reply from the server. (A locking deadlock, as mentioned below
    w.r.t. the spawning of new nfsiod threads, could be another?)

I'd suggest a "ps axHl" when this happens, and then look for a thread
that is waiting for an RPC reply.

I'd also suggest "nfsstat -c" done several times over a few minutes,
to see if any of the counts are changing.

Also, you can do "tcpdump -w xxx -s 0 host <nfs-server>" on the client
for a while and then look at "xxx" in wireshark (it knows NFS packets)
and see if there is any net traffic to/from the server. (This will
tell you if it is a problem related to an RPC that is in progress vs
something else.)
It will also tell you if it is using TCP (or you can "netstat -a" to
see if TCP connections are there for the NFS mounts).

>
> The machine is quite busy. The hangs seem to always occur
> in the night when lots of cron jobs are running. The machine
> has 221 NFS mounts and 26 nullfs mounts, and it has 26 jails,
> if that matters. All NFS shares are mounted from a virtual
> filer running on a NetApp filer. The mounts use the default
> settings, so they should be v3 TCP (this is the default,
> right?). The only extra option we use is -L in order to
> "fake" locking locally.
>
> The machine is running FreeBSD 7.3-PRERELEASE-20100311 amd64.
> Updating is somewhat complicated in that server farm, so I
> haven't tried that so far because I'm not sure if it would
> help.
>

I've only been working with 8/current, so I can't recall if there
have been any client fixes for 7 since then, except that there was a
very recent change w.r.t. the spawning of nfsiod threads to avoid LORs
(lock order reversals, i.e. potential deadlocks) related to creating
new kernel threads. I have no idea if one of these deadlocks might be
involved. (Someone familiar with that might be able to comment?)

rick
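
The repeated "nfsstat -c" check suggested above could be scripted
roughly as follows. This is only a sketch, not something from the
original message: the function names (parse_counters,
changed_counters, watch) are made up for illustration, and the parser
assumes that nfsstat prints a row of column names followed by a row
with the same number of integers. Rows that do not fit that
assumption (for example headers with two-word names) are skipped
rather than guessed at.

```python
import subprocess
import time


def parse_counters(text):
    """Pair each row of column names with the integer row below it.

    Assumption: section titles end with ':' and a header row is
    followed directly by a row holding one integer per column name.
    """
    counters = {}
    section = ""
    names = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            names = None
            continue
        if line.rstrip().endswith(":"):
            # Section title such as "Rpc Counts:".
            section = line.strip().rstrip(":")
            names = None
            continue
        if all(f.isdigit() for f in fields):
            # Value row: keep it only if it lines up with the header row.
            if names and len(names) == len(fields):
                for name, value in zip(names, fields):
                    counters[f"{section}/{name}"] = int(value)
            names = None
        else:
            names = fields
    return counters


def changed_counters(before, after):
    """Return {counter: delta} for counters whose value differs."""
    return {
        key: after[key] - before[key]
        for key in after
        if key in before and after[key] != before[key]
    }


def watch(rounds=5, interval=30, command=("nfsstat", "-c")):
    """Run the command several times and print which counters moved."""
    previous = None
    for i in range(rounds):
        if i:
            time.sleep(interval)
        out = subprocess.run(
            command, capture_output=True, text=True, check=True
        ).stdout
        current = parse_counters(out)
        if previous is not None:
            print(changed_counters(previous, current) or "no counters changed")
        previous = current
```

Calling watch() on the client while a mount is hung prints, for each
interval, the counters that changed since the previous sample. The
defaults of 5 rounds and 30 seconds are arbitrary choices.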