Date: Sun, 24 Oct 1999 00:42:12 -0700 (PDT)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: freebsd-current@freebsd.org
Subject: freefall hangs w/ nfs
Message-ID: <199910240742.AAA17268@apollo.backplane.com>
References: <Pine.BSF.4.10.9910231421420.4943-100000@current1.whistle.com>
On the face of it, it looks like AMD is hanging. Perhaps this is preventing the system from clearing out buffers and causing lockups on other mounts. AMD could also be causing a deadlock in the buffer cache (for the same reason loopback mounts can cause deadlocks).

The next time this happens, if the person rebooting freefall can get a kernel dump (and has a corresponding debug kernel), I may be able to track it down for sure. Fixing it is another problem, though.

Loopback deadlocks are a big problem under 3.x. Essentially, what occurs under 3.x is that the buffer cache runs out of buffers (or buffer space) during a client op and tries to synchronously flush unrelated dirty buffers to clear out some room. It may flush a write of a client-side buffer, which runs an RPC to an nfsd running on the same machine (i.e. via a loopback mount). That nfsd then turns around and tries to allocate a new buffer to issue its filesystem write, which may in turn also run out of buffers or buffer space and attempt to flush another unrelated dirty buffer, which could be another client-side buffer. But at that point nfsd is locked up in getnewbuf(), so the result is a deadlock that locks up the NFS node entirely (and might NOT lock up the rest of the machine).

Under 3.x this is a big problem due to the synchronous flush recursion in getnewbuf(). Under 4.x it is not as big a problem because flushing is done asynchronously by the buf_daemon.

I've been trying to find a solution to the problem for 3.x. I have a few ideas. I think we can add a flag to the mount structure that getnewbuf() would set when synchronously flushing a buffer. The flag would prevent another getnewbuf() call (say, one called from nfsd) from trying to flush buffers from the same client mount, preventing a deadlock. I have to set up a 3.x box and reproduce the deadlock before I can test the fix, though, and that will take a bit of time.
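To make the recursion and the proposed flag easier to picture, here is a rough toy model in Python. It is not kernel code and not the actual fix. Everything in it is an assumption made for illustration: the Mount and BufCache classes, the flushing field, the single modelled nfsd, and the helper names flush_one(), nfsd_write() and run() are all hypothetical. Only getnewbuf() borrows its name from the real routine, and it is heavily simplified here.

```python
from dataclasses import dataclass, field


class Deadlock(Exception):
    """Raised when the toy model reaches the deadlock described above."""


@dataclass
class Mount:
    name: str
    loopback_nfs: bool = False  # client mount served by an nfsd on this host
    flushing: bool = False      # hypothetical flag: a synchronous flush is in progress


@dataclass
class BufCache:
    free: int                                  # free buffers remaining
    dirty: list = field(default_factory=list)  # owning Mount of each dirty buffer
    use_flag: bool = False                     # enable the proposed guard
    nfsd_busy: bool = False                    # the one modelled nfsd is mid-request


def getnewbuf(cache: BufCache) -> None:
    """Take a free buffer, synchronously flushing a dirty one if none is free."""
    if cache.free == 0:
        flush_one(cache)
    cache.free -= 1


def flush_one(cache: BufCache) -> None:
    # Pick the first dirty buffer; with the guard enabled, skip buffers whose
    # mount already has a synchronous flush in progress.
    for i, mount in enumerate(cache.dirty):
        if not (cache.use_flag and mount.flushing):
            break
    else:
        raise RuntimeError("no flushable dirty buffer")
    del cache.dirty[i]
    mount.flushing = True
    try:
        if mount.loopback_nfs:
            nfsd_write(cache)  # the write RPC is served on the same machine
    finally:
        mount.flushing = False
    cache.free += 1            # the flushed buffer becomes reusable


def nfsd_write(cache: BufCache) -> None:
    if cache.nfsd_busy:
        raise Deadlock("nfsd is already blocked in getnewbuf()")
    cache.nfsd_busy = True
    try:
        getnewbuf(cache)       # nfsd needs a buffer for its filesystem write
        cache.free += 1        # simplification: released once the write completes
    finally:
        cache.nfsd_busy = False


def run(use_flag: bool) -> str:
    """Exhausted cache with two loopback-NFS dirty buffers and one local one."""
    nfs = Mount("/host", loopback_nfs=True)
    local = Mount("/usr")
    cache = BufCache(free=0, dirty=[nfs, nfs, local], use_flag=use_flag)
    try:
        getnewbuf(cache)
    except Deadlock as exc:
        return f"deadlock: {exc}"
    return "ok"


if __name__ == "__main__":
    print("without flag:", run(use_flag=False))
    print("with flag:   ", run(use_flag=True))
```

In this model the unguarded run reports a deadlock, because the nested getnewbuf() picks a second client-side buffer while the only nfsd is still busy. The guarded run completes, because the nested call skips the /host buffers and flushes the local one instead. A real kernel has several nfsds and blocks rather than raising an exception, so this only shows the shape of the idea.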
-Matt

Oct 15 06:18:08 freefall /kernel: nfs server pid173@freefall:/host: not responding
Oct 15 06:44:49 freefall /kernel: nfs server pid173@freefall:/host: not responding
Oct 15 16:29:50 freefall /kernel: nfs server pid173@freefall:/host: not responding
Oct 15 16:37:26 freefall /kernel: nfs server pid173@freefall:/host: not responding
Oct 15 22:46:08 freefall shutdown: reboot by jdp: Rebooting to unstick NFS
Oct 21 03:10:15 freefall /kernel: nfs server pid173@freefall:/host: not responding
Oct 21 03:34:24 freefall /kernel: nfs server pid173@freefall:/host: not responding
Oct 21 04:38:39 freefall /kernel: nfs server pid173@freefall:/host: not responding
Oct 21 04:46:56 freefall /kernel: nfs server pid173@freefall:/host: not responding
Oct 21 11:56:01 freefall shutdown: reboot by jdp: Rebooting to clear filesystem related hangs
Oct 22 04:23:41 freefall /kernel: nfs server pid173@freefall:/host: not responding
Oct 22 04:56:57 freefall /kernel: nfs server pid173@freefall:/host: not responding
Oct 22 16:40:55 freefall /kernel: nfs server pid173@freefall:/host: not responding
Oct 22 17:52:34 freefall /kernel: nfs server pid173@freefall:/host: not responding
Oct 23 00:36:56 freefall /kernel: nfs server pid173@freefall:/host: not responding
Oct 23 02:45:57 freefall /kernel: nfs server pid173@freefall:/host: not responding
Oct 23 04:16:57 freefall /kernel: nfs server pid173@freefall:/host: not responding
Oct 23 04:46:56 freefall /kernel: nfs server pid173@freefall:/host: not responding
Oct 23 14:44:22 freefall /kernel: nfs server pid173@freefall:/host: not responding
Oct 23 14:51:53 freefall /kernel: nfs server pid173@freefall:/host: not responding
Oct 23 15:35:55 freefall amd[24839]: /host: mount (amfs_auto_cont): Stale NFS file handle
Oct 23 15:35:55 freefall /kernel: nfs server pid173@freefall:/host: is alive again
Oct 23 15:38:40 freefall amd[25003]: /host: mount (amfs_auto_cont): Stale NFS file handle
Oct 23 15:44:05 freefall shutdown: reboot by unfurl: To