Date:      Sun, 24 Oct 1999 00:42:12 -0700 (PDT)
From:      Matthew Dillon <dillon@apollo.backplane.com>
To:        freebsd-current@freebsd.org
Subject:   freefall hangs w/ nfs
Message-ID:  <199910240742.AAA17268@apollo.backplane.com>
References:   <Pine.BSF.4.10.9910231421420.4943-100000@current1.whistle.com>

    On the face of it, it looks like AMD is hanging.  Perhaps this is
    preventing the system from clearing out buffers and causing lockups
    on other mounts.  AMD could also be causing a deadlock to occur in the
    buffer cache (for the same reason loopback mounts can cause deadlocks).

    The next time this happens, if the person rebooting freefall can get
    a kernel dump (and has a corresponding debug kernel), I may be able to
    track it down for sure.  Fixing it is another problem, though.  Loopback
    deadlocks are a big problem under 3.x.

    Essentially what occurs under 3.x is that the buffer cache runs out of
    buffers (or buffer space) during a client op and tries to synchronously
    flush unrelated dirty buffers to clear out some room.  It may flush a
    client-side buffer, and that write runs an rpc to an nfsd running on the
    same machine (i.e. via a loopback mount).  The nfsd then turns around and
    tries to allocate a new buffer to issue its filesystem write, which may
    in turn also run out of buffers or buffer space and attempt to flush
    another unrelated dirty buffer, which could be another client-side
    buffer.  But at that point nfsd is locked up in getnewbuf(), so the
    result is a deadlock that locks up the NFS node entirely (and might NOT
    lock up the rest of the machine).
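
    As a rough illustration of the recursion described above, here is a
    toy Python model.  It is not FreeBSD code: the class, its fields, and
    the single-nfsd assumption are all hypothetical, and where the real
    kernel would block forever the model raises an exception instead.

```python
# Toy model of the 3.x loopback deadlock. Every name here is hypothetical;
# only getnewbuf() and nfsd are names taken from the description above.

class WouldDeadlock(Exception):
    """Raised at the point where the real system would block forever."""


class LoopbackNode:
    def __init__(self, dirty_client_buffers, free_buffers=0):
        self.free_buffers = free_buffers
        self.dirty = list(dirty_client_buffers)  # dirty NFS client-side buffers
        self.nfsd_busy = False                   # assumption: one nfsd only
        self.trace = []

    def getnewbuf(self, caller):
        self.trace.append(f"{caller}: getnewbuf")
        if self.free_buffers == 0:
            if not self.dirty:
                raise WouldDeadlock(f"{caller}: nothing left to flush")
            buf = self.dirty.pop(0)
            self.trace.append(f"{caller}: sync flush {buf}")
            self.nfs_write_rpc(buf)   # synchronous: waits for the server
            self.free_buffers += 1    # flush finished, buffer reclaimed
        self.free_buffers -= 1

    def nfs_write_rpc(self, buf):
        # The server is an nfsd on the same machine (loopback mount).
        if self.nfsd_busy:
            raise WouldDeadlock(f"rpc for {buf}: nfsd is stuck in getnewbuf")
        self.nfsd_busy = True
        try:
            self.trace.append(f"nfsd: write {buf}")
            self.getnewbuf("nfsd")    # nfsd needs a buffer for its own write
        finally:
            self.nfsd_busy = False


def client_op_deadlocks(node):
    """Run one client op; report whether the model hit the deadlock."""
    try:
        node.getnewbuf("client")
        return False
    except WouldDeadlock:
        return True
```

    With two dirty client-side buffers and no free buffers,
    client_op_deadlocks(LoopbackNode(["A", "B"])) reports the deadlock:
    the nested getnewbuf() called from nfsd tries to flush the second
    client buffer while nfsd itself is still busy with the first.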

    Under 3.x this is a big problem due to the synchronous flush recursion
    in getnewbuf().  Under 4.x this is not as big a problem because flushing
    is done asynchronously by the buf_daemon.

    I've been trying to find a solution to the problem for 3.x.  I have a 
    few ideas.  I think we can add a flag to the mount structure that 
    getnewbuf() would set when synchronously flushing a buffer.  The flag 
    would prevent another getnewbuf() call (say one called from nfsd) from 
    trying to flush buffers from the same client mount, preventing a deadlock.
    I have to set up a 3.x box and reproduce the deadlock before I can test
    the fix, though, and that will take a bit of time.
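
    The flag idea can be sketched in the same toy style.  This is only a
    Python model under stated assumptions, not the actual fix: the Mount
    and Node classes, the sync_flush_active field, and the presence of a
    dirty buffer on a local (non-loopback) mount are all hypothetical.

```python
# Toy model of the proposed per-mount flag. All names are hypothetical;
# the real change would live in the kernel's getnewbuf() and mount structure.

class Mount:
    def __init__(self, name, loopback_nfs=False):
        self.name = name
        self.loopback_nfs = loopback_nfs
        self.sync_flush_active = False  # set while getnewbuf() flushes for it


class Node:
    def __init__(self, dirty):
        self.dirty = list(dirty)        # list of (mount, label) pairs
        self.free_buffers = 0
        self.trace = []

    def getnewbuf(self, caller):
        if self.free_buffers == 0:
            # Skip dirty buffers whose mount is already mid synchronous flush.
            victim = next(
                (b for b in self.dirty if not b[0].sync_flush_active), None)
            if victim is None:
                # The flag alone does not create buffers; a real kernel
                # would have to wait or find room some other way.
                raise RuntimeError(f"{caller}: no flushable buffer")
            self.dirty.remove(victim)
            mount, label = victim
            mount.sync_flush_active = True
            try:
                self.trace.append(f"{caller}: flush {mount.name}:{label}")
                if mount.loopback_nfs:
                    # The write RPC lands on a local nfsd, which needs a buffer.
                    self.getnewbuf("nfsd")
                # Otherwise: a plain local disk write, no recursion.
            finally:
                mount.sync_flush_active = False
            self.free_buffers += 1      # flushed buffer is now reusable
        self.free_buffers -= 1
```

    Given dirty buffers A and B on a loopback NFS mount and C on a local
    mount, the client's getnewbuf() flushes A, and the nested call from
    nfsd skips B (same client mount, flag set) and flushes C instead, so
    the recursion ends rather than looping back into the NFS client.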

						-Matt

Oct 15 06:18:08 freefall /kernel: nfs server pid173@freefall:/host: not responding
Oct 15 06:44:49 freefall /kernel: nfs server pid173@freefall:/host: not responding
Oct 15 16:29:50 freefall /kernel: nfs server pid173@freefall:/host: not responding
Oct 15 16:37:26 freefall /kernel: nfs server pid173@freefall:/host: not responding
Oct 15 22:46:08 freefall shutdown: reboot by jdp: Rebooting to unstick NFS 

Oct 21 03:10:15 freefall /kernel: nfs server pid173@freefall:/host: not responding
Oct 21 03:34:24 freefall /kernel: nfs server pid173@freefall:/host: not responding
Oct 21 04:38:39 freefall /kernel: nfs server pid173@freefall:/host: not responding
Oct 21 04:46:56 freefall /kernel: nfs server pid173@freefall:/host: not responding
Oct 21 11:56:01 freefall shutdown: reboot by jdp: Rebooting to clear filesystem related hangs 

Oct 22 04:23:41 freefall /kernel: nfs server pid173@freefall:/host: not responding
Oct 22 04:56:57 freefall /kernel: nfs server pid173@freefall:/host: not responding
Oct 22 16:40:55 freefall /kernel: nfs server pid173@freefall:/host: not responding
Oct 22 17:52:34 freefall /kernel: nfs server pid173@freefall:/host: not responding

Oct 23 00:36:56 freefall /kernel: nfs server pid173@freefall:/host: not responding
Oct 23 02:45:57 freefall /kernel: nfs server pid173@freefall:/host: not responding
Oct 23 04:16:57 freefall /kernel: nfs server pid173@freefall:/host: not responding
Oct 23 04:46:56 freefall /kernel: nfs server pid173@freefall:/host: not responding
Oct 23 14:44:22 freefall /kernel: nfs server pid173@freefall:/host: not responding
Oct 23 14:51:53 freefall /kernel: nfs server pid173@freefall:/host: not responding
Oct 23 15:35:55 freefall amd[24839]: /host: mount (amfs_auto_cont): Stale NFS file handle
Oct 23 15:35:55 freefall /kernel: nfs server pid173@freefall:/host: is alive again
Oct 23 15:38:40 freefall amd[25003]: /host: mount (amfs_auto_cont): Stale NFS file handle
Oct 23 15:44:05 freefall shutdown: reboot by unfurl: 







