Date: Tue, 23 Nov 1999 02:50:38 +0000 (GMT)
From: iedowse@maths.tcd.ie
To: FreeBSD-gnats-submit@freebsd.org
Subject: kern/15055: Soft NFS mounts can deadlock
Message-ID: <199911230250.aa05526@walton.maths.tcd.ie>

>Number:         15055
>Category:       kern
>Synopsis:       Soft NFS mounts can deadlock
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Nov 22 19:00:01 PST 1999
>Closed-Date:
>Last-Modified:
>Originator:     Ian Dowse
>Release:        FreeBSD 3.3-STABLE i386
>Organization:
School of Mathematics, Trinity College Dublin
>Environment:

FreeBSD -current or -stable, mounting an NFS filesystem with the
NFSMNT_SOFT (-s) flag.

>Description:

Under certain circumstances it is possible for multiple processes
to deadlock when accessing a soft-mounted NFS filesystem. The problem
is triggered when the NFS server becomes unavailable for a time, but
the processes remain deadlocked even after the server comes back.

If the mount is also interruptible (NFSMNT_INT or -i), then recovery
is possible by killing some of the affected processes; otherwise a
reboot is necessary.

This problem results from an interaction between the NFS congestion
window mechanism and the way that soreceive() calls on the NFS socket
are serialised.

When the NFS server becomes unavailable and there are outstanding
requests (new or old), the NFS congestion window quickly shrinks
back to 1 RPC. Requests then fall into two categories:

 (a) those that managed to get in and send a request before the
     window closed up (R_SENT flag set); and
 (b) those that missed the window, so are waiting for nfs_timer()
     to transmit their requests later.

The deadlock occurs when a process with a type (b) request gets the
receive lock, and subsequently all type (a) requests time out. No
type (a) requests are transmitted since they have all timed out,
and the congestion window disallows transmitting type (b) requests,
because the timed-out type (a) requests are still counted as
outstanding.

The process holding the receive lock will not release it until it
receives an NFS reply (for any request), but since there are no
requests being transmitted, this never happens. The timed-out
requests don't complete either, since their processes are all waiting
for the receive lock!

If the mount is interruptible, then killing the type (b) process
that currently holds the receive lock will release it. Then all
the type (a) processes notice that their requests have timed out,
and return.

>How-To-Repeat:

	mount -o -s,-i someserver:/fs /mnt

	# Lots of accesses to push down the NFS RTT estimates
	find /mnt -print > /dev/null

	# *** Disconnect the server from the client ***

	# Make some type (a) processes
	ls -l /mnt & ls -l /mnt & ls -l /mnt & ls -l /mnt &

	sleep 5

	# Now that the congestion window has closed these will be type (b)
	df /mnt & df /mnt & df /mnt & df /mnt &

Then wait for a few 'nfs server not responding' errors, and wait
for the NFS traffic to stop completely with one of the df processes
waiting on 'sbwait'. When this happens, reconnecting the server
will not unwedge the processes, but killing the df in 'sbwait' will.

>Fix:

Apply the following patch to sys/nfs/nfs_socket.c. This causes the
count of outstanding requests to be decremented as soon as a request
is marked as timed out. When all type (a) requests have timed out,
the congestion window will allow another request to be transmitted,
so the deadlock is avoided.

Note that while this patch solves the deadlock problem, the code
still does not guarantee that a process will be made aware quickly
that its request has timed out. That would require nfs_timer() to
set some flag in the nfsmount struct, instructing the current holder
of the receive lock to release it as soon as possible. I'm not sure
that such a mechanism would be worth the effort. With this patch
the process will find out about a timeout eventually (it doesn't
need to wait for the server to come back), and all waiting processes
will respond quickly when the server does return.

--- nfs_socket.c.orig	Mon Nov 22 21:58:12 1999
+++ nfs_socket.c	Mon Nov 22 22:43:33 1999
@@ -152,6 +152,7 @@
 static void	nfs_realign __P((struct mbuf **pm, int hsiz));
 static int	nfs_receive __P((struct nfsreq *rep, struct sockaddr **aname,
 				 struct mbuf **mp));
+static void	nfs_softterm __P((struct nfsreq *rep));
 static int	nfs_reconnect __P((struct nfsreq *rep));
 #ifndef NFS_NOSERVER
 static int	nfsrv_getstream __P((struct nfssvc_sock *,int));
@@ -864,8 +865,10 @@
 				if (nmp->nm_cwnd > NFS_MAXCWND)
 					nmp->nm_cwnd = NFS_MAXCWND;
 			}
-			rep->r_flags &= ~R_SENT;
-			nmp->nm_sent -= NFS_CWNDSCALE;
+			if (rep->r_flags & R_SENT) {
+				rep->r_flags &= ~R_SENT;
+				nmp->nm_sent -= NFS_CWNDSCALE;
+			}
 			/*
 			 * Update rtt using a gain of 0.125 on the mean
 			 * and a gain of 0.25 on the deviation.
@@ -1384,7 +1387,7 @@
 		if (rep->r_mrep || (rep->r_flags & R_SOFTTERM))
 			continue;
 		if (nfs_sigintr(nmp, rep, rep->r_procp)) {
-			rep->r_flags |= R_SOFTTERM;
+			nfs_softterm(rep);
 			continue;
 		}
 		if (rep->r_rtt >= 0) {
@@ -1412,7 +1415,7 @@
 		}
 		if (rep->r_rexmit >= rep->r_retry) {	/* too many */
 			nfsstats.rpctimeouts++;
-			rep->r_flags |= R_SOFTTERM;
+			nfs_softterm(rep);
 			continue;
 		}
 		if (nmp->nm_sotype != SOCK_DGRAM) {
@@ -1491,6 +1494,27 @@
 	nfs_timer_handle = timeout(nfs_timer, (void *)0, nfs_ticks);
 }
 
+/*
+ * Flag a request as being about to terminate (due to NFSMNT_INT/NFSMNT_SOFT).
+ * The nm_send count is decremented now to avoid deadlocks when the process in
+ * soreceive() hasn't yet managed to send its own request.
+ */
+static void
+nfs_softterm(rep)
+	struct nfsreq *rep;
+{
+	rep->r_flags |= R_SOFTTERM;
+
+	/*
+	 * Decrement the outstanding request count, and clear R_SENT so
+	 * that the decrement doesn't get done again later.
+	 */
+	if (rep->r_flags & R_SENT) {
+		rep->r_nmp->nm_sent -= NFS_CWNDSCALE;
+		rep->r_flags &= ~R_SENT;
+	}
+}
+
 /*
  * Test for a termination condition pending on the process.
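
The accounting change made by the patch can be illustrated with a small
user-space model. This is only a sketch under stated assumptions: the
model_* names, the struct layouts and the constant values are hypothetical
and are not taken from sys/nfs/nfs_socket.c. The model only mirrors the
behaviour described in this report: a request may be sent while the
outstanding count is below the congestion window, and the patched timeout
path gives the window slot back immediately.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only; the kernel defines its own constants. */
#define MODEL_CWNDSCALE   256
#define MODEL_R_SENT      0x1
#define MODEL_R_SOFTTERM  0x2

struct model_mount {
	int cwnd;	/* congestion window, scaled by MODEL_CWNDSCALE */
	int sent;	/* outstanding requests, same scale */
};

struct model_req {
	int flags;
};

/* Transmit a request only if the congestion window has room. */
static bool
model_try_send(struct model_mount *m, struct model_req *r)
{
	if (m->sent >= m->cwnd)
		return false;
	r->flags |= MODEL_R_SENT;
	m->sent += MODEL_CWNDSCALE;
	return true;
}

/* Unpatched timeout: the request is flagged but still counted as sent. */
static void
model_timeout_old(struct model_mount *m, struct model_req *r)
{
	(void)m;
	r->flags |= MODEL_R_SOFTTERM;
}

/* Patched timeout: flag the request and release its window slot once. */
static void
model_timeout_new(struct model_mount *m, struct model_req *r)
{
	r->flags |= MODEL_R_SOFTTERM;
	if (r->flags & MODEL_R_SENT) {
		m->sent -= MODEL_CWNDSCALE;
		r->flags &= ~MODEL_R_SENT;
	}
}

/*
 * Window of 1 RPC, one type (a) request already sent, one type (b)
 * request waiting. After the type (a) request times out, report
 * whether the type (b) request can finally be transmitted.
 */
static bool
model_scenario(bool patched)
{
	struct model_mount m = { MODEL_CWNDSCALE, 0 };
	struct model_req a = { 0 }, b = { 0 };
	bool a_sent, b_sent;

	a_sent = model_try_send(&m, &a);	/* type (a): got in */
	b_sent = model_try_send(&m, &b);	/* type (b): window is full */
	assert(a_sent && !b_sent);

	if (patched)
		model_timeout_new(&m, &a);
	else
		model_timeout_old(&m, &a);

	return model_try_send(&m, &b);
}
```

In this model, model_scenario(false) returns false because the timed-out
type (a) request still occupies the window, which corresponds to the wedged
state; model_scenario(true) returns true because the slot is released at
timeout and the type (b) request can go out.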

>Release-Note:
>Audit-Trail:
>Unformatted:


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-bugs" in the body of the message