Date: Fri, 13 Nov 1998 15:37:58 -0700 (MST)
From: "David G. Andersen" <danderse@cs.utah.edu>
To: freebsd-hackers@FreeBSD.ORG
Cc: mike@fast.cs.utah.edu, sclawson@cs.utah.edu, danderse@cs.utah.edu
Subject: amd/NFS INTR hang - more details.
Message-ID: <13900.45689.285484.668273@torrey.cs.utah.edu>
On the topic of the earlier mentioned hang we've been tracking down: 3.0-CURRENT, on dual PII-350 machines. We're running an older version of amd, but the problem occurs more frequently with the new version.

We can reliably hang the machine by:

  - open a file over NFS
  - write some data to the file (how much appears irrelevant)
  - close the file descriptor
  - while doing so, ctrl-C (SIGINT) the process which is closing the file descriptor.

(This likely explains the prevalence of hangs in Netscape and Xemacs, both of which use quite a few signals. We've replicated it on a machine running nothing but the bare essentials, and nfsiod.)

The kernel still responds to pings and such, but no userland executes after the hang. If we force the kernel to panic and examine the crashdump, we find that it's hung in a tsleep call in vinvalbuf (sys/kern/vfs_subr.c):

        while (vp->v_numoutput) {
                vp->v_flag |= VBWAIT;
  =>            tsleep((caddr_t)&vp->v_numoutput,
                    slpflag | (PRIBIO + 1),
                    "vinvlbuf", slptimeo);
        }

Looking at it, it appears that (quoting shamelessly from Mike Hibler, who peeked at it also):

    The test program is stuck in this loop in vinvalbuf because there is a SIGINT pending. This causes tsleep to return immediately (without sleeping) with the return value EINTR or ERESTART, but they aren't checking the return value! Hence, it spins forever in this loop because...

    Meanwhile one of the pending nfsbiod's has been awakened because its reply to the write request has arrived, but it never gets to run. The other three nfsbiods are blocked because only one biod can be in the socket receive at a time. And until the biods return, v_numoutput won't be decremented.

    It works with no nfsbiods because the test program does all the buffer writes itself, so by the time it gets to vinvalbuf, v_numoutput is 0.

Unfortunately, I don't know what the right behavior is off the top of my head. This appears to be a FreeBSDism that isn't in our code or NetBSD. Any thoughts / suggested fixes would be appreciated.
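
To make the spin described above concrete, here is a minimal user-space sketch of the two loop shapes. It is a model, not kernel code: struct fake_vnode, fake_tsleep(), wait_unchecked(), wait_checked() and the max_spins cap are all hypothetical names invented for illustration, and the "checked" variant only shows what testing the return value would look like. Whether bailing out with the error is the right behavior for vinvalbuf is exactly the open question, so treat that part as an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the vnode state involved in the hang. */
struct fake_vnode {
    int  v_numoutput;   /* writes still in flight */
    bool sig_pending;   /* models a pending SIGINT on the sleeping process */
};

/*
 * Models an interruptible tsleep(): with a signal pending it returns
 * EINTR at once, without sleeping, so nothing else gets to run and
 * v_numoutput is left unchanged. Otherwise we pretend we slept and
 * one biod completed a write.
 */
static int fake_tsleep(struct fake_vnode *vp)
{
    if (vp->sig_pending)
        return EINTR;
    vp->v_numoutput--;
    return 0;
}

/*
 * Loop shape from the message: the return value is ignored.
 * max_spins exists only so this simulation terminates; it returns
 * the number of sleeps taken, or -1 if the cap was hit (the case
 * that would spin forever in the kernel).
 */
static long wait_unchecked(struct fake_vnode *vp, long max_spins)
{
    long spins = 0;

    while (vp->v_numoutput) {
        if (spins++ >= max_spins)
            return -1;
        (void)fake_tsleep(vp);
    }
    return spins;
}

/* Assumed alternative: test the return value and leave the loop. */
static int wait_checked(struct fake_vnode *vp)
{
    int error;

    while (vp->v_numoutput) {
        error = fake_tsleep(vp);
        if (error)
            return error;   /* caller must cope with buffers still busy */
    }
    return 0;
}
```

With sig_pending set and v_numoutput nonzero, wait_unchecked() makes no progress and only stops at the artificial cap, while wait_checked() returns EINTR on the first pass. With no signal pending, both variants drain v_numoutput normally.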

Interestingly, this appears to be at least slightly orthogonal to the problem the other person reported, in which processes get locked in "D" state; with the nfsiod's disabled, we're also seeing that problem, but haven't looked into it yet.

   -Dave

--
work: danderse@cs.utah.edu                     me: angio@pobox.com
      University of Utah                           http://www.angio.net/
      Department of Computer Science

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message