From: Alfred Perlstein
Date: Wed, 16 Dec 1998 19:43:17 -0500 (EST)
To: hackers@FreeBSD.ORG
Subject: NFS hangs, old problem revisited.

Anyone want to take a look at this?  I kinda think i just got bitten by
it, but i have no idea.  It's my old "deleting mail in pine over NFS
killed my box bug"

You can still ping the box after the hang i just got, and you can telnet
to open ports, however all that happens is that the connection is opened,
but nothing ever gets sent across.

ie:
% telnet box
Trying x.x.x.x...
Connected to x.x.x.
Escape character is '^]'.

then nothing.

I'd submit a PR, however i've already done so, i tried enabling
crashdumps after being told it was 'ok' and i lost my /usr.

Can i do anything to give better feedback?  i have intr mounts, which is
why i thought of this, there really is no PR with this dialog and i
didn't see any followups about it.

3.0 box as of Nov 30th.

I think i will cvsup, perhaps something somewhere else has been done to
fix this, but the code looks the same as in this mail.

thanks,
-Alfred

----- begin conversation with people that understand vfs ------

NFS/FS people care to comment?  (Regarding the looping 'tsleep' in
vfs_subr.c: vinvalbuf() which causes a system hang).

To reiterate a bit, the code in question is:

        while (vp->v_numoutput) {
                vp->v_flag |= VBWAIT;
                tsleep((caddr_t)&vp->v_numoutput,
                        slpflag | (PRIBIO + 1),
                        "vinvlbuf", slptimeo);
        }

When the filesystem is NFS mounted with the 'intr' flag, this tsleep
gets interrupted occasionally, and the system begins infinitely looping
here.

The discussion about which we need comments:

Lo and Behold, Mike Hibler said:
> > From: David G Andersen
>
> > I can see a few options for the way to go, but I'm not sure which is
> > right.
> >
> > 1 - return EINTR on the close ('man close' says that's a possible error
> >     code)
> >
> > 2 - retry the flush a few times, then return EINTR.
> >     (more likely to make clients happy)
> >
> > 3 - For those of us who are lazy bastards, ignore SIGINTR during
> >     NFS flushes.  This seems like a bad idea.
> >
> > 4 - Something else?
> >
>
> There are really two issues involved.  One is whether the FreeBSD change
> to vinvalbuf is even necessary/correct...  Ok, I just did a cvs annotate
> and found what the change was:
> ==================
>
> revision 1.156
> date: 1998/06/10 22:02:14;  author: julian;  state: Exp;  lines: +4 -2
> Replace 'sleep()' with 'tsleep()'
> Accidentally imported from Kirk's codebase.
>
> Pointed out by: various.
> ----------------------------
> revision 1.155
> date: 1998/06/10 18:13:19;  author: julian;  state: Exp;  lines: +18 -8
> Submitted by: Kirk McKusick
>
> Fix for potential hang when trying to reboot the system or
> to forcibly unmount a soft update enabled filesystem.
> FreeBSD already handled the reboot case differently, this is however a better
> fix.
>
> ==================
> So as 1.155 indicates, this change came directly from The Source so I believe
> it is necessary.
> The change in 1.156 is the key: by changing from the 4.4bsd
> non-interruptible "sleep" to the possibly interruptible "tsleep" and OR'ing
> in the "slpflag" the problem was introduced--now the sleep became
> interruptible when called on an interruptible NFS mount.
>
> That brings us to issue #2 which is what is the correct behavior in this case?
> The easy way out is to just not OR in slpflag and go back to full-time non-
> interruptibility (your #3).  However, that probably isn't necessary.  I'm a
> bettin' that you could just splx() and return the tsleep value (your #1)
> and all will be fine.  (well, as fine as it ever is in the NFS world...)

Thanks in advance.

   -Dave

-- 
work: danderse@cs.utah.edu                     me:  angio@pobox.com
      University of Utah                            http://www.angio.net/
      Department of Computer Science
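
For anyone who wants to see the difference between the quoted loop and
option #1 (return the tsleep value) without booting a test kernel, here
is a rough userspace sketch.  It is not the vfs_subr.c code and not a
proposed patch: struct fake_vnode, fake_tsleep(), signal_pending, the
VBWAIT value and the max_spins bound are all made-up stand-ins, and the
splx() step Mike mentions is only noted in a comment.  The one
assumption it models is that an interrupted sleep returns an error
while the outstanding write count stays where it was.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel vnode: only the fields the loop touches. */
struct fake_vnode {
        int v_numoutput;        /* writes still in flight */
        int v_flag;
};

#define VBWAIT 0x0400           /* illustrative value, not taken from the kernel */

/* Set to 1 to model a signal that stays pending on an 'intr' mount. */
static int signal_pending;

/*
 * Hypothetical stand-in for tsleep().  With a signal pending it returns
 * EINTR and no write completes; otherwise one write "finishes".
 */
static int
fake_tsleep(struct fake_vnode *vp)
{
        assert(vp != NULL);
        if (signal_pending)
                return (EINTR);
        vp->v_numoutput--;
        return (0);
}

/*
 * Shape of the quoted loop: the sleep result is thrown away.  max_spins
 * exists only so the demo terminates; the real loop has no such bound.
 * Returns how many times the loop body ran.
 */
static int
wait_ignoring_error(struct fake_vnode *vp, int max_spins)
{
        int spins = 0;

        while (vp->v_numoutput && spins < max_spins) {
                vp->v_flag |= VBWAIT;
                (void)fake_tsleep(vp);
                spins++;
        }
        return (spins);
}

/*
 * Shape of option #1: hand the sleep error back to the caller.  In the
 * kernel the interrupt level would be restored with splx() before the
 * return; that is not modeled here.
 */
static int
wait_returning_error(struct fake_vnode *vp)
{
        int error;

        while (vp->v_numoutput) {
                vp->v_flag |= VBWAIT;
                error = fake_tsleep(vp);
                if (error)
                        return (error);
        }
        return (0);
}
```

With signal_pending set, wait_ignoring_error() burns through every
allowed iteration and v_numoutput never moves, which is the hang in
miniature.  wait_returning_error() comes back with EINTR on the first
pass and leaves it to the caller (close(), in Dave's option #1) to
decide what to do with it.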