From owner-freebsd-hackers  Wed Dec 16 16:40:14 1998
Date: Wed, 16 Dec 1998 19:43:17 -0500 (EST)
From: Alfred Perlstein <bright@hotjobs.com>
X-Sender: bright@bright.fx.genx.net
To: hackers@FreeBSD.ORG
Subject: NFS hangs, old problem revisited.
Message-ID: <Pine.BSF.4.05.9812161930170.377-100000@bright.fx.genx.net>


Anyone want to take a look at this?

I kinda think I just got bitten by it, but I have no idea.

It's my old "deleting mail in pine over NFS killed my box" bug.

You can still ping the box after the hang I just got, and you can telnet
to open ports.  However, all that happens is that the connection is opened
and nothing ever gets sent across.

i.e.:
% telnet box
Trying x.x.x.x...
Connected to x.x.x.
Escape character is '^]'.
                  
then nothing.

I'd submit a PR, but I've already done so.  I tried enabling crashdumps
after being told it was 'ok' and I lost my /usr.

Can I do anything to give better feedback?

I have intr mounts, which is why I thought of this.  There really is no PR
with this dialog, and I didn't see any followups about it.

This is a 3.0 box as of Nov 30th.  I think I will cvsup; perhaps something
somewhere else has been done to fix this, but the code looks the same as in
this mail.

thanks,
-Alfred

----- begin conversation with people that understand vfs ------

NFS/FS people care to comment?

(Regarding the looping 'tsleep' in vfs_subr.c: vinvalbuf() which
causes a system hang). 

To reiterate a bit, the code in question is:
                while (vp->v_numoutput) {
                        vp->v_flag |= VBWAIT;
                        tsleep((caddr_t)&vp->v_numoutput,
                                slpflag | (PRIBIO + 1),
                                "vinvlbuf", slptimeo);
                }

When the filesystem is NFS mounted with the 'intr' flag, this tsleep
gets interrupted occasionally, and the system begins infinitely
looping here.
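
A rough user-space model of why that loop spins, for anyone who wants to
poke at it outside the kernel.  Everything here is a stand-in: the
model_* names are made up, and the assumption (not verified against the
tree) is that an interruptible tsleep() returns EINTR right away while a
signal is pending, without any I/O completing.  The max_iter cap exists
only so the model terminates; the quoted kernel loop has no such cap.

```c
#include <errno.h>

/* Hypothetical stand-ins for kernel state; not the real struct vnode. */
struct model_vnode {
    int v_numoutput;    /* writes still in flight */
};

struct model_proc {
    int sig_pending;    /* nonzero while an unhandled signal is posted */
};

/*
 * Model of an interruptible tsleep(): with a signal pending it returns
 * EINTR at once instead of sleeping.  Otherwise pretend one write
 * completed and we were woken up normally.
 */
static int
model_tsleep(struct model_vnode *vp, struct model_proc *p)
{
    if (p->sig_pending)
        return EINTR;
    if (vp->v_numoutput > 0)
        vp->v_numoutput--;
    return 0;
}

/*
 * Same shape as the quoted loop: the sleep's return value is thrown
 * away.  Returns the number of passes made through the loop.
 */
static long
model_wait_ignoring_error(struct model_vnode *vp, struct model_proc *p,
    long max_iter)
{
    long n = 0;

    while (vp->v_numoutput && n < max_iter) {
        (void)model_tsleep(vp, p);
        n++;
    }
    return n;
}
```

With sig_pending set, v_numoutput never drops and the loop runs until the
cap is hit; with it clear, the loop exits once the outstanding writes
drain.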

The discussion about which we need comments:

Lo and Behold, Mike Hibler said:
> > From: David G Andersen <danderse@cs>
> 
> > I can see a few options for the way to go, but I'm not sure which is
> > right.
> > 
> > 1 - return EINTR on the close ('man close' says that's a possible
> >     error code)
> > 
> > 2 - retry the flush a few times, then return EINTR.
> >     (more likely to make clients happy)
> > 
> > 3 - For those of us who are lazy bastards, ignore SIGINTR during
> >     NFS flushes.  This seems like a bad idea.
> > 
> > 4 - Something else?  
> > 
> 
> There are really two issues involved.  One is whether the FreeBSD change
> to vinvalbuf is even necessary/correct...  Ok, I just did a cvs annotate
> and found what the change was:
> ==================
> 
> revision 1.156
> date: 1998/06/10 22:02:14;  author: julian;  state: Exp;  lines: +4 -2
> Replace 'sleep()' with 'tsleep()'
> Accidentally imported from Kirk's codebase.
> 
> Pointed out by: various.
> ----------------------------
> revision 1.155
> date: 1998/06/10 18:13:19;  author: julian;  state: Exp;  lines: +18 -8
> Submitted by: Kirk McKusick <mckusick@McKusick.COM>
> 
> Fix for potential hang when trying to reboot the system or
> to forcibly unmount a soft update enabled filesystem.
> FreeBSD already handled the reboot case differently, this is however a
> better fix.
> 
> ==================
> So as 1.155 indicates, this change came directly from The Source so I
> believe it is necessary.  The change in 1.156 is the key: by changing
> from the 4.4bsd non-interruptible "sleep" to the possibly interruptible
> "tsleep" and OR'ing in the "slpflag" the problem was introduced--now the
> sleep became interruptible when called on an interruptible NFS mount.
> 
> That brings us to issue #2 which is what is the correct behavior in this
> case?  The easy way out is to just not OR in slpflag and go back to
> full-time non-interruptibility (your #3).  However, that probably isn't
> necessary.  I'm a bettin' that you could just splx() and return the
> tsleep value (your #1) and all will be fine. (well, as fine as it ever is
> in the NFS world...)
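
For concreteness, here is a user-space sketch of options #1 and #2 from
the list above.  It is only a model, not a patch: the sk_* names are
hypothetical, sk_tsleep() stands in for an interruptible tsleep() that
returns EINTR while a signal is pending, and the real function would
presumably need to splx() before any early return.

```c
#include <errno.h>

/* Hypothetical stand-ins for kernel state; not the real structures. */
struct sk_vnode { int v_numoutput; };   /* writes still in flight */
struct sk_proc  { int sig_pending; };   /* nonzero: signal is posted */

/* Model sleep: EINTR if a signal is pending, else one write finishes. */
static int
sk_tsleep(struct sk_vnode *vp, struct sk_proc *p)
{
    if (p->sig_pending)
        return EINTR;
    if (vp->v_numoutput > 0)
        vp->v_numoutput--;
    return 0;
}

/* Option #1: hand the sleep's error straight back to the caller. */
static int
sk_wait_return_error(struct sk_vnode *vp, struct sk_proc *p)
{
    int error;

    while (vp->v_numoutput) {
        error = sk_tsleep(vp, p);
        if (error)
            return error;   /* kernel code would splx(s) first */
    }
    return 0;
}

/* Option #2: tolerate max_retries interrupted sleeps, then give up. */
static int
sk_wait_bounded_retry(struct sk_vnode *vp, struct sk_proc *p,
    int max_retries)
{
    int error, retries = 0;

    while (vp->v_numoutput) {
        error = sk_tsleep(vp, p);
        if (error && ++retries > max_retries)
            return error;
    }
    return 0;
}
```

Either way the caller sees EINTR instead of the loop spinning.  Note that
in this model the retries in option #2 burn through immediately while the
signal stays pending, so it only helps if something clears the signal
between attempts.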

  Thanks in advance.

    -Dave

--
work: danderse@cs.utah.edu                     me:  angio@pobox.com
      University of Utah                            http://www.angio.net/
      Department of Computer Science

