Date:      Fri, 13 Nov 1998 15:37:58 -0700 (MST)
From:      "David G. Andersen" <danderse@cs.utah.edu>
To:        freebsd-hackers@FreeBSD.ORG
Cc:        mike@fast.cs.utah.edu, sclawson@cs.utah.edu, danderse@cs.utah.edu
Subject:   amd/NFS INTR hang - more details.
Message-ID:  <13900.45689.285484.668273@torrey.cs.utah.edu>


Following up on the hang mentioned earlier, which we've been tracking down:

This is 3.0-CURRENT on dual PII-350 machines.  We're running an older
version of amd, but the problem occurs more frequently with the new version.

We can reliably hang the machine by:

  open a file over NFS
  write some data to the file (how much appears irrelevant)
  close the file descriptor

While it does so, hit ctrl-C (SIGINT) on the process that is closing the
file descriptor.
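
For anyone who wants to try this, the steps above can be sketched as a
small C helper.  This is only an illustration, not our actual test
program: the function name write_and_close, the buffer size and the
file mode are all made up for the example, and the path is whatever
file you pick on an NFS mount.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not the original test program):
 * open a file, write nbytes of filler, then close it.
 * Returns 0 on success, -1 on any error.
 */
static int write_and_close(const char *path, size_t nbytes)
{
    char buf[4096];
    int fd;

    memset(buf, 'x', sizeof(buf));

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    while (nbytes > 0) {
        size_t chunk = nbytes < sizeof(buf) ? nbytes : sizeof(buf);
        ssize_t n = write(fd, buf, chunk);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing; retry */
            close(fd);
            return -1;
        }
        nbytes -= (size_t)n;
    }

    /* On an NFS mount, this close is where the ctrl-C has to land. */
    return close(fd);
}
```

Calling this in a loop from main() on a file under an NFS mount and
hitting ctrl-C while it runs should give the signal a reasonable chance
of arriving during the close.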

(This likely explains the prevalence of hangs in Netscape and Xemacs,
both of which use quite a few signals.  We've replicated it on a
machine running nothing but the bare essentials and nfsiod.)

The kernel still responds to pings and such, but no userland code
executes after the hang.  If we force the kernel to panic and examine
the crashdump, we find that it's hung in a tsleep call in vinvalbuf
(sys/kern/vfs_subr.c):

                while (vp->v_numoutput) {
                        vp->v_flag |= VBWAIT;
=>                      tsleep((caddr_t)&vp->v_numoutput,
                                slpflag | (PRIBIO + 1),
                                "vinvlbuf", slptimeo);
                }

Here is what appears to be happening (quoting shamelessly from Mike
Hibler, who also looked at it):

The test program is stuck in this loop in vinvalbuf because there is a
SIGINT pending.  This causes tsleep to return immediately (without sleeping)
with the return value EINTR or ERESTART, but the code doesn't check the
return value!  Hence, it spins forever in this loop because...

Meanwhile, one of the pending nfsbiods has been awakened because its reply
to the write request has arrived, but it never gets to run.  The other three
nfsbiods are blocked because only one biod can be in the socket receive at a
time.  And until the biods return, v_numoutput won't be decremented.

It works with no nfsbiods because the test program does all the buffer
writes itself so by the time it gets to vinvalbuf, v_numoutput is 0.

Unfortunately, I don't know what the right behavior is off the top of
my head.  This appears to be a FreeBSDism that isn't in our code or
NetBSD.
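
One possible direction, purely as a sketch, would be to have the loop
look at what tsleep returns and give up when the sleep was interrupted.
The userland model below shows only the control flow.  Everything in it
is a hypothetical stand-in: struct fake_vnode, signal_pending,
fake_tsleep and wait_for_output are made-up names, not kernel
interfaces, and it assumes the sleep is interruptible (as it would be
on an intr mount).  Whether bailing out of vinvalbuf at that point is
actually safe is exactly what I don't know.

```c
#include <errno.h>

/* Hypothetical stand-in for the one vnode field that matters here. */
struct fake_vnode {
    int v_numoutput;            /* writes still in flight */
};

/* Models a signal pending on the sleeping process. */
static int signal_pending;

/*
 * Models an interruptible tsleep(): with a signal pending it returns
 * EINTR at once without sleeping; otherwise it "sleeps" until one
 * write completes and returns 0.
 */
static int fake_tsleep(struct fake_vnode *vp)
{
    if (signal_pending)
        return EINTR;
    vp->v_numoutput--;          /* pretend a biod finished a write */
    return 0;
}

/*
 * The wait loop with the return value checked: an interrupted sleep
 * ends the loop and the error is handed back to the caller, rather
 * than going around again with nothing changed.
 */
static int wait_for_output(struct fake_vnode *vp)
{
    int error;

    while (vp->v_numoutput) {
        error = fake_tsleep(vp);
        if (error)
            return error;
    }
    return 0;
}
```

In this model, with signal_pending set the loop exits with EINTR and
v_numoutput untouched; without it, the loop runs until the count drains
to zero.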

Any thoughts / suggested fixes would be appreciated.  Interestingly,
this appears to be at least slightly orthogonal to the problem another
person reported, in which processes get locked in "D" state; with the
nfsiods disabled, we're also seeing that problem, but haven't looked
into it yet.

   -Dave

-- 
work: danderse@cs.utah.edu                     me:  angio@pobox.com
      University of Utah                            http://www.angio.net/
      Department of Computer Science
