Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 27 Jan 1998 00:59:37 -0500 (EST)
From:      Bill Paul <wpaul@skynet.ctr.columbia.edu>
To:        grog@lemis.com (Greg Lehey)
Cc:        current@FreeBSD.ORG, dyson@FreeBSD.ORG
Subject:   Re: Another observation on -current and NFS
Message-ID:  <199801270559.AAA07933@skynet.ctr.columbia.edu>
In-Reply-To: <19980127155557.45414@lemis.com> from "Greg Lehey" at Jan 27, 98 03:55:57 pm

next in thread | previous in thread | raw e-mail | index | archive | help
Of all the gin joints in all the towns in all the world, Greg Lehey had 
to walk into mine and say:
 
> OK, I've build a new -current as well on my test machine.  I haven't
> had long enough to decide if there are any differences from a kernel
> built a month ago, but I've seen one similarity:
> 
> I build my kernels on an NFS-mounted file system and leave my /usr
> read-only (this speeds up reboots after panics :-).  I find I
> *consistently* can't strip a debug kernel across the net.  The
> connection hangs, the local keyboard hangs, and about the only thing
> that works is a ping from another system.

This is an indication that some kernel routine is caught in an infinite
loop of some kind. In fact, this sounds suspiciously like the problem
I had with mmap() and NFS.

> I have dumps of this situation, and I'm sure I can create more.  The
> relevant part of the stack trace looks like:

[...]
> #12 0xf0134384 in vfs_vmio_release (bp=0xf24e556c) at ../../kern/vfs_bio.c:756
> #13 0xf0134080 in brelse (bp=0xf24e556c) at ../../kern/vfs_bio.c:647
> #14 0xf0166316 in nfs_bioread (vp=0xf0508900, uio=0xf3a41e34, ioflag=8, cred=0xf0547b80, getpages=1)
>     at ../../nfs/nfs_bio.c:346
> #15 0xf0165b64 in nfs_getpages (ap=0xf3a41e70) at ../../nfs/nfs_bio.c:116
> #16 0xf01d0468 in vnode_pager_getpages (object=0xf3cced80, m=0xf3a41f38, count=1, reqpage=0) at vnode_if.h:1011
> #17 0xf01cf0b3 in vm_pager_get_pages (object=0xf3cced80, m=0xf3a41f38, count=1, reqpage=0) at ../../vm/vm_pager.c:184
> #18 0xf01c4750 in vm_fault (map=0xf0520200, vaddr=538910720, fault_type=3 '\003', fault_flags=8)
>     at ../../vm/vm_fault.c:419
[...]

Oh boy, does this look familiar. This is almost exactly what I saw with
the mmap() problem. The problem is that nfs_bioread() is trying to evict
a buffer from the buffer cache that it doesn't like for some reason, but
in fact the buffer is not being flushed. In my case, the buffer in question
was supposed to cover an 8K region, but only the first 4K was actually
mapped. When my test program tried to touch the second 4K, it triggered
a page fault, causing the system to try to load the new 4K page into memory.
But when nfs_bioread() is called, it finds the cached buffer which says
it covers the desired region but really only covers the first 4K. So
nfs_bioread() calls brelse() thinking this will evict the buffer from
the cache so that the next time through the loop it will perform an actual
read from the server to fetch the desired block.

In fact, brelse() doesn't actually free the buffer, so nfs_bioread() comes
up with the same partially valid buffer (via nfs_getcacheblk()) all over
again, and it tries to evict it again, then it gets it again, and it tries
to evict it again, then it gets it again, and it tries to evict it again,
then it gets it again, and it tries to evict it again, and so on, and so
on, and so on... nfs_bioread() seems to think that setting the B_NOCACHE
flag on the buffer is enough to make brelse() throw the buffer out of
the cache, but this doesn't seem to be the case anymore.

Since the loop carries through quite a few different functions, you're
likely to get several stack traces that look quite different, except for
nfs_bioread(). I know: I banged my head against this problem for a while
myself.

John Dyson made a couple of commits this weekend which I _thought_ would
fix this thing; I gave him a sample program that triggers the bug to help
him track it down. Assuming that he got my sample program to work, I'm
guessing there's more than one way to trigger the condition.
 
> The command in question is the strip.

It appears that strip(1) uses mmap(), so it looks like the same bug.
Either it isn't really fixed, or there's more than one way to trip it.
I would have loved to have fixed it myself, but I don't understand
enough about the VM system to do it: while I can understand the nature
of the bug, I don't actually know the right way to flush a buffer from
the cache so I have no idea how to correct it. I tried looking at the
UFS code to see if I could spot any clues, but nothing jumped out at
me.

> If anybody wants more
> information, or access to the machine to look at it more carefully,
> please contact me.  As I said before, this does not appear to have
> anything to do with changes made in the last 6 weeks.

Hmmm... well, the bug I encountered has been there for a couple months
at least. If you've only started to encounter it now, then maybe John's
attempted fix made it worse somehow. *sigh*

-Bill

PS: Based on some other traffic I've seen come through -current recently,
    I'd bet a quarter that the cd9660 filesystem has mmap() problems too.

-- 
=============================================================================
-Bill Paul            (212) 854-6020 | System Manager, Master of Unix-Fu
Work:         wpaul@ctr.columbia.edu | Center for Telecommunications Research
Home:  wpaul@skynet.ctr.columbia.edu | Columbia University, New York City
=============================================================================
 "It is not I who am crazy; it is I who am mad!" - Ren Hoek, "Space Madness"
=============================================================================



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?199801270559.AAA07933>