Date: Tue, 16 Dec 1997 09:22:07 +0000 (GMT)
From: Doug Rabson <dfr@nlsystems.com>
To: Bill Paul <wpaul@skynet.ctr.columbia.edu>
Cc: current@freebsd.org, toor@dyson.iquest.net, dyson@freebsd.org
Subject: Re: mmap() + NFS problems persist
Message-ID: <Pine.BSF.3.95q.971216091544.755A-100000@herring.nlsystems.com>
In-Reply-To: <199712141717.MAA13140@skynet.ctr.columbia.edu>
On Sun, 14 Dec 1997, Bill Paul wrote:
> Of all the gin joints in all the towns in all the world, Doug Rabson had
> to walk into mine and say:
>
> [...]
>
> > I think I understand what might be happening. I can't easily check since
> > my FreeBSD hacking box is at work though. What I think happens is that
> > when brelse is called in this code fragment,
> >
> > >     if (not_readin && n > 0) {
> > >         if (on < bp->b_validoff || (on + n) > bp->b_validend) {
> > >             bp->b_flags |= B_NOCACHE;
> > >             bp->b_flags |= B_INVAFTERWRITE;
> > >             if (bp->b_dirtyend > 0) {
> > >                 if ((bp->b_flags & B_DELWRI) == 0)
> > >                     panic("nfsbioread");
> > >                 if (VOP_BWRITE(bp) == EINTR)
> > >                     return (EINTR);
> > >             } else
> > >                 brelse(bp);
> > >             goto again;        <----- LOOPS HERE!!
> >
> > the 8k buffer has exactly one VM page associated with it.
>
> Err... with all due respect, that means it's really a 4K buffer, not
> an 8K buffer, yes? If so, assuming the NFS layer did an 8K read, where
> did the other 4K go?
What I meant was that the buffer is 8k, but the VM system had only one
valid page in that region of the file (the first one). The buffer will
have a full 8k of memory backing it, but the b_validoff and b_validend
members will be set to 0 and 4k to indicate that only the first 4k of
the buf contains valid data; the rest must be read from the file.
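To make this concrete, here is a tiny userland sketch (not kernel code)
of the validity check from the fragment above, plugged with the values
from this scenario:

#include <stdio.h>

/*
 * Userland sketch of the nfs_bioread validity check.  on/n are the
 * offset and length of the requested transfer within the buf;
 * b_validoff/b_validend bound the bytes which are actually valid.
 */
int
main(void)
{
    long b_validoff = 0;    /* only the first 4k page is valid */
    long b_validend = 4096;
    long on = 4096;         /* the caller wants the second 4k */
    long n = 4096;

    if (on < b_validoff || (on + n) > b_validend)
        printf("request [%ld,%ld) lies outside the valid region "
            "[%ld,%ld): must go to the file\n",
            on, on + n, b_validoff, b_validend);
    return (0);
}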
>
> > The NFS code is
> > attempting to throw the buffer away since it is only partially valid and
> > it wants to read from the invalid section of the buf. It does this by
> > setting the B_NOCACHE flag before calling brelse. Unfortunately the
> > underlying VM page is still valid, so when getblk is called, the code in
> > allocbuf which tracks down the underlying VM pages carefully resets
> > b_validoff and b_validend, causing the loop.
> >
> > Basically, the VMIO system has managed to ignore the NFS code's request
> > for a cache flush, which the NFS code relied on to break the loop in
> > nfs_bioread. As I see it, the problem can be fixed in two ways. The
> > first would be for brelse() on a B_NOCACHE buffer to invalidate the VM
> > pages in the buffer, restoring the old behaviour which NFS expected, and
> > the second would be to rewrite that section of the NFS client to cope
> > differently with partially valid buffers.
>
> Hmmm... I think I see the code in vfs_bio.c:brelse() that has led to
> this, but the comments seem to indicate that reverting it would be a bug.
>
> There are a couple of things I don't understand. You seem to indicate that
> setting the B_NOCACHE flag will cause brelse() to flush the cache, but
> I didn't know brelse() did that. Also, bear in mind that the first
> 4K block that's in core is now dirty, so having brelse() throw it away
> would be wrong, unless it forced the first 4K block to be written out
> first, but again I don't see where that happens (unless brelse() just
> sets up the block to be written out and getblk() actually does it).
I think (not sure) that the first page in the buf is not actually dirty,
but contains clean, valid contents. The B_NOCACHE flag to brelse is
supposed to flush this buf from the buffer cache (leastways that's what I
thought).
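To show what I mean, here is a plain C model of the loop (emphatically
not the real vfs_bio code; brelse_model and allocbuf_model are my own
stand-ins): because the underlying VM page stays valid, allocbuf
recomputes the same valid range every time and the state never changes.

#include <stdio.h>

#define B_NOCACHE 0x1

/* A model only; these are stand-ins, not the real vfs_bio routines. */
struct buf {
    int  b_flags;
    long b_validoff, b_validend;
};

static int page_valid = 1;      /* the first 4k VM page stays valid */

static void
brelse_model(struct buf *bp)
{
    /* B_NOCACHE asks for the buf to be discarded, but (as I believe
     * happens under VMIO) the VM page itself is left untouched. */
    bp->b_flags &= ~B_NOCACHE;
}

static void
allocbuf_model(struct buf *bp)
{
    /* Recompute the valid range from the underlying pages. */
    bp->b_validoff = 0;
    bp->b_validend = page_valid ? 4096 : 0;
}

int
main(void)
{
    struct buf b = { 0, 0, 4096 };
    long on = 4096, n = 4096;   /* read the second 4k of the buf */
    int pass;

    for (pass = 0; pass < 3; pass++) {  /* bounded, unlike the kernel */
        if (on < b.b_validoff || (on + n) > b.b_validend) {
            b.b_flags |= B_NOCACHE;
            brelse_model(&b);
            allocbuf_model(&b); /* what getblk/allocbuf do on "again" */
            printf("pass %d: valid range is still [%ld,%ld)\n",
                pass, b.b_validoff, b.b_validend);
        }
    }
    return (0);
}

Every pass reports the same [0,4096) range; in the kernel the equivalent
loop simply never exits.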
>
> What is the correct action here? Should the dirty page be written out
> first, then the buffer invalidated and the next 4K page read in? Or
> should we write the dirty page but keep the buffer around and load the
> next 4K page into another buffer? Or should both pages be combined into
> a single 8K block? Should we not even bother to write the dirty page
> out yet and just make sure the next 4K block is loaded correctly?
Actually I think the correct action is for nfs_bioread to cope better with
partially valid bufs. I do think, however, that the VMIO system is acting
differently from the old buf system with respect to this flag. This is
almost certainly due to some other work which I did trying to get NFS mmap
to work properly earlier this year.
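One possible shape for that, sketched in userland C (an untested
illustration of the clamping arithmetic only, not a patch; the names
just mirror nfs_bioread's locals): satisfy the request from the valid
region where it overlaps, and only go back to the server for the rest.

#include <stdio.h>

/*
 * Sketch of how nfs_bioread might clamp a transfer to the valid
 * region of a partially valid buf instead of discarding it.
 */
int
main(void)
{
    long b_validoff = 0, b_validend = 4096;     /* valid bytes in buf */
    long on = 2048, n = 4096;                   /* requested window */

    if (on >= b_validoff && on < b_validend) {
        long m = b_validend - on;       /* bytes we can copy now */
        if (m > n)
            m = n;
        printf("copy %ld bytes from the buf, then read the "
            "remaining %ld from the server\n", m, n - m);
    } else {
        printf("nothing valid at this offset: read from the server\n");
    }
    return (0);
}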
>
> (And is this stuff related to the other problem where the process can
> become stuck sleeping on 'vmopar'?)
I can't say because I haven't been able to see a stack trace for this one.
>
> I think I'm going to take a trip back to campus today so I can experiment
> a bit more with the test box. (It's not like I have anything else to do
> today.)
Good luck with your investigations.
--
Doug Rabson                     Mail:  dfr@nlsystems.com
Nonlinear Systems Ltd.          Phone: +44 181 951 1891
                                Fax:   +44 181 381 1039
