Date:      Tue, 16 Dec 1997 09:22:07 +0000 (GMT)
From:      Doug Rabson <dfr@nlsystems.com>
To:        Bill Paul <wpaul@skynet.ctr.columbia.edu>
Cc:        current@freebsd.org, toor@dyson.iquest.net, dyson@freebsd.org
Subject:   Re: mmap() + NFS problems persist
Message-ID:  <Pine.BSF.3.95q.971216091544.755A-100000@herring.nlsystems.com>
In-Reply-To: <199712141717.MAA13140@skynet.ctr.columbia.edu>

On Sun, 14 Dec 1997, Bill Paul wrote:

> Of all the gin joints in all the towns in all the world, Doug Rabson had 
> to walk into mine and say:
> 
> [...]
>  
> > I think I understand what might be happening.  I can't easily check since
> > my FreeBSD hacking box is at work though.  What I think happens is that
> > when brelse is called in this code fragment,
> > 
> > >                 if (not_readin && n > 0) {
> > >                         if (on < bp->b_validoff || (on + n) > bp->b_validend) {
> > >                                 bp->b_flags |= B_NOCACHE;
> > >                                 bp->b_flags |= B_INVAFTERWRITE;
> > >                                 if (bp->b_dirtyend > 0) {
> > >                                     if ((bp->b_flags & B_DELWRI) == 0)
> > >                                         panic("nfsbioread");
> > >                                     if (VOP_BWRITE(bp) == EINTR)
> > >                                         return (EINTR);
> > >                                 } else
> > >                                     brelse(bp);
> > >                                 goto again;  <----- LOOPS HERE!!
> > 
> > the 8k buffer has exactly one VM page associated with it.
> 
> Err... with all due respect, that means it's really a 4K buffer, not
> an 8K buffer, yes? If so, assuming the NFS layer did an 8K read, where
> did the other 4K go?

What I meant was that the buffer is 8k but the VM system had only one valid
page in that region of the file (the first one).  The buffer will have a
full 8k of memory backing it, but the b_validoff and b_validend members
will be set to 0 and 4k to indicate that only the first 4k of the buf
contains valid data and the rest must be read from the file.
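
To make the state concrete, here is a tiny userland model (not kernel code)
of an 8k buf whose first 4k is valid, together with the same range check
nfs_bioread uses to decide whether a read is covered.  Only the
b_validoff/b_validend names are borrowed from struct buf; everything else
is made up for the example.

/*
 * Illustrative model only: an 8k buf with just its first 4k valid,
 * and the covered-range test from the quoted nfs_bioread fragment.
 */
#include <stdio.h>

struct fake_buf {
	long b_validoff;	/* start of valid data within the buf */
	long b_validend;	/* end of valid data within the buf */
};

/* Same test as the quoted fragment: is [on, on + n) fully valid? */
static int
range_is_valid(struct fake_buf *bp, long on, long n)
{
	return (on >= bp->b_validoff && (on + n) <= bp->b_validend);
}

int
main(void)
{
	struct fake_buf bp = { 0, 4096 };	/* first 4k valid, rest not */

	/* A read within the first 4k is satisfied from the buf... */
	printf("read 0..4k:  %s\n",
	    range_is_valid(&bp, 0, 4096) ? "valid" : "must re-read");
	/* ...but one touching the second 4k takes the B_NOCACHE path. */
	printf("read 4k..8k: %s\n",
	    range_is_valid(&bp, 4096, 4096) ? "valid" : "must re-read");
	return (0);
}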

> 
> > The NFS code is
> > attempting to throw the buffer away since it is only partially valid and
> > it wants to read from the invalid section of the buf.  It does this by
> > setting the B_NOCACHE flag before calling brelse.  Unfortunately the
> > underlying VM page is still valid, so when getblk is called, the code in
> > allocbuf which tracks down the underlying VM pages carefully resets
> > b_validoff and b_validend causing the loop.
> > 
> > Basically, the VMIO system has managed to ignore the NFS code's request
> > for a cache flush, which the NFS code relied on to break the loop in
> > nfs_bioread.  As I see it, the problem can be fixed in two ways.  The
> > first would be for brelse() on a B_NOCACHE buffer to invalidate the VM
> > pages in the buffer, restoring the old behaviour which NFS expected and
> > the second would be to rewrite that section of the NFS client to cope
> > differently with partially valid buffers.
> 
> Hmmm... I think I see the code in vfs_bio.c:brelse() that has led to
> this, but the comments seem to indicate that reverting it would be a bug.
> 
> There's a couple things I don't understand. You seem to indicate that
> setting the B_NOCACHE flag will cause brelse() to flush the cache, but
> I didn't know brelse() did that. Also, bear in mind that the first
> 4K block that's in core is now dirty, so having brelse() throw it away
> would be wrong, unless it forced the first 4K block to be written out
> first, but again I don't see where that happens (unless brelse() just
> sets up the block to be written out and getblk() actually does it).

I think (not sure) that the first page in the buf is not actually dirty,
but contains clean, valid contents.  The B_NOCACHE flag to brelse is
supposed to flush this buf from the buffer cache (leastways that's what I
thought).
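
To show what I was expecting, here is a rough userland model of the old
behaviour I had in mind for brelse() on a B_NOCACHE buf: the cached data
(and, under VMIO, the backing pages) get thrown away, so the next getblk()
finds nothing valid and nfs_bioread goes back to the server instead of
looping.  The flag value and the page "valid" bit are simplified stand-ins,
not the real kernel definitions.

#include <stdio.h>

#define B_NOCACHE	0x01	/* stand-in flag: don't cache on release */

struct fake_page {
	int valid;		/* stand-in for the vm_page valid bits */
};

struct fake_buf {
	int		  b_flags;
	long		  b_validoff, b_validend;
	struct fake_page *b_pages;
	int		  b_npages;
};

/* What the pre-VMIO behaviour effectively guaranteed on release. */
static void
model_brelse(struct fake_buf *bp)
{
	if (bp->b_flags & B_NOCACHE) {
		/* Discard the cached data, backing pages included. */
		for (int i = 0; i < bp->b_npages; i++)
			bp->b_pages[i].valid = 0;
		bp->b_validoff = bp->b_validend = 0;
	}
	/* ...then put the buf back on the free lists (omitted). */
}

int
main(void)
{
	struct fake_page pages[2] = { { 1 }, { 0 } };	/* first 4k page valid */
	struct fake_buf bp = { B_NOCACHE, 0, 4096, pages, 2 };

	model_brelse(&bp);
	printf("page 0 valid after release: %d\n", pages[0].valid);
	return (0);
}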

> 
> What is the correct action here? Should the dirty page be written out
> first, then the buffer invalidated and the next 4K page read in? Or
> should we write the dirty page but keep the buffer around and load the
> next 4K page into another buffer? Or should both pages be combined into
> a single 8K block? Should we not even bother to write the dirty page
> out yet and just make sure the next 4K block is loaded correctly?

Actually I think the correct action is for nfs_bioread to cope better with
partially valid bufs.  I do think, however, that the VMIO system is acting
differently from the old buf system with respect to this flag.  This is
almost certainly due to some other work which I did trying to get NFS mmap
to work properly earlier this year.
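
Just to sketch the direction, here is a toy model of what I mean by coping
with a partially valid buf: read only the invalid tail and extend
b_validend, instead of releasing the buf with B_NOCACHE and looping.  The
helpers (read_from_server, fill_partial) are invented for the example; this
is not a patch against nfs_bioread.

#include <stdio.h>
#include <string.h>

#define BUFSIZE	8192

struct fake_buf {
	long b_validoff, b_validend;
	char b_data[BUFSIZE];
};

/* Hypothetical stand-in for an NFS READ RPC covering [off, off + len). */
static void
read_from_server(struct fake_buf *bp, long off, long len)
{
	memset(bp->b_data + off, 'S', len);	/* pretend the server filled it */
}

/* Make [on, on + n) valid without releasing and re-fetching the buf. */
static void
fill_partial(struct fake_buf *bp, long on, long n)
{
	if (on + n > bp->b_validend) {
		read_from_server(bp, bp->b_validend, (on + n) - bp->b_validend);
		bp->b_validend = on + n;
	}
	if (on < bp->b_validoff) {
		read_from_server(bp, on, bp->b_validoff - on);
		bp->b_validoff = on;
	}
}

int
main(void)
{
	struct fake_buf bp = { 0, 4096, { 0 } };	/* first 4k valid */

	fill_partial(&bp, 4096, 4096);			/* want the second 4k */
	printf("valid range now %ld..%ld\n", bp.b_validoff, bp.b_validend);
	return (0);
}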

> 
> (And is this stuff related to the other problem where the process can
> become stuck sleeping on 'vmopar'?)

I can't say because I haven't been able to see a stack trace for this one.

> 
> I think I'm going to take a trip back to campus today so I can experiment
> a bit more with the test box. (It's not like I have anything else to do
> today.)

Good luck with your investigations.

--
Doug Rabson				Mail:  dfr@nlsystems.com
Nonlinear Systems Ltd.			Phone: +44 181 951 1891
					Fax:   +44 181 381 1039



