Date: Tue, 16 Dec 1997 09:22:07 +0000 (GMT)
From: Doug Rabson <dfr@nlsystems.com>
To: Bill Paul <wpaul@skynet.ctr.columbia.edu>
Cc: current@freebsd.org, toor@dyson.iquest.net, dyson@freebsd.org
Subject: Re: mmap() + NFS problems persist
Message-ID: <Pine.BSF.3.95q.971216091544.755A-100000@herring.nlsystems.com>
In-Reply-To: <199712141717.MAA13140@skynet.ctr.columbia.edu>
On Sun, 14 Dec 1997, Bill Paul wrote:

> Of all the gin joints in all the towns in all the world, Doug Rabson had
> to walk into mine and say:
>
> [...]
>
> > I think I understand what might be happening.  I can't easily check since
> > my FreeBSD hacking box is at work though.  What I think happens is that
> > when brelse is called in this code fragment,
> >
> > >     if (not_readin && n > 0) {
> > >         if (on < bp->b_validoff || (on + n) > bp->b_validend) {
> > >             bp->b_flags |= B_NOCACHE;
> > >             bp->b_flags |= B_INVAFTERWRITE;
> > >             if (bp->b_dirtyend > 0) {
> > >                 if ((bp->b_flags & B_DELWRI) == 0)
> > >                     panic("nfsbioread");
> > >                 if (VOP_BWRITE(bp) == EINTR)
> > >                     return (EINTR);
> > >             } else
> > >                 brelse(bp);
> > >             goto again;             <----- LOOPS HERE!!
> >
> > the 8k buffer has exactly one VM page associated with it.
>
> Err... with all due respect, that means it's really a 4K buffer, not
> an 8K buffer, yes? If so, assuming the NFS layer did an 8K read, where
> did the other 4K go?

What I meant was that the buffer is 8k but the VM system had one valid
page in that region of the file (the first one).  The buffer will have a
full 8k of memory backing it but the b_validoff, b_validend members will
be set to 0,4k to indicate that the first 4k of the buf contains valid
data and the rest must be read from the file.
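To make that bookkeeping concrete, here is a little userland sketch I
knocked up (my own illustration with made-up names, nothing like the real
struct buf) of the window check nfs_bioread is doing in the fragment
above:

/*
 * Userland illustration (not the real kernel code) of the
 * b_validoff/b_validend bookkeeping.  An 8k buf whose only valid
 * backing page is the first one carries the window [0, 4096); any
 * read touching bytes outside that window must go to the server.
 */
#include <stdio.h>

struct fakebuf {                /* hypothetical stand-in for struct buf */
        int b_validoff;         /* offset of first valid byte */
        int b_validend;         /* offset past the last valid byte */
};

/* The test nfs_bioread makes: does a read of n bytes at offset on
 * fall outside the valid window? */
static int
outside_valid_window(const struct fakebuf *bp, int on, int n)
{
        return (n > 0 && (on < bp->b_validoff || on + n > bp->b_validend));
}

int
main(void)
{
        struct fakebuf bp = { 0, 4096 };        /* 8k buf, first page valid */

        printf("read [0,4096):    %s\n",
            outside_valid_window(&bp, 0, 4096) ? "must fetch" : "cached");
        printf("read [4096,8192): %s\n",
            outside_valid_window(&bp, 4096, 4096) ? "must fetch" : "cached");
        return (0);
}

With the window set to 0,4k the first 4k read is satisfied from the
cache, and a read of the second 4k falls outside the window, which is
exactly the case that takes the B_NOCACHE branch above.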
> > The NFS code is
> > attempting to throw the buffer away since it is only partially valid and
> > it wants to read from the invalid section of the buf.  It does this by
> > setting the B_NOCACHE flag before calling brelse.  Unfortunately the
> > underlying VM page is still valid, so when getblk is called, the code in
> > allocbuf which tracks down the underlying VM pages carefully resets
> > b_validoff and b_validend, causing the loop.
> >
> > Basically, the VMIO system has managed to ignore the NFS code's request
> > for a cache flush, which the NFS code relied on to break the loop in
> > nfs_bioread.  As I see it, the problem can be fixed in two ways.  The
> > first would be for brelse() on a B_NOCACHE buffer to invalidate the VM
> > pages in the buffer, restoring the old behaviour which NFS expected, and
> > the second would be to rewrite that section of the NFS client to cope
> > differently with partially valid buffers.
>
> Hmmm... I think I see the code in vfs_bio.c:brelse() that has led to
> this, but the comments seem to indicate that reverting it would be a bug.
>
> There's a couple of things I don't understand. You seem to indicate that
> setting the B_NOCACHE flag will cause brelse() to flush the cache, but
> I didn't know brelse() did that. Also, bear in mind that the first
> 4K block that's in core is now dirty, so having brelse() throw it away
> would be wrong, unless it forced the first 4K block to be written out
> first, but again I don't see where that happens (unless brelse() just
> sets up the block to be written out and getblk() actually does it).

I think (not sure) that the first page in the buf is not actually dirty,
but contains clean, valid contents.  The B_NOCACHE flag to brelse is
supposed to flush this buf from the buffer cache (leastways that's what I
thought).

> What is the correct action here? Should the dirty page be written out
> first, then the buffer invalidated and the next 4K page read in? Or
> should we write the dirty page but keep the buffer around and load the
> next 4K page into another buffer? Or should both pages be combined into
> a single 8K block? Should we not even bother to write the dirty page
> out yet and just make sure the next 4K block is loaded correctly?

Actually I think the correct action is for nfs_bioread to cope better
with partially valid bufs.  I do think, however, that the VMIO system is
acting differently from the old buf system with respect to this flag.
This is almost certainly due to some other work which I did earlier this
year trying to get NFS mmap to work properly.
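For what it's worth, here is a toy simulation (entirely hypothetical
names, not code from the tree) of why the goto spins under VMIO but would
terminate under the behaviour NFS expected:

/*
 * Toy simulation (hypothetical, not kernel code) of the loop described
 * above.  nfs_bioread wants the second 4k of an 8k buf, sets B_NOCACHE,
 * releases the buf, and loops.  Under VMIO the backing page survives
 * the release, so the rebuilt valid window is the same partial one and
 * the same branch is taken forever; under the old behaviour the page is
 * dropped, the next pass refetches the whole 8k, and the loop exits.
 */
#include <stdio.h>

#define B_NOCACHE       0x01

struct fakebuf {
        long    b_flags;
        int     b_validoff, b_validend;
};

static int page_valid = 1;      /* the first 4k VM page behind the buf */
static int vmio = 1;            /* 1: VMIO behaviour, 0: old behaviour */

static void
fake_brelse(struct fakebuf *bp)
{
        /* Old behaviour: B_NOCACHE really drops the cached page.
         * VMIO behaviour: the valid page survives the release. */
        if (!vmio && (bp->b_flags & B_NOCACHE))
                page_valid = 0;
}

static void
fake_getblk(struct fakebuf *bp)
{
        bp->b_flags = 0;
        if (page_valid) {
                /* allocbuf finds the surviving page and rebuilds the
                 * partial window, undoing the NOCACHE request. */
                bp->b_validoff = 0;
                bp->b_validend = 4096;
        } else {
                /* Nothing cached: a real READ RPC fills the whole buf. */
                bp->b_validoff = 0;
                bp->b_validend = 8192;
                page_valid = 1;
        }
}

int
main(void)
{
        struct fakebuf bp;
        int on = 4096, n = 4096;        /* want the second 4k of the buf */
        int i;

        for (i = 0; i < 4; i++) {       /* the "goto again" loop, bounded */
                fake_getblk(&bp);
                if (!(on < bp.b_validoff || on + n > bp.b_validend)) {
                        printf("pass %d: read satisfied\n", i);
                        return (0);
                }
                printf("pass %d: partial buf, B_NOCACHE + brelse\n", i);
                bp.b_flags |= B_NOCACHE;
                fake_brelse(&bp);
        }
        printf("still looping; set vmio to 0 for the old behaviour\n");
        return (1);
}

With vmio left at 1 the same partial window comes back on every pass;
setting it to 0 models the brelse behaviour NFS relied on, where the
refetch on the second pass makes the whole buf valid and the loop exits.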
> (And is this stuff related to the other problem where the process can
> become stuck sleeping on 'vmopar'?)

I can't say because I haven't been able to see a stack trace for this
one.

> I think I'm going to take a trip back to campus today so I can experiment
> a bit more with the test box. (It's not like I have anything else to do
> today.)

Good luck with your investigations.

--
Doug Rabson                             Mail:  dfr@nlsystems.com
Nonlinear Systems Ltd.                  Phone: +44 181 951 1891
                                        Fax:   +44 181 381 1039