From owner-freebsd-current Sun Dec 14 09:16:49 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.7/8.8.7) id JAA20208 for current-outgoing; Sun, 14 Dec 1997 09:16:49 -0800 (PST) (envelope-from owner-freebsd-current)
Received: from skynet.ctr.columbia.edu (skynet.ctr.columbia.edu [128.59.64.70]) by hub.freebsd.org (8.8.7/8.8.7) with SMTP id JAA20195; Sun, 14 Dec 1997 09:16:34 -0800 (PST) (envelope-from wpaul@skynet.ctr.columbia.edu)
Received: (from wpaul@localhost) by skynet.ctr.columbia.edu (8.6.12/8.6.9) id MAA13140; Sun, 14 Dec 1997 12:17:38 -0500
From: Bill Paul
Message-Id: <199712141717.MAA13140@skynet.ctr.columbia.edu>
Subject: Re: mmap() + NFS problems persist
To: dfr@nlsystems.com
Date: Sun, 14 Dec 1997 12:17:37 -0500 (EST)
Cc: current@freebsd.org, toor@dyson.iquest.net, dyson@freebsd.org
In-Reply-To: from "Doug Rabson" at Dec 13, 97 10:23:42 am
X-Mailer: ELM [version 2.4 PL24]
Content-Type: text
Sender: owner-freebsd-current@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

Of all the gin joints in all the towns in all the world, Doug Rabson had
to walk into mine and say:

[...]

> I think I understand what might be happening. I can't easily check since
> my FreeBSD hacking box is at work though. What I think happens is that
> when brelse is called in this code fragment,
>
> > if (not_readin && n > 0) {
> >     if (on < bp->b_validoff || (on + n) > bp->b_validend) {
> >         bp->b_flags |= B_NOCACHE;
> >         bp->b_flags |= B_INVAFTERWRITE;
> >         if (bp->b_dirtyend > 0) {
> >             if ((bp->b_flags & B_DELWRI) == 0)
> >                 panic("nfsbioread");
> >             if (VOP_BWRITE(bp) == EINTR)
> >                 return (EINTR);
> >         } else
> >             brelse(bp);
> >         goto again;          <----- LOOPS HERE!!
>
> the 8k buffer has exactly one VM page associated with it.

Err... with all due respect, that means it's really a 4K buffer, not
an 8K buffer, yes? If so, assuming the NFS layer did an 8K read, where
did the other 4K go?

> The NFS code is
> attempting to throw the buffer away since it is only partially valid and
> it wants to read from the invalid section of the buf. It does this by
> setting the B_NOCACHE flag before calling brelse. Unfortunately the
> underlying VM page is still valid, so when getblk is called, the code in
> allocbuf which tracks down the underlying VM pages carefully resets
> b_validoff and b_validend causing the loop.
>
> Basically, the VMIO system has managed to ignore the NFS code's request
> for a cache flush, which the NFS code relied on to break the loop in
> nfs_bioread. As I see it, the problem can be fixed in two ways. The
> first would be for brelse() on a B_NOCACHE buffer to invalidate the VM
> pages in the buffer, restoring the old behaviour which NFS expected and
> the second would be to rewrite that section of the NFS client to cope
> differently with partially valid buffers.

Hmmm... I think I see the code in vfs_bio.c:brelse() that has led to
this, but the comments seem to indicate that reverting it would be a
bug.

There are a couple of things I don't understand. You seem to indicate
that setting the B_NOCACHE flag will cause brelse() to flush the cache,
but I didn't know brelse() did that. Also, bear in mind that the first
4K block that's in core is now dirty, so having brelse() throw it away
would be wrong, unless it forced the first 4K block to be written out
first, but again I don't see where that happens (unless brelse() just
sets up the block to be written out and getblk() actually does it).

What is the correct action here? Should the dirty page be written out
first, then the buffer invalidated and the next 4K page read in? Or
should we write the dirty page but keep the buffer around and load the
next 4K page into another buffer? Or should both pages be combined into
a single 8K block? Should we not even bother to write the dirty page
out yet and just make sure the next 4K block is loaded correctly?
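
For what it's worth, the loop Doug describes above can be sketched as a
toy user-space model in C. This is not kernel code and not how
vfs_bio.c is actually written: toy_page, toy_buf, toy_getblk(),
toy_brelse() and toy_bioread() are all made-up names, the 4K/8K sizes
are just the ones from this thread, and the dirty-page handling (the
VOP_BWRITE branch) is left out entirely. It only shows why rebuilding
the valid range from a still-valid VM page keeps the "goto again" from
ever making progress, and how Doug's first suggested fix would break
the cycle in such a model.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SIZE      4096
#define BUF_SIZE       8192
#define TOY_B_NOCACHE  0x1   /* hypothetical stand-in for B_NOCACHE */

/* Toy state: one cached VM page backing the first half of an 8K buffer. */
struct toy_page { bool valid; };
struct toy_buf {
    int flags;
    int validoff, validend;  /* valid byte range, like b_validoff/b_validend */
};

static struct toy_page page0;

/* Like getblk() + allocbuf(): rebuild the valid range from the VM page. */
static void toy_getblk(struct toy_buf *bp)
{
    bp->flags = 0;
    bp->validoff = 0;
    bp->validend = page0.valid ? PAGE_SIZE : 0;
}

/* Like brelse(): only drops the page if asked to honour B_NOCACHE. */
static void toy_brelse(struct toy_buf *bp, bool invalidate_pages)
{
    if ((bp->flags & TOY_B_NOCACHE) && invalidate_pages)
        page0.valid = false;
}

/*
 * Read n bytes at offset on.  Returns the number of passes through the
 * "again" loop, or -1 if it is still looping after max_iters passes.
 */
static int toy_bioread(int on, int n, bool invalidate_pages, int max_iters)
{
    struct toy_buf b;
    int iters;

    assert(max_iters > 0);
    page0.valid = true;              /* first 4K is already cached */

    for (iters = 1; iters <= max_iters; iters++) {
        toy_getblk(&b);
        if (b.validend == 0) {
            /* Nothing cached: a full read would fill the whole buffer. */
            b.validend = BUF_SIZE;
            return iters;
        }
        if (on < b.validoff || on + n > b.validend) {
            /* Partially valid: ask for the buffer to be thrown away. */
            b.flags |= TOY_B_NOCACHE;
            toy_brelse(&b, invalidate_pages);
            continue;                /* the "goto again" */
        }
        return iters;                /* request lies in the valid range */
    }
    return -1;                       /* never got out of the loop */
}
```

In this model, toy_bioread(4096, 100, false, 1000) returns -1 (the
valid range is recomputed as 0..4096 on every pass), while the same
call with invalidate_pages set to true returns 2. Whether throwing the
page away is actually safe when it is dirty is exactly the open
question, and the model says nothing about that.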

(And is this stuff related to the other problem where the process can
become stuck sleeping on 'vmopar?')

I think I'm going to take a trip back to campus today so I can
experiment a bit more with the test box. (It's not like I have anything
else to do today.)

-Bill

--
=============================================================================
-Bill Paul            (212) 854-6020 | System Manager, Master of Unix-Fu
Work:         wpaul@ctr.columbia.edu | Center for Telecommunications Research
Home:  wpaul@skynet.ctr.columbia.edu | Columbia University, New York City
=============================================================================
 "It is not I who am crazy; it is I who am mad!" - Ren Hoek, "Space Madness"
=============================================================================