From owner-freebsd-hackers Sun Jul 26 22:40:21 1998 Return-Path: Received: (from majordom@localhost) by hub.freebsd.org (8.8.8/8.8.8) id WAA14042 for freebsd-hackers-outgoing; Sun, 26 Jul 1998 22:40:21 -0700 (PDT) (envelope-from owner-freebsd-hackers@FreeBSD.ORG) Received: from antipodes.cdrom.com (castles319.castles.com [208.214.167.19]) by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id WAA14036 for ; Sun, 26 Jul 1998 22:40:16 -0700 (PDT) (envelope-from mike@antipodes.cdrom.com) Received: from antipodes.cdrom.com (localhost [127.0.0.1]) by antipodes.cdrom.com (8.8.8/8.8.5) with ESMTP id WAA14548; Sun, 26 Jul 1998 22:39:05 -0700 (PDT) Message-Id: <199807270539.WAA14548@antipodes.cdrom.com> X-Mailer: exmh version 2.0zeta 7/24/97 To: Terry Lambert cc: karl@mcs.net (Karl Denninger), hackers@FreeBSD.ORG Subject: Re: Odd problem we're seeing here In-reply-to: Your message of "Sat, 14 Mar 1998 08:58:25 GMT." <199803140858.BAA17978@usr08.primenet.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Sun, 26 Jul 1998 22:39:05 -0700 From: Mike Smith Sender: owner-freebsd-hackers@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > > I'm seeing an interesting problem here. > > No you're not. You are seeng a problem I already knew about, > and provided a patch for, but the patch was not checked in by > anyone (no, not the NFS locking patch; that should be checked > in, too, though). Was this issue ever resolved to your satisfaction? > The problem is in the lease code, and has been exaggerated by > recent changes that cause the paging path to be invoked for > writes, as well. Leases are the same thing as opportunistic > locks, and... "Opportunity only locks once". 8-) 8-). > > > > Assuming two writers to an NFS file from the SAME machine. > > > > Writer #1 has opened the file O_RDWR and intends to do a set of random > > operations on it. > > > > Writer #2 has opened the file O_WRONLY|O_APPEND and intends only to add > > to the end of the file. > > > > Writer #1 gets an flock on the file, and does things to the file. He > > might even do an ftruncate at some point to roll back the contents. > > > > Writer #2, in the meantime, attempts to get an flock on the file and > > (correctly) blocks, waiting for Writer #1 to finish. > > > > Writer #1 gets done, and releases his lock. > > > > Writer #2 *SOMETIMES DOES NOT WRITE TO THE CORRECT (END OF FILE) PLACE*. > > > > This is difficult to reproduce, but it can be done. It appears that the > > O_APPEND isn't causing the implied lseek(....., 0, SEEK_END) to be done > > before each write in this situation. > > This behaviour is a result of a race condition. > > When you open a file, you are saying > > "Give me the index into my open file table of the struct > file * whose f_data points to the struct vnode for the > file I am opening, and whose f_offset is set to the end > of the file *at the time the write is issued*" > > Now what does O_APPEND mean? > > "When you can vn_open on the nameidata pointer, when you > call the VOP_OPEN, pass the FAPPEND bit in the fmode > argument, where all the FS's will (incorrectly) compare > it to IO_APPEND, which will work because FAPPEND is > defined as IO_APPEND in fcntl.h" > > When nfs_write() is called (in /sys/nfs/nfs_bio.c), if the ioflag > contains IO_APPEND, then a VOP_GETATTR is issued on the vp, and > the uio->uio_offset is set to the value of n_size, which came > from the cache. > > The cache value is set from va_size from a proxied VOP_GETATTR > to the remote system, but *ONLY* if the lease has expired on its > own, or has explicitly been expired. > > What you are seeing is the result of n_attrstamp not being updated > when it should be. > > In fact, you are seeing a race between the flock( fd, LOCK_UN), and > the atime/mtime updates as a result of the operations by write #1. > > > And if you trace down this race window, you will see that what is > happening is that your write call is not calling VOP_LEASE. > > You can get the patch for the files: > > /sys/kern/kern_ktrace.c > /sys/kern/link_aout.c > /src/sys/kern/tty_tty.c > /sys/kern/vfs_vnops.c > > From: > > http://www.freebsd.org/~terry/DIFF.LEASE > > Note: I am not terribly happy about this code. I believe VOP_LEASE > should be called with the vp locked, but I don't do this in two > vn_rdwr cases because of the indeterminate vp lock state. Of course, > the existing code blows this all to hell (SMP problems with NFS, > anyone?). > > The only way to really fix this reasonable would be to get rid of > vn_rdwr, and replace it with vn_read or vn_write, as appropriate > to the situation. Luckily this is really only a problem for the > ktrace and other unlikely code (it used to be a problem for the > image activator, too, until John fixed it after I complained about > it). > > Pretty clearly vn_rdwr was a hack to avoid locking and unlocking > the vp over every call (cough, cough, lazy thinking, cough, cough). > > > > I'm trying to nail this one down. The latest patches to the NFS code appear > > to make this happen more often than it did before, but it did occasionally > > happen even before the patches. > > Writes to mapped NFS pages need to trigger the lease revocation code > as well. This basically means that there needs to be a VOP_LEASE > call for each and every VOP_PUTPAGES in to NFS. > > This should probably go in the vnode pager code before VOP_PUTPAGES > is called. Before the recent changes, paging *to* an NFS FS was a > hopeless case. Now, it can cause problems (though probably not this > one; I don't believe a putpages can be used to extend a file). The > generic routine should be used in case we ever want to export an > opportunity locking mechanism with full coherency to user space > programs. Like SAMBA. But only if we want O_APPEND to work for > SMB and NFS clients accessing the same file (and what idiot would > want a consistently functional system? Where's the adventure? 8-)). > > > > This NEVER happens on locally mounted files, but occasionally happens > > on NFS mounted files. > > Yes, well, it's a real coding error in the VFS client code; what do > you expect? ;-). > > Like I said, though, doing the patches won't fix all the races with > putpages (consider a record being written, and making the decision of > whether or not to read before write, and a record in an adjacent page > was just touched), and it won't work well under SMP unless vn_rdwr > dies (but then the existing code can't work well under SMP, since most > of the other vn_rdwr references have the same bug). > > > Regards, > Terry Lambert > terry@lambert.org > --- > Any opinions in this posting are my own and not those of my present > or previous employers. > > To Unsubscribe: send mail to majordomo@FreeBSD.org > with "unsubscribe freebsd-hackers" in the body of the message > -- \\ Sometimes you're ahead, \\ Mike Smith \\ sometimes you're behind. \\ mike@smith.net.au \\ The race is long, and in the \\ msmith@freebsd.org \\ end it's only with yourself. \\ msmith@cdrom.com To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-hackers" in the body of the message