From owner-freebsd-hackers Sat Mar 14 09:37:54 1998 Return-Path: Received: (from majordom@localhost) by hub.freebsd.org (8.8.8/8.8.8) id JAA07648 for freebsd-hackers-outgoing; Sat, 14 Mar 1998 09:37:54 -0800 (PST) (envelope-from owner-freebsd-hackers@FreeBSD.ORG) Received: from Kitten.mcs.com (Kitten.mcs.com [192.160.127.90]) by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id JAA07638 for ; Sat, 14 Mar 1998 09:37:48 -0800 (PST) (envelope-from karl@Mars.mcs.net) Received: from Mars.mcs.net (karl@Mars.mcs.net [192.160.127.85]) by Kitten.mcs.com (8.8.7/8.8.2) with ESMTP id LAA04916; Sat, 14 Mar 1998 11:37:44 -0600 (CST) Received: (from karl@localhost) by Mars.mcs.net (8.8.7/8.8.2) id LAA22692; Sat, 14 Mar 1998 11:37:44 -0600 (CST) Message-ID: <19980314113743.62444@mcs.net> Date: Sat, 14 Mar 1998 11:37:43 -0600 From: Karl Denninger To: Terry Lambert Cc: hackers@FreeBSD.ORG Subject: Re: Odd problem we're seeing here References: <19980312223102.50360@mcs.net> <199803140858.BAA17978@usr08.primenet.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.84 In-Reply-To: <199803140858.BAA17978@usr08.primenet.com>; from Terry Lambert on Sat, Mar 14, 1998 at 08:58:25AM +0000 Sender: owner-freebsd-hackers@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Hmmm... I've looked at these patches, and have a couple of questions... You note that the vnode "should" be locked when the LEASE call is made - yet in kern_ktrace.c, you call the LEASE request *before* the lock is asserted. Is this correct, and why? Wouldn't you want to lock the object first to prevent setting up another potential race condition? Also, ktrace.c isn't a "standard" execution path, is it? link_aout.c applies to loading executables, and there it appears that I can't do any harm (or help) to the current situation by applying this. In the tty_tty.c case, you're also calling the lease before the lock. Unless I've missed something, this is the common entry point to any file I/O write request, correct. Again, isn't the order backwards here? Other than this, does anyone know why weren't they committed? -- -- Karl Denninger (karl@MCS.Net)| MCSNet - Serving Chicagoland and Wisconsin http://www.mcs.net/ | T1's from $600 monthly / All Lines K56Flex/DOV | NEW! Corporate ISDN Prices dropped by up to 50%! Voice: [+1 312 803-MCS1 x219]| EXCLUSIVE NEW FEATURE ON ALL PERSONAL ACCOUNTS Fax: [+1 312 803-4929] | *SPAMBLOCK* Technology now included at no cost On Sat, Mar 14, 1998 at 08:58:25AM +0000, Terry Lambert wrote: > > I'm seeing an interesting problem here. > > No you're not. You are seeng a problem I already knew about, > and provided a patch for, but the patch was not checked in by > anyone (no, not the NFS locking patch; that should be checked > in, too, though). > > The problem is in the lease code, and has been exaggerated by > recent changes that cause the paging path to be invoked for > writes, as well. Leases are the same thing as opportunistic > locks, and... "Opportunity only locks once". 8-) 8-). > > > > Assuming two writers to an NFS file from the SAME machine. > > > > Writer #1 has opened the file O_RDWR and intends to do a set of random > > operations on it. > > > > Writer #2 has opened the file O_WRONLY|O_APPEND and intends only to add > > to the end of the file. > > > > Writer #1 gets an flock on the file, and does things to the file. He > > might even do an ftruncate at some point to roll back the contents. > > > > Writer #2, in the meantime, attempts to get an flock on the file and > > (correctly) blocks, waiting for Writer #1 to finish. > > > > Writer #1 gets done, and releases his lock. > > > > Writer #2 *SOMETIMES DOES NOT WRITE TO THE CORRECT (END OF FILE) PLACE*. > > > > This is difficult to reproduce, but it can be done. It appears that the > > O_APPEND isn't causing the implied lseek(....., 0, SEEK_END) to be done > > before each write in this situation. > > This behaviour is a result of a race condition. > > When you open a file, you are saying > > "Give me the index into my open file table of the struct > file * whose f_data points to the struct vnode for the > file I am opening, and whose f_offset is set to the end > of the file *at the time the write is issued*" > > Now what does O_APPEND mean? > > "When you can vn_open on the nameidata pointer, when you > call the VOP_OPEN, pass the FAPPEND bit in the fmode > argument, where all the FS's will (incorrectly) compare > it to IO_APPEND, which will work because FAPPEND is > defined as IO_APPEND in fcntl.h" > > When nfs_write() is called (in /sys/nfs/nfs_bio.c), if the ioflag > contains IO_APPEND, then a VOP_GETATTR is issued on the vp, and > the uio->uio_offset is set to the value of n_size, which came > from the cache. > > The cache value is set from va_size from a proxied VOP_GETATTR > to the remote system, but *ONLY* if the lease has expired on its > own, or has explicitly been expired. > > What you are seeing is the result of n_attrstamp not being updated > when it should be. > > In fact, you are seeing a race between the flock( fd, LOCK_UN), and > the atime/mtime updates as a result of the operations by write #1. > > > And if you trace down this race window, you will see that what is > happening is that your write call is not calling VOP_LEASE. > > You can get the patch for the files: > > /sys/kern/kern_ktrace.c > /sys/kern/link_aout.c > /src/sys/kern/tty_tty.c > /sys/kern/vfs_vnops.c > > From: > > http://www.freebsd.org/~terry/DIFF.LEASE > > Note: I am not terribly happy about this code. I believe VOP_LEASE > should be called with the vp locked, but I don't do this in two > vn_rdwr cases because of the indeterminate vp lock state. Of course, > the existing code blows this all to hell (SMP problems with NFS, > anyone?). > > The only way to really fix this reasonable would be to get rid of > vn_rdwr, and replace it with vn_read or vn_write, as appropriate > to the situation. Luckily this is really only a problem for the > ktrace and other unlikely code (it used to be a problem for the > image activator, too, until John fixed it after I complained about > it). > > Pretty clearly vn_rdwr was a hack to avoid locking and unlocking > the vp over every call (cough, cough, lazy thinking, cough, cough). > > > > I'm trying to nail this one down. The latest patches to the NFS code appear > > to make this happen more often than it did before, but it did occasionally > > happen even before the patches. > > Writes to mapped NFS pages need to trigger the lease revocation code > as well. This basically means that there needs to be a VOP_LEASE > call for each and every VOP_PUTPAGES in to NFS. > > This should probably go in the vnode pager code before VOP_PUTPAGES > is called. Before the recent changes, paging *to* an NFS FS was a > hopeless case. Now, it can cause problems (though probably not this > one; I don't believe a putpages can be used to extend a file). The > generic routine should be used in case we ever want to export an > opportunity locking mechanism with full coherency to user space > programs. Like SAMBA. But only if we want O_APPEND to work for > SMB and NFS clients accessing the same file (and what idiot would > want a consistently functional system? Where's the adventure? 8-)). > > > > This NEVER happens on locally mounted files, but occasionally happens > > on NFS mounted files. > > Yes, well, it's a real coding error in the VFS client code; what do > you expect? ;-). > > Like I said, though, doing the patches won't fix all the races with > putpages (consider a record being written, and making the decision of > whether or not to read before write, and a record in an adjacent page > was just touched), and it won't work well under SMP unless vn_rdwr > dies (but then the existing code can't work well under SMP, since most > of the other vn_rdwr references have the same bug). > > > Regards, > Terry Lambert > terry@lambert.org > --- > Any opinions in this posting are my own and not those of my present > or previous employers. > > To Unsubscribe: send mail to majordomo@FreeBSD.org > with "unsubscribe freebsd-hackers" in the body of the message To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-hackers" in the body of the message