Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 14 Mar 1998 11:37:43 -0600
From:      Karl Denninger  <karl@mcs.net>
To:        Terry Lambert <tlambert@primenet.com>
Cc:        hackers@FreeBSD.ORG
Subject:   Re: Odd problem we're seeing here
Message-ID:  <19980314113743.62444@mcs.net>
In-Reply-To: <199803140858.BAA17978@usr08.primenet.com>; from Terry Lambert on Sat, Mar 14, 1998 at 08:58:25AM %2B0000
References:  <19980312223102.50360@mcs.net> <199803140858.BAA17978@usr08.primenet.com>

next in thread | previous in thread | raw e-mail | index | archive | help
Hmmm... I've looked at these patches, and have a couple of questions...

You note that the vnode "should" be locked when the LEASE call is made - yet
in kern_ktrace.c, you call the LEASE request *before* the lock is asserted.
Is this correct, and why?  Wouldn't you want to lock the object first to
prevent setting up another potential race condition?  Also, ktrace.c
isn't a "standard" execution path, is it?

link_aout.c applies to loading executables, and there it appears that I
can't do any harm (or help) to the current situation by applying this.

In the tty_tty.c case, you're also calling the lease before the lock.
Unless I've missed something, this is the common entry point to any file
I/O write request, correct.  Again, isn't the order backwards here?

Other than this, does anyone know why weren't they committed?

--
-- 
Karl Denninger (karl@MCS.Net)| MCSNet - Serving Chicagoland and Wisconsin
http://www.mcs.net/          | T1's from $600 monthly / All Lines K56Flex/DOV
			     | NEW! Corporate ISDN Prices dropped by up to 50%!
Voice: [+1 312 803-MCS1 x219]| EXCLUSIVE NEW FEATURE ON ALL PERSONAL ACCOUNTS
Fax:   [+1 312 803-4929]     | *SPAMBLOCK* Technology now included at no cost


On Sat, Mar 14, 1998 at 08:58:25AM +0000, Terry Lambert wrote:
> > I'm seeing an interesting problem here.
> 
> No you're not.  You are seeng a problem I already knew about,
> and provided a patch for, but the patch was not checked in by
> anyone (no, not the NFS locking patch; that should be checked
> in, too, though).
> 
> The problem is in the lease code, and has been exaggerated by
> recent changes that cause the paging path to be invoked for
> writes, as well.  Leases are the same thing as opportunistic
> locks, and...  "Opportunity only locks once".  8-) 8-).
> 
> 
> > Assuming two writers to an NFS file from the SAME machine.
> > 
> > Writer #1 has opened the file O_RDWR and intends to do a set of random
> > operations on it.
> > 
> > Writer #2 has opened the file O_WRONLY|O_APPEND and intends only to add
> > to the end of the file.
> > 
> > Writer #1 gets an flock on the file, and does things to the file.  He
> > might even do an ftruncate at some point to roll back the contents.
> > 
> > Writer #2, in the meantime, attempts to get an flock on the file and 
> > (correctly) blocks, waiting for Writer #1 to finish.
> > 
> > Writer #1 gets done, and releases his lock.
> > 
> > Writer #2 *SOMETIMES DOES NOT WRITE TO THE CORRECT (END OF FILE) PLACE*.
> > 
> > This is difficult to reproduce, but it can be done.  It appears that the
> > O_APPEND isn't causing the implied lseek(....., 0, SEEK_END) to be done
> > before each write in this situation.
> 
> This behaviour is a result of a race condition.
> 
> When you open a file, you are saying
> 
> 	"Give me the index into my open file table of the struct
> 	 file * whose f_data points to the struct vnode for the
> 	 file I am opening, and whose f_offset is set to the end
> 	 of the file *at the time the write is issued*"
> 
> Now what does O_APPEND mean?
> 
> 	"When you can vn_open on the nameidata pointer, when you
> 	 call the VOP_OPEN, pass the FAPPEND bit in the fmode
> 	 argument, where all the FS's will (incorrectly) compare
> 	 it to IO_APPEND, which will work because FAPPEND is
> 	 defined as IO_APPEND in fcntl.h"
> 
> When nfs_write() is called (in /sys/nfs/nfs_bio.c), if the ioflag
> contains IO_APPEND, then a VOP_GETATTR is issued on the vp, and
> the uio->uio_offset is set to the value of n_size, which came
> from the cache.
> 
> The cache value is set from va_size from a proxied VOP_GETATTR
> to the remote system, but *ONLY* if the lease has expired on its
> own, or has explicitly been expired.
> 
> What you are seeing is the result of n_attrstamp not being updated
> when it should be.
> 
> In fact, you are seeing a race between the flock( fd, LOCK_UN), and
> the atime/mtime updates as a result of the operations by write #1.
> 
> 
> And if you trace down this race window, you will see that what is
> happening is that your write call is not calling VOP_LEASE.
> 
> You can get the patch for the files:
> 
> 	/sys/kern/kern_ktrace.c
> 	/sys/kern/link_aout.c
> 	/src/sys/kern/tty_tty.c
> 	/sys/kern/vfs_vnops.c
> 
> From:
> 
> 	http://www.freebsd.org/~terry/DIFF.LEASE
> 
> Note: I am not terribly happy about this code.  I believe VOP_LEASE
> should be called with the vp locked, but I don't do this in two
> vn_rdwr cases because of the indeterminate vp lock state.  Of course,
> the existing code blows this all to hell (SMP problems with NFS,
> anyone?).
> 
> The only way to really fix this reasonable would be to get rid of
> vn_rdwr, and replace it with vn_read or vn_write, as appropriate
> to the situation.  Luckily this is really only a problem for the
> ktrace and other unlikely code (it used to be a problem for the
> image activator, too, until John fixed it after I complained about
> it).
> 
> Pretty clearly vn_rdwr was a hack to avoid locking and unlocking
> the vp over every call (cough, cough, lazy thinking, cough, cough).
> 
> 
> > I'm trying to nail this one down.  The latest patches to the NFS code appear 
> > to make this happen more often than it did before, but it did occasionally
> > happen even before the patches.
> 
> Writes to mapped NFS pages need to trigger the lease revocation code
> as well.  This basically means that there needs to be a VOP_LEASE
> call for each and every VOP_PUTPAGES in to NFS.
> 
> This should probably go in the vnode pager code before VOP_PUTPAGES
> is called.  Before the recent changes, paging *to* an NFS FS was a
> hopeless case.  Now, it can cause problems (though probably not this
> one; I don't believe a putpages can be used to extend a file).  The
> generic routine should be used in case we ever want to export an
> opportunity locking mechanism with full coherency to user space
> programs.  Like SAMBA.  But only if we want O_APPEND to work for
> SMB and NFS clients accessing the same file (and what idiot would
> want a consistently functional system?  Where's the adventure?  8-)).
> 
> 
> > This NEVER happens on locally mounted files, but occasionally happens
> > on NFS mounted files.
> 
> Yes, well, it's a real coding error in the VFS client code; what do
> you expect?  ;-).
> 
> Like I said, though, doing the patches won't fix all the races with
> putpages (consider a record being written, and making the decision of
> whether or not to read before write, and a record in an adjacent page
> was just touched), and it won't work well under SMP unless vn_rdwr
> dies (but then the existing code can't work well under SMP, since most
> of the other vn_rdwr references have the same bug).
> 
> 
> 					Regards,
> 					Terry Lambert
> 					terry@lambert.org
> ---
> Any opinions in this posting are my own and not those of my present
> or previous employers.
> 
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-hackers" in the body of the message

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?19980314113743.62444>