From owner-freebsd-hackers  Mon Jul 27 07:58:39 1998
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id HAA29891
          for freebsd-hackers-outgoing; Mon, 27 Jul 1998 07:58:39 -0700 (PDT)
          (envelope-from owner-freebsd-hackers@FreeBSD.ORG)
Received: from Kitten.mcs.com (Kitten.mcs.com [192.160.127.90])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id HAA29876
          for <hackers@FreeBSD.ORG>; Mon, 27 Jul 1998 07:58:36 -0700 (PDT)
          (envelope-from karl@Mars.mcs.net)
Received: from Mars.mcs.net (karl@Mars.mcs.net [192.160.127.85]) by Kitten.mcs.com (8.8.7/8.8.2) with ESMTP id JAA00835; Mon, 27 Jul 1998 09:58:07 -0500 (CDT)
Received: (from karl@localhost) by Mars.mcs.net (8.8.7/8.8.2) id JAA24425; Mon, 27 Jul 1998 09:58:07 -0500 (CDT)
Message-ID: <19980727095806.05337@mcs.net>
Date: Mon, 27 Jul 1998 09:58:06 -0500
From: Karl Denninger  <karl@mcs.net>
To: Mike Smith <mike@smith.net.au>
Cc: Terry Lambert <tlambert@primenet.com>, hackers@FreeBSD.ORG
Subject: Re: Odd problem we're seeing here
References: <199803140858.BAA17978@usr08.primenet.com> <199807270539.WAA14548@antipodes.cdrom.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.84
In-Reply-To: <199807270539.WAA14548@antipodes.cdrom.com>; from Mike Smith on Sun, Jul 26, 1998 at 10:39:05PM -0700
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Sun, Jul 26, 1998 at 10:39:05PM -0700, Mike Smith wrote:
> > > I'm seeing an interesting problem here.
> > 
> > No you're not.  You are seeng a problem I already knew about,
> > and provided a patch for, but the patch was not checked in by
> > anyone (no, not the NFS locking patch; that should be checked
> > in, too, though).
> 
> Was this issue ever resolved to your satisfaction?

Not really.  I turned off NVSV3 on the affected machines; that "fixed" it,
but its a hack, not a real fix.

--
-- 
Karl Denninger (karl@MCS.Net)| MCSNet - Serving Chicagoland and Wisconsin
http://www.mcs.net/          | T1's from $600 monthly / All Lines K56Flex/DOV
			     | NEW! Corporate ISDN Prices dropped by up to 50%!
Voice: [+1 312 803-MCS1 x219]| EXCLUSIVE NEW FEATURE ON ALL PERSONAL ACCOUNTS
Fax:   [+1 312 803-4929]     | *SPAMBLOCK* Technology now included at no cost

> > The problem is in the lease code, and has been exaggerated by
> > recent changes that cause the paging path to be invoked for
> > writes, as well.  Leases are the same thing as opportunistic
> > locks, and...  "Opportunity only locks once".  8-) 8-).
> > 
> > 
> > > Assuming two writers to an NFS file from the SAME machine.
> > > 
> > > Writer #1 has opened the file O_RDWR and intends to do a set of random
> > > operations on it.
> > > 
> > > Writer #2 has opened the file O_WRONLY|O_APPEND and intends only to add
> > > to the end of the file.
> > > 
> > > Writer #1 gets an flock on the file, and does things to the file.  He
> > > might even do an ftruncate at some point to roll back the contents.
> > > 
> > > Writer #2, in the meantime, attempts to get an flock on the file and 
> > > (correctly) blocks, waiting for Writer #1 to finish.
> > > 
> > > Writer #1 gets done, and releases his lock.
> > > 
> > > Writer #2 *SOMETIMES DOES NOT WRITE TO THE CORRECT (END OF FILE) PLACE*.
> > > 
> > > This is difficult to reproduce, but it can be done.  It appears that the
> > > O_APPEND isn't causing the implied lseek(....., 0, SEEK_END) to be done
> > > before each write in this situation.
> > 
> > This behaviour is a result of a race condition.
> > 
> > When you open a file, you are saying
> > 
> > 	"Give me the index into my open file table of the struct
> > 	 file * whose f_data points to the struct vnode for the
> > 	 file I am opening, and whose f_offset is set to the end
> > 	 of the file *at the time the write is issued*"
> > 
> > Now what does O_APPEND mean?
> > 
> > 	"When you can vn_open on the nameidata pointer, when you
> > 	 call the VOP_OPEN, pass the FAPPEND bit in the fmode
> > 	 argument, where all the FS's will (incorrectly) compare
> > 	 it to IO_APPEND, which will work because FAPPEND is
> > 	 defined as IO_APPEND in fcntl.h"
> > 
> > When nfs_write() is called (in /sys/nfs/nfs_bio.c), if the ioflag
> > contains IO_APPEND, then a VOP_GETATTR is issued on the vp, and
> > the uio->uio_offset is set to the value of n_size, which came
> > from the cache.
> > 
> > The cache value is set from va_size from a proxied VOP_GETATTR
> > to the remote system, but *ONLY* if the lease has expired on its
> > own, or has explicitly been expired.
> > 
> > What you are seeing is the result of n_attrstamp not being updated
> > when it should be.
> > 
> > In fact, you are seeing a race between the flock( fd, LOCK_UN), and
> > the atime/mtime updates as a result of the operations by write #1.
> > 
> > 
> > And if you trace down this race window, you will see that what is
> > happening is that your write call is not calling VOP_LEASE.
> > 
> > You can get the patch for the files:
> > 
> > 	/sys/kern/kern_ktrace.c
> > 	/sys/kern/link_aout.c
> > 	/src/sys/kern/tty_tty.c
> > 	/sys/kern/vfs_vnops.c
> > 
> > From:
> > 
> > 	http://www.freebsd.org/~terry/DIFF.LEASE
> > 
> > Note: I am not terribly happy about this code.  I believe VOP_LEASE
> > should be called with the vp locked, but I don't do this in two
> > vn_rdwr cases because of the indeterminate vp lock state.  Of course,
> > the existing code blows this all to hell (SMP problems with NFS,
> > anyone?).
> > 
> > The only way to really fix this reasonable would be to get rid of
> > vn_rdwr, and replace it with vn_read or vn_write, as appropriate
> > to the situation.  Luckily this is really only a problem for the
> > ktrace and other unlikely code (it used to be a problem for the
> > image activator, too, until John fixed it after I complained about
> > it).
> > 
> > Pretty clearly vn_rdwr was a hack to avoid locking and unlocking
> > the vp over every call (cough, cough, lazy thinking, cough, cough).
> > 
> > 
> > > I'm trying to nail this one down.  The latest patches to the NFS code appear 
> > > to make this happen more often than it did before, but it did occasionally
> > > happen even before the patches.
> > 
> > Writes to mapped NFS pages need to trigger the lease revocation code
> > as well.  This basically means that there needs to be a VOP_LEASE
> > call for each and every VOP_PUTPAGES in to NFS.
> > 
> > This should probably go in the vnode pager code before VOP_PUTPAGES
> > is called.  Before the recent changes, paging *to* an NFS FS was a
> > hopeless case.  Now, it can cause problems (though probably not this
> > one; I don't believe a putpages can be used to extend a file).  The
> > generic routine should be used in case we ever want to export an
> > opportunity locking mechanism with full coherency to user space
> > programs.  Like SAMBA.  But only if we want O_APPEND to work for
> > SMB and NFS clients accessing the same file (and what idiot would
> > want a consistently functional system?  Where's the adventure?  8-)).
> > 
> > 
> > > This NEVER happens on locally mounted files, but occasionally happens
> > > on NFS mounted files.
> > 
> > Yes, well, it's a real coding error in the VFS client code; what do
> > you expect?  ;-).
> > 
> > Like I said, though, doing the patches won't fix all the races with
> > putpages (consider a record being written, and making the decision of
> > whether or not to read before write, and a record in an adjacent page
> > was just touched), and it won't work well under SMP unless vn_rdwr
> > dies (but then the existing code can't work well under SMP, since most
> > of the other vn_rdwr references have the same bug).
> > 
> > 
> > 					Regards,
> > 					Terry Lambert
> > 					terry@lambert.org
> > ---
> > Any opinions in this posting are my own and not those of my present
> > or previous employers.
> > 
> > To Unsubscribe: send mail to majordomo@FreeBSD.org
> > with "unsubscribe freebsd-hackers" in the body of the message
> > 
> 
> -- 
> \\  Sometimes you're ahead,       \\  Mike Smith
> \\  sometimes you're behind.      \\  mike@smith.net.au
> \\  The race is long, and in the  \\  msmith@freebsd.org
> \\  end it's only with yourself.  \\  msmith@cdrom.com
> 
> 
> 
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-hackers" in the body of the message

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message