Date: Tue, 26 Apr 2005 12:25:49 -0400
From: Brian Fundakowski Feldman <green@freebsd.org>
To: Marc Olzheim <marcolz@stack.nl>
Cc: freebsd-standards@freebsd.org
Subject: Re: NFS client/buffer cache deadlock
Message-ID: <20050426162549.GD5789@green.homeunix.org>
In-Reply-To: <20050426160609.GA68511@stack.nl>
References: <20050419204723.GG1157@green.homeunix.org> <20050420140409.GA77731@stack.nl> <20050420142448.GH1157@green.homeunix.org> <20050420143842.GB77731@stack.nl> <16998.36437.809896.936800@khavrinen.csail.mit.edu> <20050420173859.GA99695@stack.nl> <20050426140701.GB5789@green.homeunix.org> <20050426151751.GB68038@stack.nl> <20050426155043.GC5789@green.homeunix.org> <20050426160609.GA68511@stack.nl>

On Tue, Apr 26, 2005 at 06:06:09PM +0200, Marc Olzheim wrote:
> On Tue, Apr 26, 2005 at 11:50:43AM -0400, Brian Fundakowski Feldman wrote:
> > > I'm okay with the fact that simultaneous huge writes to the same file
> > > over NFS could lead to corruption and that the exact outcome is
> > > undefined.
> > >
> > > This is exactly how it was in FreeBSD 4.x and that's perfectly workable.
> > >
> > > But that's just my way of looking at it and certainly not ideal. :-/
> >
> > I don't know what you mean. The exact same bug should exists in 4.x,
> > and should cause a system deadlock in exactly the same scenario.
>
> I'm not sure you understand the "scenario". All I do is create a new
> file and writev 600 * 1MB to it. This creates a VFS hangup on FreeBSD
> 5.x after writing an amount of 2-100 MB (depending on how much memory is
> in the system), while 4.x just does what it is told and doesn't hangup.
>
> I do not have any synchronisation problems.
>
> See kern/79208

Then it sounds like, for whatever reason, FreeBSD 4.x isn't negotiating
NFSv3 properly and should be fixed. This is fundamentally a deadlock
situation. The write is a transaction, and it requires that any part of
the write request can be retransmitted. This can only be accomplished by
retaining the entire write contents for the duration of the operation.
You can ensure that this happens in only two ways:

1. Make a complete copy of the data. This is what currently occurs: it
   gets stuffed into the buffer cache as the write happens.
2. Keep the data around synchronously -- by virtue of the write system
   call being used synchronously, the thread's VM context is around, and
   duplication need not occur.

I'm trying to fix all situations, not just yours. I think I've changed
my mind about short writes being acceptable: short writes will cause
detectable corruption, but once it is detected, you have no knowledge of
the exact location of the corruption.

So it's really only a choice between forcing synchronous operation
implicitly, or explicitly by returning an error that says no data at all
was written.

-- 
Brian Fundakowski Feldman                           \'[ FreeBSD ]''''''''''\
  <> green@FreeBSD.org                               \  The Power to Serve! \
 Opinions expressed are my own.                       \,,,,,,,,,,,,,,,,,,,,,,\
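
For readers who want to see the shape of the scenario discussed above (one
writev call carrying many buffers to a freshly created file, and how a caller
would notice a short write), here is a minimal sketch in Python. It is not
the reproducer from kern/79208: the function name writev_once and its
parameters are hypothetical, and it relies only on os.writev, which is
available on Unix platforms.

```python
import os


def writev_once(path, chunk_count, chunk_size):
    """Create `path` and issue a single writev() of `chunk_count` buffers.

    Hypothetical helper for illustration only.
    Returns (requested, written) in bytes; written < requested means the
    kernel accepted only part of the request (a short write).
    """
    # Every iovec entry refers to the same bytes object, so the process
    # holds only one chunk in memory regardless of chunk_count.
    chunk = b"x" * chunk_size
    buffers = [chunk] * chunk_count
    requested = chunk_count * chunk_size

    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        # One system call for the whole request; the return value is the
        # number of bytes actually written.
        written = os.writev(fd, buffers)
    finally:
        os.close(fd)
    return requested, written
```

Calling writev_once(path, 600, 1024 * 1024) with path on an NFS mount would
approximate the 600 * 1MB request described in the quoted text. The return
value tells the caller only how many bytes were accepted in total, so the
caller can detect a short write by comparing the two numbers.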