Date: Wed, 20 Apr 2005 13:28:39 -0400
From: Brian Fundakowski Feldman
To: Jilles Tjoelker
cc: Marc Olzheim
cc: freebsd-hackers@freebsd.org
cc: freebsd-current@freebsd.org
Subject: Re: NFS client/buffer cache deadlock
Message-ID: <20050420172839.GK1157@green.homeunix.org>
In-Reply-To: <20050420171220.GB93623@stack.nl>

On Wed, Apr 20, 2005 at 07:12:20PM +0200, Jilles Tjoelker wrote:
> On Wed, Apr 20, 2005 at 11:52:33AM -0400, Brian Fundakowski Feldman wrote:
> > On Wed, Apr 20, 2005 at 05:35:28PM +0200, Marc Olzheim wrote:
> > > On Wed, Apr 20, 2005 at 11:20:38AM -0400, Brian Fundakowski Feldman wrote:
> > > > > Btw.: I'm not sure write(), writev() and pwrite() are allowed to do short
> > > > > writes on regular files... ?
>
> > > > Our manpage is incorrect; POSIX states that they are (see earlier
> > > > e-mail). There really is no alternative -- we simply can't build
> > > > an NFS transaction larger than our buffer cache can accommodate.
> > > > Note that short writes won't happen for normal buffer sizes, only
> > > > excessively large ones. I really don't believe that writev() is meant
> > > > to be used so that you can write gigantic data structures in a single
> > > > transaction...
>
> It is ok to return partial success if the first chunk of a large write
> succeeded and a later chunk failed persistently, but not if it cannot be
> performed as a single NFS transaction.

What is your rationale for this?

> > > Ah, I was reading the SUSv2 page:
>
> > > http://www.opengroup.org/onlinepubs/009695399/functions/write.html
>
> > > instead of the POSIX version.
>
> > > But in neither of those I can extrude the fact that it can return
> > > with result < nbyte, without it being a permanent condition.
> > > What phrase makes you conclude that it can?
>
> > This specific issue is not clear-cut; the best thing to do lies somewhere
> > within the range of these scenarios:
>
> > "If a write() requests that more bytes be written than there is room
> > for (for example, [XSI] [Option Start] the process' file size limit
> > or [Option End] the physical end of a medium), only as many bytes as
> > there is room for shall be written. For example, suppose there is
> > space for 20 bytes more in a file before reaching a limit. A write of
> > 512 bytes will return 20. The next write of a non-zero number of bytes
> > would give a failure return (except as noted below)."
>
> This only applies to permanent conditions.
>
> > "When attempting to write to a file descriptor (other than a pipe or
> > FIFO) that supports non-blocking writes and cannot accept the data
> > immediately:
>
> > * If the O_NONBLOCK flag is clear, write() shall block the calling
> > thread until the data can be accepted.
>
> > * If the O_NONBLOCK flag is set, write() shall not block the
> > thread. If some data can be written without blocking the thread,
> > write() shall write what it can and return the number of bytes
> > written. Otherwise, it shall return -1 and set errno to [EAGAIN]."
>
> I think regular files do not support non-blocking writes, even if they
> are on NFS; in any case, O_NONBLOCK is disabled by default.

POSIX does not specify O_NONBLOCK semantics for regular files. This means
we can do whatever is most useful.

> > "[ENOBUFS] Insufficient resources were available in the system to
> > perform the operation."
>
> > I think the first is more useful behavior than the last. Supporting it
> > should be exactly the same as supporting what happens if the actual
> > filesystem fills up. In this case, the filesystem is being requested to
> > write more "than there is room for."
>
> The filesystem filling up is a totally different case as attempting the
> rest of the write is futile in that case.

No, it isn't. The filesystem may be not-full again soon, possibly even
what the program might consider "immediately".

> In a lot of code, a short write() is treated as a (fairly) persistent
> error.

I mentioned this several e-mails ago. Plenty of software is also not
going to understand ENOBUFS.

-- 
Brian Fundakowski Feldman                           \'[ FreeBSD ]''''''''''\
  <> green@FreeBSD.org                               \  The Power to Serve! \
      Opinions expressed are my own.                   \,,,,,,,,,,,,,,,,,,,,,,\
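
For callers, the practical side of the short-write question is whether
the remainder of a partially written buffer gets retried. A minimal
sketch of such a retry loop follows. The helper name write_all() and its
signature are assumptions made for illustration; it is not taken from
this thread or from any FreeBSD API, and treating a zero-byte return as
EIO is likewise an assumption.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * write_all() is a hypothetical helper, not part of any FreeBSD API.
 * It keeps calling write(2) until all of buf has been written or an
 * error occurs.  Returns 0 on success and -1 on failure with errno set;
 * *written (if non-NULL) receives the number of bytes written so far.
 */
int
write_all(int fd, const void *buf, size_t len, size_t *written)
{
	const char *p = buf;
	size_t done = 0;
	int ret = 0;

	while (done < len) {
		ssize_t n = write(fd, p + done, len - done);

		if (n > 0) {
			/* Possibly a short write: loop and retry the rest. */
			done += (size_t)n;
		} else if (n == -1 && errno == EINTR) {
			/* Interrupted before any data was written: retry. */
			continue;
		} else {
			/* Assumption: report "no progress" as an I/O error. */
			if (n == 0)
				errno = EIO;
			ret = -1;
			break;
		}
	}
	if (written != NULL)
		*written = done;
	return (ret);
}
```

With a loop like this, a short write caused by a transient limit costs
one extra write() call, while a persistent condition such as a full
filesystem shows up as an error from the following call, which matches
the "next write ... would give a failure return" wording quoted above.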