From: Jilles Tjoelker
To: Brian Fundakowski Feldman
Cc: Marc Olzheim, freebsd-hackers@freebsd.org, freebsd-current@freebsd.org
Date: Wed, 20 Apr 2005 19:12:20 +0200
Subject: Re: NFS client/buffer cache deadlock
Message-ID: <20050420171220.GB93623@stack.nl>
In-Reply-To: <20050420155233.GJ1157@green.homeunix.org>

On Wed, Apr 20, 2005 at 11:52:33AM -0400, Brian Fundakowski Feldman wrote:
> On Wed, Apr 20, 2005 at 05:35:28PM +0200, Marc Olzheim wrote:
> > On Wed, Apr 20, 2005 at 11:20:38AM -0400, Brian Fundakowski Feldman wrote:
> > > > Btw.: I'm not sure write(), writev() and pwrite() are allowed to do
> > > > short writes on regular files... ?
> > > Our manpage is incorrect; POSIX states that they are (see earlier
> > > e-mail). There really is no alternative -- we simply can't build
> > > an NFS transaction larger than our buffer cache can accommodate.
> > > Note that short writes won't happen for normal buffer sizes, only
> > > excessively large ones. I really don't believe that writev() is meant
> > > to be used so that you can write gigantic data structures in a single
> > > transaction...

It is OK to return partial success if the first chunk of a large write
succeeded and a later chunk failed persistently, but not merely because
the write cannot be performed as a single NFS transaction.

> > Ah, I was reading the SUSv2 page:
> > http://www.opengroup.org/onlinepubs/009695399/functions/write.html
> > instead of the POSIX version.
> > But in neither of those can I find anything saying that it may return
> > a result < nbyte without it being a permanent condition.
> > What phrase makes you conclude that it can?
> This specific issue is not clear-cut; the best thing to do lies somewhere
> within the range of these scenarios:
> "If a write() requests that more bytes be written than there is room
> for (for example, [XSI] [Option Start] the process' file size limit
> or [Option End] the physical end of a medium), only as many bytes as
> there is room for shall be written. For example, suppose there is
> space for 20 bytes more in a file before reaching a limit. A write of
> 512 bytes will return 20. The next write of a non-zero number of bytes
> would give a failure return (except as noted below)."

This only applies to permanent conditions.
> "When attempting to write to a file descriptor (other than a pipe or
> FIFO) that supports non-blocking writes and cannot accept the data
> immediately:
>  * If the O_NONBLOCK flag is clear, write() shall block the calling
>    thread until the data can be accepted.
>  * If the O_NONBLOCK flag is set, write() shall not block the
>    thread. If some data can be written without blocking the thread,
>    write() shall write what it can and return the number of bytes
>    written. Otherwise, it shall return -1 and set errno to [EAGAIN]."

I think regular files do not support non-blocking writes, even if they
are on NFS; in any case, O_NONBLOCK is clear by default.

> "[ENOBUFS] Insufficient resources were available in the system to
> perform the operation."
> I think the first is more useful behavior than the last. Supporting it
> should be exactly the same as supporting what happens if the actual
> filesystem fills up. In this case, the filesystem is being requested to
> write more "than there is room for."

A full filesystem is a totally different case, because there attempting
the rest of the write is futile. In a lot of code, a short write() is
treated as a (fairly) persistent error.

-- 
Jilles Tjoelker