From owner-freebsd-arch@FreeBSD.ORG Sun Jan 29 11:32:00 2012 Return-Path: Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CF8FA106566B; Sun, 29 Jan 2012 11:31:59 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail13.syd.optusnet.com.au (mail13.syd.optusnet.com.au [211.29.132.194]) by mx1.freebsd.org (Postfix) with ESMTP id 5F30F8FC18; Sun, 29 Jan 2012 11:31:58 +0000 (UTC) Received: from c211-30-171-136.carlnfd1.nsw.optusnet.com.au (c211-30-171-136.carlnfd1.nsw.optusnet.com.au [211.30.171.136]) by mail13.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q0TBVpHs003567 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sun, 29 Jan 2012 22:31:55 +1100 Date: Sun, 29 Jan 2012 22:31:51 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Jilles Tjoelker In-Reply-To: <20120128233712.GA92694@stack.nl> Message-ID: <20120129215808.V844@besplex.bde.org> References: <20120110005155.S2378@besplex.bde.org> <20120110153807.H943@besplex.bde.org> <20120128233712.GA92694@stack.nl> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: flo@FreeBSD.org, Giovanni Trematerra , Attilio Rao , Konstantin Belousov , freebsd-arch@FreeBSD.org Subject: Re: pipe/fifo code merged. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 29 Jan 2012 11:32:00 -0000 On Sun, 29 Jan 2012, Jilles Tjoelker wrote: > On Tue, Jan 17, 2012 at 12:43:19PM +0100, Giovanni Trematerra wrote: >> Did you agree that this patch >> http://www.trematerra.net/patches/pipefifo_merge2.4.diff Sorry I haven;t got around to reviewing this. >> doesn't introduce any further regressions while it fixes > >> - style bugs you pointed out. >> - pipe_stat now use underlying filesystem information for pipes. The last still can't work. It is necessary to use the underlying file system for all read and write metadata operations. It is too hard to emulate what the underlying file system would do and write to it when the pipe state is released. >> - comment about pipeinfo was intended for. >> - race into pipe_poll (look at fifo_iseof line). > > I tested this version of the patch and found that it breaks opening a > fifo with O_TRUNC: it fails with [EINVAL]. This appears to be > pipe_truncate()'s doing. Previously, truncate requests went to the > vnode. This probably affects truncate() and ftruncate() too. Most file systems accept bogus truncation requests and do nothing, but pipe_truncate() always returns EINVAL. POSIX requires that O_TRUNC have no effect for fifos and terminals, and of course to work if the file is regular. Its effect for other file types is implementation-defined. The FreeBSD implementation doesn't actually define (document) the behaviour anywhere AFAIK. Its documentation in open(2) is just broken, since says that O_TRUNC causes truncation to size 0 without mentioning any restrictions on the file type. POSIX requires ftruncate() to fail if the fd refers to a directory, and of course to work if the fd refers to a regular file, and to have certain behaviour if the fd refers to a shared memory object. Its effect for other file types is unspecifed. ftruncate is normally implemented in the file system's VOP_SETATTR() (fo_truncate for vnodes is vn_truncate() which calls VOP_SETATTR()). If you fix the attributes, then O_TRUNC will be fixed automatically. devfs implements fo_truncate bogusly, using a lot of code to do almost nothing. It starts with a perfectly suitable fileop for this, namely devfs_ops_f = devfs_truncate_f(). This just calls vop.fo_truncate = vn_truncate(), which executes a lot of code that reduces to calling VOP_SETATTR() = devfs_vnodeops.vop_setattr = devfs_settatr(). devfs_setattr() then silenly ignores the request to set the size. But this is wrong for directories. However, an error is returned for directories because vn_truncate() doesn't quite reduce to VOP_SETATTR(). It also checks for directories and returns EISDIR for them. Perhaps always going through the vn layer is a good idea after all. ffs has the following complications in ufs_setattr() for truncation: - truncation works on symlinks too. This seems to be undocumented. - but a read-only mount gives error EROFS - a snapshot for the inode gives EPERM. This seems to be undocumented. - immutability or append-only for the inode gives EPERM. BTW, most or all setattr() vops have a broken test for va_bytes == VNOVAL. They break va_bytes in the test by casting it to int. Thus 2**32-1 values for it with the low 32 bits set are misclassified as VNOVAL (1 of such 2**32 values is correctly classified). But this has no effect, since va_bytes is always VNOVAL. I think most or all the other unsettable attributes that are laboriously checked for by setattr vnops are also impossible. It would take a kernel bug for a generic unsettable attributes to be requested to be set. A really large bug, like requesting to change the file flags instead of the file mode. Checking for the generic unsettable attributes just obscures the checks for unsupported attributes that some file systems should be doing. Bruce