From: John Baldwin
To: freebsd-current@freebsd.org
Cc: Garrett Wollman, asomers@freebsd.org
Subject: Re: posix_fallocate on ZFS
Date: Mon, 12 Feb 2018 09:04:57 -0800

On Saturday, February 10, 2018 01:46:33 PM Garrett Wollman wrote:
> asomers@freebsd.org writes:
>
> >On Sat, Feb 10, 2018 at 10:28 AM, Willem Jan Withagen
> >wrote:
> >
> >> Is there any expectation that this is going to be fixed in any near future?
> >
> >No. It's fundamentally impossible to support posix_fallocate on a COW
> >filesystem like ZFS. Ceph should be taught to ignore an EINVAL result,
> >since the system call is merely advisory.
>
> I don't think it's true that this is _fundamentally_ impossible. What
> the standard requires would in essence be a per-object refreservation.
> ZFS supports refreservation, obviously, but not on a per-object basis.
> Furthermore, there are mechanisms to preallocate blocks for things
> like dumps. So it *could* be done (as in, the concept is there), but
> it may not be practical. (And ultimately, there are ways in which the
> administrator might manage the system that would defeat the desired
> effect, but that's out of the standard's scope.) Given the semantic
> mismatch, though, I suspect it's unreasonable to expect anyone to
> prioritize implementation of such a feature.

I don't think posix_fallocate() can be compatible with COW. Suppose you
do reserve a fixed set of blocks. That guarantees the first write to
each block has somewhere to go, but not a later overwrite of one of
those blocks, since a COW filesystem writes the new data to a freshly
allocated block. You'd either have to reserve another block each time
you wrote to one, to maintain the reservation, or you'd need a way to
mark a file as not COW.

The first option isn't really any better than not using
posix_fallocate() in the first place, as writes still have to allocate
blocks. The second seems fraught with peril as well if the application
expects the non-COW'd file to stay in sync with other files in the
system, since presumably non-COW'd files couldn't be snapshotted, etc.

-- 
John Baldwin
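
For reference, the workaround suggested in the quoted text (treat an
EINVAL result from posix_fallocate as non-fatal) could look roughly like
the sketch below. It uses Python's os.posix_fallocate() wrapper for
brevity. The helper name try_preallocate is hypothetical, and treating
EOPNOTSUPP the same way as EINVAL is an assumption, not something stated
in the thread.

```python
import errno
import os
import tempfile

# Errors taken to mean "this filesystem cannot preallocate".
# EINVAL is the result mentioned in the thread; EOPNOTSUPP is an
# assumption added here for other platforms.
UNSUPPORTED = {errno.EINVAL, errno.EOPNOTSUPP}


def try_preallocate(fd, offset, length):
    """Hypothetical helper: return True if space was reserved, False if
    the filesystem does not support preallocation.

    The caller must pass a valid offset and a positive length, because
    EINVAL is also what invalid arguments produce.
    """
    if not hasattr(os, "posix_fallocate"):
        return False  # platform has no wrapper; carry on without it
    try:
        os.posix_fallocate(fd, offset, length)
    except OSError as exc:
        if exc.errno in UNSUPPORTED:
            return False
        raise  # ENOSPC, EBADF and the like are real failures
    return True


if __name__ == "__main__":
    with tempfile.TemporaryFile() as f:
        reserved = try_preallocate(f.fileno(), 0, 1 << 20)
        print("preallocated" if reserved else "not supported, continuing")
```

A False result only means that no reservation was made, so later writes
can still fail with ENOSPC and need their own error handling.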