From: Patrick Kelsey
Sender: pkelsey@gmail.com
Date: Tue, 13 Feb 2018 21:11:34 -0500
Subject: Re: posix_fallocate on ZFS
To: John Baldwin
Cc: freebsd-current, Garrett Wollman, asomers@freebsd.org

On Mon, Feb 12, 2018 at 12:04 PM, John Baldwin wrote:

> On Saturday, February 10, 2018 01:46:33 PM Garrett Wollman wrote:
> > In article
> > ,
> > asomers@freebsd.org writes:
> >
> > >On Sat, Feb 10, 2018 at 10:28 AM, Willem Jan Withagen
> > >wrote:
> >
> > >> Is there any expectation that this is going to be fixed in any near future?
> >
> > >No. It's fundamentally impossible to support posix_fallocate on a COW
> > >filesystem like ZFS. Ceph should be taught to ignore an EINVAL result,
> > >since the system call is merely advisory.
> >
> > I don't think it's true that this is _fundamentally_ impossible. What
> > the standard requires would in essence be a per-object refreservation.
> > ZFS supports refreservation, obviously, but not on a per-object basis.
> > Furthermore, there are mechanisms to preallocate blocks for things
> > like dumps. So it *could* be done (as in, the concept is there), but
> > it may not be practical. (And ultimately, there are ways in which the
> > administrator might manage the system that would defeat the desired
> > effect, but that's out of the standard's scope.) Given the semantic
> > mismatch, though, I suspect it's unreasonable to expect anyone to
> > prioritize implementation of such a feature.
>
> I don't think posix_fallocate() can be compatible with COW. Suppose you
> do reserve a fixed set of blocks. That ensures the first write has a
> place to write, but not if you overwrite one of those blocks. You'd have
> to reserve another block to maintain the reservation each time you wrote
> to a block, or you'd have to have a way to mark a file as not COW. The
> first case isn't really any better than not using posix_fallocate() in the
> first place as you are still requiring writes to allocate blocks, and the
> second seems a bit fraught with peril as well if the application is
> expecting the non-COW'd file to be in sync with other files in the system
> since presumably non-COW'd files couldn't be snapshotted, etc.

I think Garrett's assessment is correct: it is not fundamentally impossible, but it may not be felt worth implementing in any given file system for practical reasons. I say this having designed and implemented a COW file system whose development was driven by customer pressure to do things that, at first pass, one might declare architectural contradictions, but that on further reflection were entirely possible given sufficient willingness to invest the effort and to accept the accompanying trade-offs, additional knobs to turn, etc.
In this case (posix_fallocate() + COW + snapshots), it could be implemented with a per-object allocator that normally keeps on hand at least one block beyond what the reservation requires, plus a snapshot operation that succeeds only if it can provision the local allocators of all fallocated objects with enough additional blocks to maintain the no-fail write guarantee after the snapshot.

-Patrick