From: Warner Losh
To: "freebsd-hackers@freebsd.org"
Date: Tue, 8 Dec 2015 08:44:56 -0700
Subject: Fwd: DELETE support in the VOP_STRATEGY(9)?

[ forgot to cc hackers ]

---------- Forwarded message ----------
From: Warner Losh
Date: Tue, Dec 8, 2015 at 8:22 AM
Subject: Re: DELETE support in the VOP_STRATEGY(9)?
To: Maxim Sobolev
Cc: Dag-Erling Smørgrav, Kirk McKusick, Pawel Jakub Dawidek

On Tue, Dec 8, 2015 at 1:53 AM, Maxim Sobolev wrote:

> 1. There was a complaint from the list admin about this being off-topic. I
> think I've collected enough input from the interested parties so that I
> can work on my patch before I feel like posting it publicly on some more
> appropriate tech list for a wider review.
>
> 2. I did not really ignore it, it's just that I did not have much to reply
> at that point. But after being able to make some progress and looking at
> the code in question I can probably comment on some of it now.
> Basically I don't think your concerns wrt DELETE reliability/guarantees
> have much to do with this particular feature. The reason being that
> BIO_DELETE essentially tells the storage layer that whichever code "owns"
> the block in question (e.g. ZFS or UFS) has moved it into the free pool
> and will NEVER ever want to read its value back again (until it's written
> into again).
>

Not quite true. It merely means the contents are no longer interesting. It
doesn't mean they won't be read again. In FreeBSD, BIO_DELETE has no
always-true post-condition semantics. It might read back 0's. It might read
back 1's. It might read back the data that was there before. All of these
are allowed by various standards. FreeBSD even allows it to read back random
data, though I'm not aware of any standards-conforming hardware that would
act that way.

> Therefore, whatever the geom driver does with that BIO_DELETE request
> truly has no real effect on the file system operations, at least in
> theory, even if it is effectively a NOP.
>

The filesystem cannot count on what happens with BIO_DELETE, or even on
whether the underlying device supports it.

> Technically speaking, on a 100% correctly working os/hardware an attempt
> to read a block after it's been successfully BIO_DELETE'd could produce an
> exception of some sort without any ill effects.
>

There's no basis for this assertion. BIO_DELETE doesn't destroy the LBA
range. It merely advises the lower layers that the LBA range is no longer
precious and can be discarded. Some lower layers implement this with a
guarantee that it will read 0's or 1's until rewritten. Some say 'thank you
for the hint' and provide no additional guarantees.

> In reality, however, this is probably not a good idea to enforce that too
> strictly except for debug/testing purposes.
> As for the alignment etc., in this particular case of
> VOP_ALLOCATE(FALLOC_FL_PUNCH_HOLE), the filesystem in question is
> responsible for making sure the range that has been punched through reads
> 0, whether by making a real logical hole in the file and/or by padding it
> with zeroes as needed. I've tested it with ZFS and it correctly works on
> any range sizes/offsets, even when they aren't a multiple of the block
> size.
>

At the filesystem level, it doesn't matter what happens in the main storage.
Blocks that are no longer needed are returned to the free pool of the
filesystem. Partial blocks are read-modify-written with zeros over the
appropriate ranges. A conforming implementation could even decline to return
the blocks that were no longer needed and write zeros to them instead, if it
wanted. But what it does is filesystem dependent. If it does return them to
the free pool, it could, if it wanted, issue a BIO_DELETE to the lower
layers, assuming they have indicated they support it, and it will have to
deal with any error from doing so.

Warner
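
The points above can be sketched with a toy model. This is only an
illustration under stated assumptions: ToyDevice, ToyFile, Policy, BLOCK and
demo are hypothetical names invented here, not FreeBSD kernel interfaces, and
the 8-byte block size is chosen only to keep the example small. The device
treats a delete as a hint with one of several read-back behaviours. The file
layer punches a hole by unmapping whole blocks and zeroing partial ones
itself, so the result does not depend on what the device did with the hint.
Ignoring a failed delete is an assumption of this sketch; a real filesystem
would choose its own error handling.

```python
from enum import Enum

BLOCK = 8  # toy block size in bytes (assumption, keeps the example small)


class Policy(Enum):
    ZEROS = 0        # deleted range reads back as 0x00
    ONES = 1         # deleted range reads back as 0xff
    UNCHANGED = 2    # old data is still readable
    UNSUPPORTED = 3  # device rejects the delete request


class ToyDevice:
    """Hypothetical block device; not a model of any real GEOM class."""

    def __init__(self, nblocks, policy):
        self.policy = policy
        self.blocks = [bytes(BLOCK) for _ in range(nblocks)]

    def delete(self, lba):
        """Advisory discard: True if accepted, False if unsupported."""
        if self.policy is Policy.UNSUPPORTED:
            return False
        if self.policy is Policy.ZEROS:
            self.blocks[lba] = bytes(BLOCK)
        elif self.policy is Policy.ONES:
            self.blocks[lba] = b"\xff" * BLOCK
        return True  # UNCHANGED: accepted, contents left as they were


class ToyFile:
    """Hypothetical one-file filesystem: file block i maps to an LBA or a hole."""

    def __init__(self, dev, data):
        assert len(data) % BLOCK == 0
        self.dev = dev
        self.map = list(range(len(data) // BLOCK))  # file block -> LBA
        self.free = set(range(len(self.map), len(dev.blocks)))
        for i, lba in enumerate(self.map):
            dev.blocks[lba] = data[i * BLOCK:(i + 1) * BLOCK]

    def read(self):
        # Holes are zero-filled by the filesystem without touching the device.
        return b"".join(bytes(BLOCK) if lba is None else self.dev.blocks[lba]
                        for lba in self.map)

    def punch_hole(self, offset, length):
        end = offset + length
        for i, lba in enumerate(self.map):
            lo, hi = max(offset, i * BLOCK), min(end, (i + 1) * BLOCK)
            if lba is None or lo >= hi:
                continue  # already a hole, or outside the punched range
            if hi - lo == BLOCK:
                # Whole block: unmap it, return it to the free pool, and hint.
                self.map[i] = None
                self.free.add(lba)
                self.dev.delete(lba)  # result ignored: the block is free anyway
            else:
                # Partial block: read-modify-write zeros over the sub-range.
                buf = bytearray(self.dev.blocks[lba])
                buf[lo - i * BLOCK:hi - i * BLOCK] = bytes(hi - lo)
                self.dev.blocks[lba] = bytes(buf)


def demo(policy):
    """Punch bytes 4..17 out of a 24-byte file: partial, whole, partial block."""
    f = ToyFile(ToyDevice(4, policy), b"A" * 24)
    f.punch_hole(4, 14)
    return f.read()
```

Calling demo() with each Policy value should give the same bytes back: the
first four and last six bytes intact, with zeros in between. Only the
filesystem's own bookkeeping makes the hole read as zeros; the device's
answer to the hint never enters into it.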