Date:      Tue, 8 Dec 2015 08:44:56 -0700
From:      Warner Losh <imp@bsdimp.com>
To:        "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Subject:   Fwd: DELETE support in the VOP_STRATEGY(9)?
Message-ID:  <CANCZdfrF9OfMKiMGwGS5SStGeq01f_WdnHY26sRLgc69n0pkKQ@mail.gmail.com>
In-Reply-To: <CANCZdfpvUOFFvMdnCuBBJSh3tAPYeqCtbMomSBtC-GpGBNo%2BDw@mail.gmail.com>
References:  <CAH7qZftSVAYPmxNCQy=VVRj79AW7z9ade-0iogv2COfo2x%2Ba2Q@mail.gmail.com> <201512052002.tB5K2ZEA026540@chez.mckusick.com> <CAH7qZfs6ksE%2BQTMFFLYxY0PNE4hzn=D5skzQ91=gGK2xvndkfw@mail.gmail.com> <86poyhqsdh.fsf@desk.des.no> <CAH7qZftVj9m_yob=AbAQA7fh8yG-VLgM7H0skW3eX_S%2Bv75E-g@mail.gmail.com> <CANCZdfpvUOFFvMdnCuBBJSh3tAPYeqCtbMomSBtC-GpGBNo%2BDw@mail.gmail.com>

[ forgot to cc hackers ]
---------- Forwarded message ----------
From: Warner Losh <imp@bsdimp.com>
Date: Tue, Dec 8, 2015 at 8:22 AM
Subject: Re: DELETE support in the VOP_STRATEGY(9)?
To: Maxim Sobolev <sobomax@freebsd.org>
Cc: Dag-Erling Smørgrav <des@des.no>, Kirk McKusick <mckusick@mckusick.com>,
Pawel Jakub Dawidek <pjd@freebsd.org>




On Tue, Dec 8, 2015 at 1:53 AM, Maxim Sobolev <sobomax@freebsd.org> wrote:

> 1. There was a complaint from the list admin about this being off-topic. I
> think I've collected enough input from the interested parties so that I can
> work on my patch before I feel like posting it publicly on some more
> appropriate tech list for a wider review.
>
> 2. I did not really ignore it, it's just that I did not have much to reply
> at that point. But after being able to make some progress and looking at
> the code in question I can probably comment on some of it now. Basically I
> don't think your concerns wrt DELETE reliability/guarantees have much to do
> with this particular feature. The reason being that BIO_DELETE essentially
> tells the storage layer that whichever code "owns" the block in question
> (e.g. ZFS or UFS) has moved it into the free pool and will NEVER ever want
> to read its value back again (until it's written into again).
>

Not quite true. It merely means the contents are no longer interesting. It
doesn't mean they won't be read again.

In FreeBSD, BIO_DELETE has no guaranteed post-condition. The range might
read back 0's. It might read back 1's. It might read back the data that was
there before. All of these are allowed by various standards. FreeBSD even
allows it to read back random data, though I'm not aware of any
standards-conforming hardware that would act that way.
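
As a rough illustration of the outcomes listed above, here is a toy
user-space model in Python. It is not FreeBSD kernel code: ToyDisk,
DeletePolicy and read_after_delete are hypothetical names invented for this
sketch, and "1's" is taken to mean all bits set (0xff bytes).

```python
import os
from enum import Enum


class DeletePolicy(Enum):
    """What a (hypothetical) device does with a delete hint."""
    ZEROS = "zeros"          # range reads back as 0x00 afterwards
    ONES = "ones"            # range reads back as 0xff afterwards
    UNCHANGED = "unchanged"  # hint ignored, old data still readable
    RANDOM = "random"        # contents become arbitrary


class ToyDisk:
    """Toy block device; only models read-after-delete behaviour."""

    def __init__(self, nblocks, block_size=512, policy=DeletePolicy.UNCHANGED):
        self.block_size = block_size
        self.policy = policy
        self.blocks = [bytes(block_size) for _ in range(nblocks)]

    def write(self, lba, data):
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block")
        self.blocks[lba] = bytes(data)

    def read(self, lba):
        # A read after delete always succeeds; only the contents vary.
        return self.blocks[lba]

    def delete(self, lba):
        if self.policy is DeletePolicy.ZEROS:
            self.blocks[lba] = bytes(self.block_size)
        elif self.policy is DeletePolicy.ONES:
            self.blocks[lba] = b"\xff" * self.block_size
        elif self.policy is DeletePolicy.RANDOM:
            self.blocks[lba] = os.urandom(self.block_size)
        # UNCHANGED: nothing to do, the hint is simply dropped.


def read_after_delete(policy):
    """Write a block, delete it, and return what a later read sees."""
    disk = ToyDisk(4, policy=policy)
    disk.write(1, b"A" * disk.block_size)
    disk.delete(1)
    return disk.read(1)
```

Calling read_after_delete() with each policy returns a full block every
time, but with different contents, which is why a caller cannot assume any
particular value after the delete.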


> Therefore, whatever the geom driver does with that BIO_DELETE request truly
> has no real effects on the file system operations, at least in theory, even
> if it is effectively a NOP.
>

The filesystem cannot count on what happens with BIO_DELETE, or even if the
underlying device supports it.
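
A minimal sketch of what "cannot count on it" might look like from the
caller's side, again as a toy Python model rather than kernel code.
ToyProvider and ToyFreeMap are hypothetical names, and using EOPNOTSUPP as
the "not supported" error is an assumption of this sketch.

```python
import errno


class ToyProvider:
    """Hypothetical lower layer that may or may not accept delete hints."""

    def __init__(self, supports_delete):
        self.supports_delete = supports_delete
        self.deleted = []  # ranges the provider was told about

    def delete(self, offset, length):
        if not self.supports_delete:
            return errno.EOPNOTSUPP
        self.deleted.append((offset, length))
        return 0


class ToyFreeMap:
    """Hypothetical filesystem free-space bookkeeping."""

    def __init__(self, provider):
        self.provider = provider
        self.free = set()
        self.try_delete = True

    def free_range(self, offset, length):
        """Free a range; return True only if the delete hint was accepted."""
        # Freeing is the filesystem's own bookkeeping and never depends
        # on what the lower layer does with the hint.
        self.free.add((offset, length))
        if not self.try_delete:
            return False
        error = self.provider.delete(offset, length)
        if error == errno.EOPNOTSUPP:
            # Stop sending hints to a layer that does not support them.
            self.try_delete = False
        return error == 0
```

In this model the range ends up in the free set either way; the hint is
purely best-effort.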


> Technically speaking, on 100% correctly working os/hardware an attempt to
> read a block after it's been successfully BIO_DELETE'd could produce an
> exception of some sort without any ill effects.
>

There's no basis for this assertion. BIO_DELETE doesn't destroy the LBA
range. It merely advises the lower layers that the LBA range is no longer
precious and can be discarded. Some lower layers implement this with a
guarantee that it will read 0's or 1's until rewritten. Some say 'thank you
for the hint' and provide no additional guarantees.


> In reality, however, this is probably not a good idea to enforce that too
> strictly except for debug/testing purposes. As for the alignment etc, in
> this particular case of VOP_ALLOCATE(FALLOC_FL_PUNCH_HOLE), the filesystem in
> question is responsible for making sure the range that has been punched
> through reads 0, whether by making a real logical hole in the file and/or by
> padding it with zeroes as needed. I've tested it with ZFS and it correctly
> works on any range sizes/offsets, even when they aren't a multiple of the
> block size.
>

At the filesystem level, it doesn't matter what happens in the main
storage. Blocks that are no longer needed are returned to the free pool of
the filesystem. Partial blocks are read-modify-written to zero the
appropriate ranges. A conforming implementation could even fail to
return the blocks that were no longer needed and write zeros to them if it
wanted. But what it does is filesystem dependent. If it does return them to
the free pool, it could, if it wanted, issue a BIO_DELETE to the lower
layers, assuming they have indicated they support it, and will have to deal
with any error from doing so.
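
The split between whole blocks and partial blocks described above can be
sketched as follows. This is only an arithmetic illustration in Python;
split_hole is a hypothetical helper, and what a real filesystem does with
each piece is filesystem dependent.

```python
def split_hole(offset, length, block_size):
    """Split the byte range [offset, offset + length) of a punched hole.

    Returns (zero_ranges, whole_blocks):
      zero_ranges  - (offset, length) byte ranges inside partially covered
                     blocks, to be zeroed via read-modify-write
      whole_blocks - indices of fully covered blocks, which could be
                     returned to the free pool (or just zero-filled)
    """
    if offset < 0 or length < 0 or block_size <= 0:
        raise ValueError("invalid range or block size")

    end = offset + length
    first_whole = -(-offset // block_size)  # round offset up to a block index
    last_whole = end // block_size          # exclusive upper block index

    if first_whole >= last_whole:
        # No block is fully covered: the whole range is zeroed in place.
        return ([(offset, length)] if length else []), []

    zero_ranges = []
    head_end = first_whole * block_size
    if offset < head_end:
        zero_ranges.append((offset, head_end - offset))
    tail_start = last_whole * block_size
    if tail_start < end:
        zero_ranges.append((tail_start, end - tail_start))
    return zero_ranges, list(range(first_whole, last_whole))
```

For example, with 512-byte blocks, split_hole(100, 2000, 512) yields a
412-byte head and a 52-byte tail to zero, plus whole blocks 1 through 3.
Only those whole blocks would be candidates for a BIO_DELETE, and only if
the filesystem actually frees them.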

Warner


