Date: Tue, 8 Dec 2015 08:44:56 -0700
From: Warner Losh <imp@bsdimp.com>
To: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Subject: Fwd: DELETE support in the VOP_STRATEGY(9)?
Message-ID: <CANCZdfrF9OfMKiMGwGS5SStGeq01f_WdnHY26sRLgc69n0pkKQ@mail.gmail.com>
In-Reply-To: <CANCZdfpvUOFFvMdnCuBBJSh3tAPYeqCtbMomSBtC-GpGBNo%2BDw@mail.gmail.com>
References: <CAH7qZftSVAYPmxNCQy=VVRj79AW7z9ade-0iogv2COfo2x%2Ba2Q@mail.gmail.com>
 <201512052002.tB5K2ZEA026540@chez.mckusick.com>
 <CAH7qZfs6ksE%2BQTMFFLYxY0PNE4hzn=D5skzQ91=gGK2xvndkfw@mail.gmail.com>
 <86poyhqsdh.fsf@desk.des.no>
 <CAH7qZftVj9m_yob=AbAQA7fh8yG-VLgM7H0skW3eX_S%2Bv75E-g@mail.gmail.com>
 <CANCZdfpvUOFFvMdnCuBBJSh3tAPYeqCtbMomSBtC-GpGBNo%2BDw@mail.gmail.com>
[ forgot to cc hackers ]

---------- Forwarded message ----------
From: Warner Losh <imp@bsdimp.com>
Date: Tue, Dec 8, 2015 at 8:22 AM
Subject: Re: DELETE support in the VOP_STRATEGY(9)?
To: Maxim Sobolev <sobomax@freebsd.org>
Cc: Dag-Erling Smørgrav <des@des.no>, Kirk McKusick <mckusick@mckusick.com>,
 Pawel Jakub Dawidek <pjd@freebsd.org>

On Tue, Dec 8, 2015 at 1:53 AM, Maxim Sobolev <sobomax@freebsd.org> wrote:

> 1. There was a complaint from the list admin about this being off-topic. I
> think I've collected enough input from the interested parties so that I can
> work on my patch before I feel like posting it publicly on some more
> appropriate tech list for a wider review.
>
> 2. I did not really ignore it, it's just that I did not have much to reply
> at that point. But after being able to make some progress and looking at
> the code in question I can probably comment on some of it now. Basically I
> don't think your concerns wrt DELETE reliability/guarantees have much to do
> with this particular feature. The reason being that BIO_DELETE essentially
> tells the storage layer that whichever code "owns" the block in question
> (e.g. ZFS or UFS) has moved it into the free pool and will NEVER ever want
> to read its value back again (until it's written into again).
>

Not quite true. It merely means the contents are no longer interesting. It
doesn't mean they won't be read again. In FreeBSD, BIO_DELETE carries no
guaranteed postcondition. The range might read back 0's. It might read back
1's. It might read back the data that was there before. All of these are
allowed by various standards. FreeBSD even allows it to read back random
data, though I'm not aware of any standards-conforming hardware that would
act that way.

> Therefore, whatever geom driver does with that BIO_DELETE request truly
> has no real effects on the file system operations, at least in theory, even
> if it is effectively a NOP.
>

The filesystem cannot count on what happens with BIO_DELETE, or even on
whether the underlying device supports it.

> Technically speaking, on 100% correctly working OS/hardware an attempt to
> read a block after it's been successfully BIO_DELETE'd could produce an
> exception of some sort without any ill effects.
>

There's no basis for this assertion. BIO_DELETE doesn't destroy the LBA
range. It merely advises the lower layers that the LBA range is no longer
precious and can be discarded. Some lower layers implement this with a
guarantee that it will read 0's or 1's until rewritten. Some say 'thank you
for the hint' and provide no additional guarantees.

> In reality, however, it is probably not a good idea to enforce that too
> strictly except for debug/testing purposes. As for the alignment etc, in
> this particular case of VOP_ALLOCATE(FALLOC_FL_PUNCH_HOLE), the filesystem in
> question is responsible for making sure the range that has been punched
> through reads 0, whether by making a real logical hole in the file and/or by
> padding it with zeroes as needed. I've tested it with ZFS and it correctly
> works on any range sizes/offsets, even when they aren't a multiple of the
> block size.
>

At the filesystem level, it doesn't matter what happens in the main
storage. Blocks that are no longer needed are returned to the free pool of
the filesystem. Partial blocks are handled by read-modify-write, zeroing
the appropriate ranges. A conforming implementation could even decline to
free the blocks that were no longer needed and write zeros to them instead,
if it wanted. But what it does is filesystem dependent. If it does return
them to the free pool, it could, if it wanted, issue a BIO_DELETE to the
lower layers, assuming they have indicated they support it, and it will have
to deal with any error from doing so.

Warner
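
To make the whole-block versus partial-block distinction concrete, here is a
minimal sketch in Python of how a punch-hole byte range could be split on
block boundaries. It is not taken from UFS, ZFS, or any FreeBSD code: the
names HoleSplit and split_hole are hypothetical, and a single fixed block
size is assumed. The head and tail pieces are the partial blocks that would
be zeroed by read-modify-write; the middle piece covers only whole blocks,
which a filesystem may return to its free pool and, optionally, report to
the lower layers with BIO_DELETE.

```python
from typing import NamedTuple


class HoleSplit(NamedTuple):
    """Hypothetical result type: each field is an (offset, length) pair in bytes."""
    head: tuple  # partial leading block: zero it via read-modify-write
    full: tuple  # whole blocks: candidates for the filesystem's free pool
    tail: tuple  # partial trailing block: zero it via read-modify-write


def split_hole(off: int, length: int, bsize: int) -> HoleSplit:
    """Split the byte range [off, off + length) on bsize boundaries."""
    if off < 0 or length < 0 or bsize <= 0:
        raise ValueError("off and length must be >= 0 and bsize must be > 0")
    end = off + length
    first_full = -(-off // bsize) * bsize  # off rounded up to a block boundary
    last_full = end // bsize * bsize       # end rounded down to a block boundary
    if first_full > last_full:
        # The range starts and ends inside a single block, so no whole block
        # is covered and nothing can be freed.
        return HoleSplit((off, length), (end, 0), (end, 0))
    return HoleSplit(
        (off, first_full - off),
        (first_full, last_full - first_full),
        (last_full, end - last_full),
    )


if __name__ == "__main__":
    # 2000 bytes starting at offset 100, with 512-byte blocks.
    print(split_hole(100, 2000, 512))
```

For offset 100, length 2000 and 512-byte blocks, this gives a 412-byte head
at offset 100, 1536 bytes of whole blocks at offset 512, and a 52-byte tail
at offset 2048. Only the whole-block piece is ever a candidate for
BIO_DELETE, and since the read-back contents after a delete are unspecified,
the guarantee that the hole reads as zeros has to come from the filesystem's
own metadata rather than from the device.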
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CANCZdfrF9OfMKiMGwGS5SStGeq01f_WdnHY26sRLgc69n0pkKQ>