Date: Tue, 8 Dec 2015 19:54:14 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: Warner Losh <imp@bsdimp.com>
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Subject: Re: Fwd: DELETE support in the VOP_STRATEGY(9)?
Message-ID: <20151208175414.GB82577@kib.kiev.ua>
In-Reply-To: <CANCZdfqHoduhdCss0b6=UsBPAxfRZv4hF8vyuUVLBdP5gYUduQ@mail.gmail.com>
References: <CAH7qZftSVAYPmxNCQy=VVRj79AW7z9ade-0iogv2COfo2x%2Ba2Q@mail.gmail.com> <201512052002.tB5K2ZEA026540@chez.mckusick.com> <CAH7qZfs6ksE%2BQTMFFLYxY0PNE4hzn=D5skzQ91=gGK2xvndkfw@mail.gmail.com> <86poyhqsdh.fsf@desk.des.no> <CAH7qZftVj9m_yob=AbAQA7fh8yG-VLgM7H0skW3eX_S%2Bv75E-g@mail.gmail.com> <86fuzdqjwn.fsf@desk.des.no> <CANCZdfo=NfKy51%2B64-F_v%2BDh2wkrFYP4gXe=X9RWSSao49gO9g@mail.gmail.com> <CANCZdfqHoduhdCss0b6=UsBPAxfRZv4hF8vyuUVLBdP5gYUduQ@mail.gmail.com>
On Tue, Dec 08, 2015 at 08:43:33AM -0700, Warner Losh wrote:
> [ forgot to cc hackers ]
> ---------- Forwarded message ----------
> From: Warner Losh <imp@bsdimp.com>
> Date: Tue, Dec 8, 2015 at 8:41 AM
> Subject: Re: DELETE support in the VOP_STRATEGY(9)?
> To: Dag-Erling Smørgrav <des@des.no>
>
>
> On Tue, Dec 8, 2015 at 4:06 AM, Dag-Erling Smørgrav <des@des.no> wrote:
>
> > Maxim Sobolev <sobomax@FreeBSD.org> writes:
> > > Dag-Erling Smørgrav <des@des.no> writes:
> > > > 1) why did you take this off the list?
> > > There was a complaint from the list admin about this being off-topic.
> >
> > Yes, and Eitan moved the discussion to hackers@. It should have stayed
> > there.
> >
> > > > 2) why did you even bother to cc: me if you were going to completely
> > > > ignore everything I said anyway?
> > > I did not really ignore it, it's just that I did not have much to reply
> > > at that point. [...] Basically I don't think your concerns wrt DELETE
> > > reliability/guarantees have much to do with this particular feature.
> > > The reason being that BIO_DELETE essentially tells the storage layer
> > > that whichever code "owns" the block in question (e.g. ZFS or UFS)
> > > has moved it into the free pool and will NEVER ever want to read its
> > > value back again (until it's written into again).
> >
> > No, it means that the contents of that block are no longer important and
> > that the lower layers *may* reclaim it. It does not mean that nobody
> > will ever try to read the block, nor does it guarantee that the block
> > will actually be reclaimed or zeroed. We cannot rely on the lower
> > layers to ensure that reading from a previously deleted block never
> > returns data that may have belonged to a different file.
> >
> > BTW, I've encountered CF cards (including the SanDisk card in my home
> > router) that freeze if issued a TRIM command. Furthermore, many CF, MMC
> > and SD cards, especially those marketed for use in digital cameras,
> > perform wear leveling "automagically" based on their own understanding
> > of the filesystem layout, and will therefore work poorly with anything
> > other than FAT (Kingston call it "optimized recording performance" in
> > their marketing literature).
> >
>
> While these issues are relevant for BIO_DELETE, they aren't so much relevant
> for punching a hole in a file in a filesystem. The filesystem is the one that
> gets to decide whether and when to issue a BIO_DELETE (just as the lower
> layers get to decide what to do). A properly written filesystem will not issue
> a BIO_DELETE and then assume it will read back 0's. The whole point of
> the punch hole is to allow the filesystem to return the blocks to its free
> store. If that also happens to have the effect of causing a BIO_DELETE
> to go down, that's no different than deleting the file and having a BIO_DELETE
> go down for the resulting blocks that are freed.

Exactly. I completely agree with the statement above, and I think that this is the only thing that should be implemented.

There could be a request, most likely a filesystem-level ioctl, which punches holes in the file. If the filesystem supports the ioctl, its effect on the file state should be the same as if the range had been skipped with a seek between writes. Moreover, where supported, the lseek(SEEK_HOLE/SEEK_DATA) behaviour should be consistent with the request. The latter might mean that we should restrict the interface to accept only ranges at block boundaries.

Note that for UFS, BIO_DELETE might be issued automatically for fully freed blocks (due to the implementation at the lower levels), when it is supported by the volume and safe from the filesystem's point of view. In other words, the behaviour there should be the same as for blocks freed by file truncation or by the final unlink.
UFS has a known quirk in this area: it does not allow a hole to extend to the end of the file, so the last fragment or block must be allocated. I once patched the kernel and fsck to remove this restriction, but I have probably lost the patch.