Date: Tue, 8 Dec 2015 12:03:11 -0700
From: Warner Losh <imp@bsdimp.com>
To: Steven Hartland <killing@multiplay.co.uk>
Cc: freebsd-hackers@freebsd.org
Subject: Re: DELETE support in the VOP_STRATEGY(9)?
Message-ID: <0DB97CBA-4DC3-4D52-AE9D-54546292D66F@bsdimp.com>
In-Reply-To: <566726ED.2010709@multiplay.co.uk>
References: <CAH7qZftSVAYPmxNCQy=VVRj79AW7z9ade-0iogv2COfo2x%2Ba2Q@mail.gmail.com>
 <201512052002.tB5K2ZEA026540@chez.mckusick.com>
 <CAH7qZfs6ksE%2BQTMFFLYxY0PNE4hzn=D5skzQ91=gGK2xvndkfw@mail.gmail.com>
 <86poyhqsdh.fsf@desk.des.no>
 <CAH7qZftVj9m_yob=AbAQA7fh8yG-VLgM7H0skW3eX_S%2Bv75E-g@mail.gmail.com>
 <86fuzdqjwn.fsf@desk.des.no>
 <CANCZdfo=NfKy51%2B64-F_v%2BDh2wkrFYP4gXe=X9RWSSao49gO9g@mail.gmail.com>
 <CANCZdfqHoduhdCss0b6=UsBPAxfRZv4hF8vyuUVLBdP5gYUduQ@mail.gmail.com>
 <864mfssxgt.fsf@desk.des.no>
 <CANCZdfoXdcD%2B9jeVR1Np16gafBf0_4B2wombwxze8DvJwf7cMg@mail.gmail.com>
 <86wpsord9l.fsf@desk.des.no>
 <566726ED.2010709@multiplay.co.uk>

> On Dec 8, 2015, at 11:52 AM, Steven Hartland <killing@multiplay.co.uk> wrote:
>
> On 08/12/2015 18:44, Dag-Erling Smørgrav wrote:
>> Warner Losh <imp@bsdimp.com> writes:
>>> Dag-Erling Smørgrav <des@des.no> writes:
>>>> But the filesystem does not know whether the underlying storage is
>>>> electromechanical or solid-state, nor does it know whether the user
>>>> cares much about seek times (unless we introduce the heuristic
>>>> "avoid creating holes unless the file already has them, in which
>>>> case the userland probably does not care").
>>> Actually, the filesystem does know. Or has some knowledge of what
>>> is supported and what isn't. BIO_DELETE support is a strong indicator
>>> of a flash or other log-type system.
>> The filesystem can ask the layer below if BIO_DELETE is supported, but
>> should not assume anything about what it means. For instance, I could
>> write a gnop-like module that translates BIO_DELETE into an all-zeroes
>> BIO_WRITE and passes everything else unmodified. It would provide a
>> stronger guarantee than, say, SATA TRIM but would also have a completely
>> different performance profile (even on SSDs, since it would do its work
>> synchronously whereas TRIM works asynchronously).

That ship has sailed. UFS, at least, assumes that if TRIM is supported
then relocating files to be contiguous is bad.

But writing a gnop module that did the BIO_DELETE thing would be bogus.
BIO_DELETE does not mean that blocks will read back as zeros; that's
simply not what BIO_DELETE means. So, sure, you could invent a stupid
thing that breaks the rules, and thus the assumptions of the other code,
but why would you want to do that?

The SATA trims are actually synchronous (in the absence of power
failures). Once you TRIM the data, it is gone. And depending on what
bits are set in the identify response, you can count on different
things. But to say they happen asynchronously because of implementation
details about when the data is actually erased is missing the point.

Also, your BIO_DELETE example wouldn't guarantee the data is erased
either. Writes to log-append devices (like SSDs) are like a TRIM
followed by a write: the old LBA mapping is discarded and a new one
replaces it.

>> Anyway, my point is that Maxim needs to revise his assumptions.
> Just to clarify, most consumer devices process TRIM synchronously, not
> asynchronously.

It also depends on what you mean by 'process' here.

> Your example isn't actually just an example: CAM scsi_da has a number
> of different ways it can process BIO_DELETE:
> * ATA TRIM
> * SCSI UNMAP
> * Write Same 16
> * Write Same 10
> * Zero
>
> So your example actually exists in practice in the FreeBSD code base ;-)

All these are effectively TRIM operations. The devices that implement
them use them as hints to optimize storage. DES' BIO_DELETE -> WRITE
zero example doesn't optimize storage at all, nor does it give the
lower layers any clue about how to optimize the storage. All the SCSI
delete types do give that hint.

Warner
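
To make the mapping argument above concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions: `ToyFTL`, `delete_as_zero_write`, and the `stale` set are hypothetical names invented for illustration, not FreeBSD or device-firmware code, and returning `None` for an unmapped LBA simply stands in for "whatever the device chooses to return". It shows a write behaving like a TRIM followed by an append and remap, and how translating a delete into a zero write leaves the LBA mapped instead of handing the device a hint.

```python
class ToyFTL:
    """Toy log-append device: each LBA maps to a slot in an append-only log."""

    def __init__(self):
        self.log = []        # append-only "physical" storage
        self.mapping = {}    # LBA -> index into self.log
        self.stale = set()   # log slots no longer referenced (reclaimable later)

    def trim(self, lba):
        # Discard the mapping. The old bytes stay in the log until some
        # later cleanup pass, but the LBA no longer refers to them.
        old = self.mapping.pop(lba, None)
        if old is not None:
            self.stale.add(old)

    def write(self, lba, data):
        # A write is modelled as: drop the old mapping, append, remap.
        self.trim(lba)
        self.log.append(data)
        self.mapping[lba] = len(self.log) - 1

    def read(self, lba):
        # None stands in for "unspecified contents" of an unmapped LBA.
        slot = self.mapping.get(lba)
        return None if slot is None else self.log[slot]


def delete_as_zero_write(dev, lba, block_size=4):
    # Hypothetical gnop-like translation: a delete becomes an all-zero write.
    dev.write(lba, bytes(block_size))


if __name__ == "__main__":
    trimmed = ToyFTL()
    trimmed.write(7, b"aaaa")
    trimmed.trim(7)

    zeroed = ToyFTL()
    zeroed.write(7, b"aaaa")
    delete_as_zero_write(zeroed, 7)

    print("after trim:       mapped LBAs =", sorted(trimmed.mapping))
    print("after zero write: mapped LBAs =", sorted(zeroed.mapping))
    print("old data still in log after zero write:", zeroed.log[0])
```

In this model the trimmed device ends up with no mapping for the LBA, while the zero-write device still has the LBA mapped (to a fresh slot of zeros) and the old bytes remain in the log, so neither path by itself proves the old data was physically erased.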