Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 19 Feb 2024 08:18:48 -0700
From:      Warner Losh <imp@bsdimp.com>
To:        cglogic <cglogic@protonmail.com>
Cc:        Andriy Gapon <avg@freebsd.org>, src-committers <src-committers@freebsd.org>,  "<dev-commits-src-all@freebsd.org>" <dev-commits-src-all@freebsd.org>,  "<dev-commits-src-main@freebsd.org>" <dev-commits-src-main@freebsd.org>
Subject:   Re: git: c01af41c3c8f - main - ata_da: add quirk to disable NCQ TRIM for Samsung 860/870 SSDs
Message-ID:  <CANCZdfpWTsWW538kUC8bh6UWxy1W%2BJ8BcP0SSPF5zWx%2BPrHUAw@mail.gmail.com>
In-Reply-To: <xQZ9rTt07vwgLknVqeC77uOk_rFVITpbrx2Omrn6FfUwhP4M64D0jJ6nmjvg51l_Up3mnJtrXxxcYiuTfU8t6DpHlMRsceuXoIGY1-zhBaI=@protonmail.com>
References:  <fd15930c-4f47-4ed5-9136-ca507a881218@FreeBSD.org> <xQZ9rTt07vwgLknVqeC77uOk_rFVITpbrx2Omrn6FfUwhP4M64D0jJ6nmjvg51l_Up3mnJtrXxxcYiuTfU8t6DpHlMRsceuXoIGY1-zhBaI=@protonmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help
--00000000000067d1ca0611bd9cb8
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Mon, Feb 19, 2024, 4:29=E2=80=AFAM cglogic <cglogic@protonmail.com> wrot=
e:

> On Monday, February 19th, 2024 at 1:03 PM, Andriy Gapon <avg@FreeBSD.org>
> wrote:
>
> > On 19/02/2024 12:47, cglogic wrote:
> >
> > > Hello Andriy,
> > >
> > > I use ZFS with autotrim enabled on Samsung 860 PRO connected to Intel
> AHCI SATA controller for 9 years without any issue.
> >
> >
> > I think that it was released in 2018?
>
> You are right, it's 9 years old computer, SSD was added later.
>
> >
> > > Can I disable this quirk locally or have I revert the patch to
> continue use NCQ TRIM with this SSD?
> >
> >
> > I don't think that there is a way to disable the quirk, so you'll have
> to revert.
> > Note that ATA TRIM is still enabled, it's only NCQ TRIM that got
> disabled.
>
> From my understanding ATA TRIM is non-queued TRIM as opposed to NCQ TRIM.
> With only ATA TRIM enabled, drive will have increased latency when large
> amount of files deleted.
>
> However I'm not sure how ZFS handles TRIMs, maybe it groups several TRIMs
> and then sends when drive is idle.
> Another question is how this affects TRIM enabled swap partition.
>
> In general, if I'm correct here, disabling NCQ TRIM should not be an issu=
e
> for desktop usage, except for large git repository updates, etc.
>

In general, it won't matter a ton. Unless you are seeing stalls because of
read/write I/O waiting behind trims that have to go to the drive one at a
time, the effects are in the noise. If you are seeing that, though, the
effects can be quite helpful in terms of latency... assuming the drive is
cooperative. The benefits of it vary a lot from drive to drive as well.
Some early drives still serialize the trims, so the latency benefit is not
so great. Some enterprise drives do trims instantly and keep some internal
state that is helped a lot by them. But if NCQ trims are unstable, we
likely should err on the side of disabling.

The errors that Andriy was seeing typically are cable or controller issues.
In his case, it was a controller issue (apparently, and surprisingly,
paired with a specific type of drive). Linux has a specific workaround for
the pair, which is unusual. But there's two issues at play.

The first issue is that these drives should have NCQ Trim turned off,
according to Linux's commit log. This may be firmware revision based or may
just be some people are luckier with newer firmware (it's hard to say from
the mixed reports, but they read more like luck than version for the
sampling I did). So Andriy's change is good. The second issue, and one I
was confused on earlier, is that Linux goes a step further and disables NCQ
entirely for these old controllers for 860 and 870 drives. This isn't too
surprising. We have a workaround for PMP for these controllers already....
One can disable NCQ entirely by setting the tags to 0... Given the
specificity of the problem, and the age of the controllers, I'm inclined to
not implement the Linux workaround for this specific pair.

Warner

>
> > > On Monday, February 19th, 2024 at 12:09 PM, Andriy Gapon
> avg@FreeBSD.org wrote:
> > >
> > > > The branch main has been updated by avg:
> > > >
> > > > URL:
> https://cgit.FreeBSD.org/src/commit/?id=3Dc01af41c3c8fdd570764ff9b6bfbad6=
ac9ca1664
> > > >
> > > > commit c01af41c3c8fdd570764ff9b6bfbad6ac9ca1664
> > > > Author: Andriy Gapon avg@FreeBSD.org
> > > >
> > > > AuthorDate: 2024-02-19 10:08:12 +0000
> > > > Commit: Andriy Gapon avg@FreeBSD.org
> > > >
> > > > CommitDate: 2024-02-19 10:08:12 +0000
> > > >
> > > > ata_da: add quirk to disable NCQ TRIM for Samsung 860/870 SSDs
> > > >
> > > > NCQ TRIM for Samsung 860/870 SSDs results in data corruption on
> systems
> > > > with some SATA controllers.
> > > >
> > > > This can be easily reproduced using ZFS which uses TRIM and is able
> to
> > > > detect block content changes.
> > > >
> > > > Linux bug report for this issue:
> > > > https://bugzilla.kernel.org/show_bug.cgi?id=3D201693
> > > >
> > > > Since at present we can not limit a quirk based on the contorller /
> SIM,
> > > > apply the quirk in all cases.
> > > >
> > > > Reviewed by: imp
> > > > MFC after: 2 weeks
> > > > Differential Revision: https://reviews.freebsd.org/D43961
> > > > ---
> > > > sys/cam/ata/ata_da.c | 16 ++++++++++++++++
> > > > 1 file changed, 16 insertions(+)
> > > >
> > > > diff --git a/sys/cam/ata/ata_da.c b/sys/cam/ata/ata_da.c
> > > > index f5d3aeca9329..d4a591943307 100644
> > > > --- a/sys/cam/ata/ata_da.c
> > > > +++ b/sys/cam/ata/ata_da.c
> > > > @@ -727,6 +727,22 @@ static struct ada_quirk_entry ada_quirk_table[=
]
> =3D
> > > > { T_DIRECT, SIP_MEDIA_FIXED, "", "Samsung SSD 850", "" },
> > > > /quirks/ADA_Q_4K | ADA_Q_NCQ_TRIM_BROKEN
> > > > },
> > > > + {
> > > > + /
> > > > + * Samsung 860 SSDs
> > > > + * 4k optimised, NCQ TRIM broken (normal TRIM fine)
> > > > + /
> > > > + { T_DIRECT, SIP_MEDIA_FIXED, "", "Samsung SSD 860*", "" },
> > > > + /quirks/ADA_Q_4K | ADA_Q_NCQ_TRIM_BROKEN
> > > > + },
> > > > + {
> > > > + /
> > > > + * Samsung 870 SSDs
> > > > + * 4k optimised, NCQ TRIM broken (normal TRIM fine)
> > > > + /
> > > > + { T_DIRECT, SIP_MEDIA_FIXED, "", "Samsung SSD 870*", "" },
> > > > + /quirks/ADA_Q_4K | ADA_Q_NCQ_TRIM_BROKEN
> > > > + },
> > > > {
> > > > /
> > > > * Samsung SM863 Series SSDs (MZ7KM*)
> >
> >
> > --
> > Andriy Gapon
>

--00000000000067d1ca0611bd9cb8
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr"><div dir=3D"auto"><div><br><br><div class=3D"gmail_quote">=
<div dir=3D"ltr" class=3D"gmail_attr">On Mon, Feb 19, 2024, 4:29=E2=80=AFAM=
 cglogic &lt;<a href=3D"mailto:cglogic@protonmail.com" target=3D"_blank">cg=
logic@protonmail.com</a>&gt; wrote:<br></div><blockquote class=3D"gmail_quo=
te" style=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"=
>On Monday, February 19th, 2024 at 1:03 PM, Andriy Gapon &lt;avg@FreeBSD.or=
g&gt; wrote:<br>
<br>
&gt; On 19/02/2024 12:47, cglogic wrote:<br>
&gt; <br>
&gt; &gt; Hello Andriy,<br>
&gt; &gt; <br>
&gt; &gt; I use ZFS with autotrim enabled on Samsung 860 PRO connected to I=
ntel AHCI SATA controller for 9 years without any issue.<br>
&gt; <br>
&gt; <br>
&gt; I think that it was released in 2018?<br>
<br>
You are right, it&#39;s 9 years old computer, SSD was added later.<br>
<br>
&gt; <br>
&gt; &gt; Can I disable this quirk locally or have I revert the patch to co=
ntinue use NCQ TRIM with this SSD?<br>
&gt; <br>
&gt; <br>
&gt; I don&#39;t think that there is a way to disable the quirk, so you&#39=
;ll have to revert.<br>
&gt; Note that ATA TRIM is still enabled, it&#39;s only NCQ TRIM that got d=
isabled.<br>
<br>
>From my understanding ATA TRIM is non-queued TRIM as opposed to NCQ TRIM.<b=
r>
With only ATA TRIM enabled, drive will have increased latency when large am=
ount of files deleted.<br>
<br>
However I&#39;m not sure how ZFS handles TRIMs, maybe it groups several TRI=
Ms and then sends when drive is idle.<br>
Another question is how this affects TRIM enabled swap partition.<br>
<br>
In general, if I&#39;m correct here, disabling NCQ TRIM should not be an is=
sue for desktop usage, except for large git repository updates, etc.<br></b=
lockquote></div></div><div dir=3D"auto"><br></div>In general, it won&#39;t =
matter a ton. Unless you are seeing stalls because of read/write I/O waitin=
g behind trims that have to go to the drive one at a time, the effects are =
in the noise. If you are seeing that, though, the effects can be quite help=
ful in terms of latency... assuming the drive is cooperative. The benefits =
of it vary a lot from drive to drive as well. Some early drives still seria=
lize the trims, so the latency benefit is not so great. Some enterprise dri=
ves do trims instantly and keep some internal state that is helped a lot by=
 them. But if NCQ trims are unstable, we likely should err on the side of d=
isabling.<br></div><div dir=3D"auto"><br></div><div>The errors that Andriy =
was seeing typically are cable or controller issues. In his case, it was a =
controller issue (apparently, and surprisingly, paired with a specific type=
 of drive). Linux has a specific workaround for the pair, which is unusual.=
 But there&#39;s two issues at play.<br></div><div><br></div><div>The first=
 issue is that these drives should have NCQ Trim turned off, according to L=
inux&#39;s commit log. This may be firmware revision based or may just be s=
ome people are luckier with newer firmware (it&#39;s hard to say from the m=
ixed reports, but they read more like luck than version for the sampling I =
did). So Andriy&#39;s change is good. The second issue, and one I was confu=
sed on earlier, is that Linux goes a step further and disables NCQ entirely=
 for these old controllers for 860 and 870 drives. This isn&#39;t too surpr=
ising. We have a workaround for PMP for these controllers already.... One c=
an disable NCQ entirely by setting the tags to 0... Given the specificity o=
f the problem, and the age of the controllers, I&#39;m inclined to not impl=
ement the Linux workaround for this specific pair.<br></div><div><br></div>=
<div><div>Warner<br></div><div dir=3D"auto"><br></div><div dir=3D"auto"><di=
v dir=3D"auto"><div class=3D"gmail_quote"><blockquote class=3D"gmail_quote"=
 style=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
&gt; <br>
&gt; &gt; On Monday, February 19th, 2024 at 12:09 PM, Andriy Gapon avg@Free=
BSD.org wrote:<br>
&gt; &gt; <br>
&gt; &gt; &gt; The branch main has been updated by avg:<br>
&gt; &gt; &gt; <br>
&gt; &gt; &gt; URL: <a href=3D"https://cgit.FreeBSD.org/src/commit/?id=3Dc0=
1af41c3c8fdd570764ff9b6bfbad6ac9ca1664" rel=3D"noreferrer noreferrer" targe=
t=3D"_blank">https://cgit.FreeBSD.org/src/commit/?id=3Dc01af41c3c8fdd570764=
ff9b6bfbad6ac9ca1664</a><br>
&gt; &gt; &gt; <br>
&gt; &gt; &gt; commit c01af41c3c8fdd570764ff9b6bfbad6ac9ca1664<br>
&gt; &gt; &gt; Author: Andriy Gapon avg@FreeBSD.org<br>
&gt; &gt; &gt; <br>
&gt; &gt; &gt; AuthorDate: 2024-02-19 10:08:12 +0000<br>
&gt; &gt; &gt; Commit: Andriy Gapon avg@FreeBSD.org<br>
&gt; &gt; &gt; <br>
&gt; &gt; &gt; CommitDate: 2024-02-19 10:08:12 +0000<br>
&gt; &gt; &gt; <br>
&gt; &gt; &gt; ata_da: add quirk to disable NCQ TRIM for Samsung 860/870 SS=
Ds<br>
&gt; &gt; &gt; <br>
&gt; &gt; &gt; NCQ TRIM for Samsung 860/870 SSDs results in data corruption=
 on systems<br>
&gt; &gt; &gt; with some SATA controllers.<br>
&gt; &gt; &gt; <br>
&gt; &gt; &gt; This can be easily reproduced using ZFS which uses TRIM and =
is able to<br>
&gt; &gt; &gt; detect block content changes.<br>
&gt; &gt; &gt; <br>
&gt; &gt; &gt; Linux bug report for this issue:<br>
&gt; &gt; &gt; <a href=3D"https://bugzilla.kernel.org/show_bug.cgi?id=3D201=
693" rel=3D"noreferrer noreferrer" target=3D"_blank">https://bugzilla.kerne=
l.org/show_bug.cgi?id=3D201693</a><br>
&gt; &gt; &gt; <br>
&gt; &gt; &gt; Since at present we can not limit a quirk based on the conto=
rller / SIM,<br>
&gt; &gt; &gt; apply the quirk in all cases.<br>
&gt; &gt; &gt; <br>
&gt; &gt; &gt; Reviewed by: imp<br>
&gt; &gt; &gt; MFC after: 2 weeks<br>
&gt; &gt; &gt; Differential Revision: <a href=3D"https://reviews.freebsd.or=
g/D43961" rel=3D"noreferrer noreferrer" target=3D"_blank">https://reviews.f=
reebsd.org/D43961</a><br>
&gt; &gt; &gt; ---<br>
&gt; &gt; &gt; sys/cam/ata/ata_da.c | 16 ++++++++++++++++<br>
&gt; &gt; &gt; 1 file changed, 16 insertions(+)<br>
&gt; &gt; &gt; <br>
&gt; &gt; &gt; diff --git a/sys/cam/ata/ata_da.c b/sys/cam/ata/ata_da.c<br>
&gt; &gt; &gt; index f5d3aeca9329..d4a591943307 100644<br>
&gt; &gt; &gt; --- a/sys/cam/ata/ata_da.c<br>
&gt; &gt; &gt; +++ b/sys/cam/ata/ata_da.c<br>
&gt; &gt; &gt; @@ -727,6 +727,22 @@ static struct ada_quirk_entry ada_quirk=
_table[] =3D<br>
&gt; &gt; &gt; { T_DIRECT, SIP_MEDIA_FIXED, &quot;&quot;, &quot;Samsung SSD=
 850&quot;, &quot;&quot; },<br>
&gt; &gt; &gt; /quirks/ADA_Q_4K | ADA_Q_NCQ_TRIM_BROKEN<br>
&gt; &gt; &gt; },<br>
&gt; &gt; &gt; + {<br>
&gt; &gt; &gt; + /<br>
&gt; &gt; &gt; + * Samsung 860 SSDs<br>
&gt; &gt; &gt; + * 4k optimised, NCQ TRIM broken (normal TRIM fine)<br>
&gt; &gt; &gt; + /<br>
&gt; &gt; &gt; + { T_DIRECT, SIP_MEDIA_FIXED, &quot;&quot;, &quot;Samsung S=
SD 860*&quot;, &quot;&quot; },<br>
&gt; &gt; &gt; + /quirks/ADA_Q_4K | ADA_Q_NCQ_TRIM_BROKEN<br>
&gt; &gt; &gt; + },<br>
&gt; &gt; &gt; + {<br>
&gt; &gt; &gt; + /<br>
&gt; &gt; &gt; + * Samsung 870 SSDs<br>
&gt; &gt; &gt; + * 4k optimised, NCQ TRIM broken (normal TRIM fine)<br>
&gt; &gt; &gt; + /<br>
&gt; &gt; &gt; + { T_DIRECT, SIP_MEDIA_FIXED, &quot;&quot;, &quot;Samsung S=
SD 870*&quot;, &quot;&quot; },<br>
&gt; &gt; &gt; + /quirks/ADA_Q_4K | ADA_Q_NCQ_TRIM_BROKEN<br>
&gt; &gt; &gt; + },<br>
&gt; &gt; &gt; {<br>
&gt; &gt; &gt; /<br>
&gt; &gt; &gt; * Samsung SM863 Series SSDs (MZ7KM*)<br>
&gt; <br>
&gt; <br>
&gt; --<br>
&gt; Andriy Gapon<br>
</blockquote></div></div></div>
</div></div>

--00000000000067d1ca0611bd9cb8--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CANCZdfpWTsWW538kUC8bh6UWxy1W%2BJ8BcP0SSPF5zWx%2BPrHUAw>