Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 17 Sep 2023 16:21:28 +0100
From:      Warner Losh <imp@bsdimp.com>
To:        Warner Losh <imp@freebsd.org>
Cc:        src-committers@freebsd.org, dev-commits-src-all@freebsd.org,  dev-commits-src-main@freebsd.org
Subject:   Re: git: d95431624f93 - main - nvme: Give up when we've failed
Message-ID:  <CANCZdfrAfFF3QEXDj6gkih0o2t1y%2Bt5Tc-V%2B3XzvO9ZrYFepAQ@mail.gmail.com>
In-Reply-To: <202309171516.38HFGT2g089933@gitrepo.freebsd.org>
References:  <202309171516.38HFGT2g089933@gitrepo.freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
--0000000000001030d506058f954c
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Sun, Sep 17, 2023 at 4:16=E2=80=AFPM Warner Losh <imp@freebsd.org> wrote=
:

> The branch main has been updated by imp:
>
> URL:
> https://cgit.FreeBSD.org/src/commit/?id=3Dd95431624f934fe4740211738fc7878=
08005b14e
>
> commit d95431624f934fe4740211738fc787808005b14e
> Author:     Warner Losh <imp@FreeBSD.org>
> AuthorDate: 2023-09-15 16:02:32 +0000
> Commit:     Warner Losh <imp@FreeBSD.org>
> CommitDate: 2023-09-17 15:11:56 +0000
>
>     nvme: Give up when we've failed
>
>     Normally, we poll the device every so often to see if commands have
>     timed out. However, we'll go into the recovery state as part of faili=
ng
>     the drive. To account for all possibilties, if we're failed when we g=
et
>     into the polling function, just stop polling: Party is over.
>
>     Sponsored by:           Netflix
>

Reviewed by: jtl@
Differential Revision: https://reviews.freebsd.org/D41877


> ---
>  sys/dev/nvme/nvme_qpair.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/sys/dev/nvme/nvme_qpair.c b/sys/dev/nvme/nvme_qpair.c
> index b256c4713c8d..4e37aa0e1020 100644
> --- a/sys/dev/nvme/nvme_qpair.c
> +++ b/sys/dev/nvme/nvme_qpair.c
> @@ -1011,6 +1011,17 @@ nvme_qpair_timeout(void *arg)
>
>         mtx_assert(&qpair->recovery, MA_OWNED);
>
> +       /*
> +        * If the controller has failed, give up. We're never going to
> change
> +        * state from a failed controller: no further transactions are
> possible.
> +        * We go ahead and let the timeout expire in many cases for
> simplicity.
> +        */
> +       if (qpair->ctrlr->is_failed) {
> +               nvme_printf(ctrlr, "Controller failed, giving up\n");
> +               qpair->timer_armed =3D false;
> +               return;
> +       }
> +
>         switch (qpair->recovery_state) {
>         case RECOVERY_NONE:
>                 /*
> @@ -1094,8 +1105,8 @@ nvme_qpair_timeout(void *arg)
>                 idle =3D false;                   /* We want to keep poll=
ing
> */
>                 break;
>         case RECOVERY_WAITING:
> -               nvme_printf(ctrlr, "waiting for reset to complete\n");
> -               idle =3D false;                   /* We want to keep poll=
ing
> */
> +               nvme_printf(ctrlr, "Waiting for reset to complete\n");
> +               idle =3D false;           /* We want to keep polling */
>                 break;
>         }
>
>

--0000000000001030d506058f954c
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr"><div dir=3D"ltr"><br></div><br><div class=3D"gmail_quote">=
<div dir=3D"ltr" class=3D"gmail_attr">On Sun, Sep 17, 2023 at 4:16=E2=80=AF=
PM Warner Losh &lt;<a href=3D"mailto:imp@freebsd.org">imp@freebsd.org</a>&g=
t; wrote:<br></div><blockquote class=3D"gmail_quote" style=3D"margin:0px 0p=
x 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">The br=
anch main has been updated by imp:<br>
<br>
URL: <a href=3D"https://cgit.FreeBSD.org/src/commit/?id=3Dd95431624f934fe47=
40211738fc787808005b14e" rel=3D"noreferrer" target=3D"_blank">https://cgit.=
FreeBSD.org/src/commit/?id=3Dd95431624f934fe4740211738fc787808005b14e</a><b=
r>
<br>
commit d95431624f934fe4740211738fc787808005b14e<br>
Author:=C2=A0 =C2=A0 =C2=A0Warner Losh &lt;imp@FreeBSD.org&gt;<br>
AuthorDate: 2023-09-15 16:02:32 +0000<br>
Commit:=C2=A0 =C2=A0 =C2=A0Warner Losh &lt;imp@FreeBSD.org&gt;<br>
CommitDate: 2023-09-17 15:11:56 +0000<br>
<br>
=C2=A0 =C2=A0 nvme: Give up when we&#39;ve failed<br>
<br>
=C2=A0 =C2=A0 Normally, we poll the device every so often to see if command=
s have<br>
=C2=A0 =C2=A0 timed out. However, we&#39;ll go into the recovery state as p=
art of failing<br>
=C2=A0 =C2=A0 the drive. To account for all possibilties, if we&#39;re fail=
ed when we get<br>
=C2=A0 =C2=A0 into the polling function, just stop polling: Party is over.<=
br>
<br>
=C2=A0 =C2=A0 Sponsored by:=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Netflix=
<br></blockquote><div><br></div><div>Reviewed by: jtl@</div><div>Differenti=
al Revision:=C2=A0<a href=3D"https://reviews.freebsd.org/D41877">https://re=
views.freebsd.org/D41877</a></div><div>=C2=A0<br></div><blockquote class=3D=
"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(2=
04,204,204);padding-left:1ex">
---<br>
=C2=A0sys/dev/nvme/nvme_qpair.c | 15 +++++++++++++--<br>
=C2=A01 file changed, 13 insertions(+), 2 deletions(-)<br>
<br>
diff --git a/sys/dev/nvme/nvme_qpair.c b/sys/dev/nvme/nvme_qpair.c<br>
index b256c4713c8d..4e37aa0e1020 100644<br>
--- a/sys/dev/nvme/nvme_qpair.c<br>
+++ b/sys/dev/nvme/nvme_qpair.c<br>
@@ -1011,6 +1011,17 @@ nvme_qpair_timeout(void *arg)<br>
<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 mtx_assert(&amp;qpair-&gt;recovery, MA_OWNED);<=
br>
<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0/*<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 * If the controller has failed, give up. We&#3=
9;re never going to change<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 * state from a failed controller: no further t=
ransactions are possible.<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 * We go ahead and let the timeout expire in ma=
ny cases for simplicity.<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 */<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0if (qpair-&gt;ctrlr-&gt;is_failed) {<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0nvme_printf(ctrlr, =
&quot;Controller failed, giving up\n&quot;);<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0qpair-&gt;timer_arm=
ed =3D false;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0return;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0}<br>
+<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 switch (qpair-&gt;recovery_state) {<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 case RECOVERY_NONE:<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 /*<br>
@@ -1094,8 +1105,8 @@ nvme_qpair_timeout(void *arg)<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 idle =3D false;=C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0/* We wan=
t to keep polling */<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 break;<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 case RECOVERY_WAITING:<br>
-=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0nvme_printf(ctrlr, =
&quot;waiting for reset to complete\n&quot;);<br>
-=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0idle =3D false;=C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0/* We wan=
t to keep polling */<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0nvme_printf(ctrlr, =
&quot;Waiting for reset to complete\n&quot;);<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0idle =3D false;=C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0/* We want to keep polling */<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 break;<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 }<br>
<br>
</blockquote></div></div>

--0000000000001030d506058f954c--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CANCZdfrAfFF3QEXDj6gkih0o2t1y%2Bt5Tc-V%2B3XzvO9ZrYFepAQ>