Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 17 Sep 2023 16:21:28 +0100
From:      Warner Losh <imp@bsdimp.com>
To:        Warner Losh <imp@freebsd.org>
Cc:        src-committers@freebsd.org, dev-commits-src-all@freebsd.org,  dev-commits-src-main@freebsd.org
Subject:   Re: git: d95431624f93 - main - nvme: Give up when we've failed
Message-ID:  <CANCZdfrAfFF3QEXDj6gkih0o2t1y%2Bt5Tc-V%2B3XzvO9ZrYFepAQ@mail.gmail.com>
In-Reply-To: <202309171516.38HFGT2g089933@gitrepo.freebsd.org>
References:  <202309171516.38HFGT2g089933@gitrepo.freebsd.org>

index | next in thread | previous in thread | raw e-mail

[-- Attachment #1 --]
On Sun, Sep 17, 2023 at 4:16 PM Warner Losh <imp@freebsd.org> wrote:

> The branch main has been updated by imp:
>
> URL:
> https://cgit.FreeBSD.org/src/commit/?id=d95431624f934fe4740211738fc787808005b14e
>
> commit d95431624f934fe4740211738fc787808005b14e
> Author:     Warner Losh <imp@FreeBSD.org>
> AuthorDate: 2023-09-15 16:02:32 +0000
> Commit:     Warner Losh <imp@FreeBSD.org>
> CommitDate: 2023-09-17 15:11:56 +0000
>
>     nvme: Give up when we've failed
>
>     Normally, we poll the device every so often to see if commands have
>     timed out. However, we'll go into the recovery state as part of failing
>     the drive. To account for all possibilties, if we're failed when we get
>     into the polling function, just stop polling: Party is over.
>
>     Sponsored by:           Netflix
>

Reviewed by: jtl@
Differential Revision: https://reviews.freebsd.org/D41877


> ---
>  sys/dev/nvme/nvme_qpair.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/sys/dev/nvme/nvme_qpair.c b/sys/dev/nvme/nvme_qpair.c
> index b256c4713c8d..4e37aa0e1020 100644
> --- a/sys/dev/nvme/nvme_qpair.c
> +++ b/sys/dev/nvme/nvme_qpair.c
> @@ -1011,6 +1011,17 @@ nvme_qpair_timeout(void *arg)
>
>         mtx_assert(&qpair->recovery, MA_OWNED);
>
> +       /*
> +        * If the controller has failed, give up. We're never going to
> change
> +        * state from a failed controller: no further transactions are
> possible.
> +        * We go ahead and let the timeout expire in many cases for
> simplicity.
> +        */
> +       if (qpair->ctrlr->is_failed) {
> +               nvme_printf(ctrlr, "Controller failed, giving up\n");
> +               qpair->timer_armed = false;
> +               return;
> +       }
> +
>         switch (qpair->recovery_state) {
>         case RECOVERY_NONE:
>                 /*
> @@ -1094,8 +1105,8 @@ nvme_qpair_timeout(void *arg)
>                 idle = false;                   /* We want to keep polling
> */
>                 break;
>         case RECOVERY_WAITING:
> -               nvme_printf(ctrlr, "waiting for reset to complete\n");
> -               idle = false;                   /* We want to keep polling
> */
> +               nvme_printf(ctrlr, "Waiting for reset to complete\n");
> +               idle = false;           /* We want to keep polling */
>                 break;
>         }
>
>

[-- Attachment #2 --]
<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, Sep 17, 2023 at 4:16 PM Warner Losh &lt;<a href="mailto:imp@freebsd.org">imp@freebsd.org</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">The branch main has been updated by imp:<br>
<br>
URL: <a href="https://cgit.FreeBSD.org/src/commit/?id=d95431624f934fe4740211738fc787808005b14e" rel="noreferrer" target="_blank">https://cgit.FreeBSD.org/src/commit/?id=d95431624f934fe4740211738fc787808005b14e</a><br>;
<br>
commit d95431624f934fe4740211738fc787808005b14e<br>
Author:     Warner Losh &lt;imp@FreeBSD.org&gt;<br>
AuthorDate: 2023-09-15 16:02:32 +0000<br>
Commit:     Warner Losh &lt;imp@FreeBSD.org&gt;<br>
CommitDate: 2023-09-17 15:11:56 +0000<br>
<br>
    nvme: Give up when we&#39;ve failed<br>
<br>
    Normally, we poll the device every so often to see if commands have<br>
    timed out. However, we&#39;ll go into the recovery state as part of failing<br>
    the drive. To account for all possibilties, if we&#39;re failed when we get<br>
    into the polling function, just stop polling: Party is over.<br>
<br>
    Sponsored by:           Netflix<br></blockquote><div><br></div><div>Reviewed by: jtl@</div><div>Differential Revision: <a href="https://reviews.freebsd.org/D41877">https://reviews.freebsd.org/D41877</a></div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
---<br>
 sys/dev/nvme/nvme_qpair.c | 15 +++++++++++++--<br>
 1 file changed, 13 insertions(+), 2 deletions(-)<br>
<br>
diff --git a/sys/dev/nvme/nvme_qpair.c b/sys/dev/nvme/nvme_qpair.c<br>
index b256c4713c8d..4e37aa0e1020 100644<br>
--- a/sys/dev/nvme/nvme_qpair.c<br>
+++ b/sys/dev/nvme/nvme_qpair.c<br>
@@ -1011,6 +1011,17 @@ nvme_qpair_timeout(void *arg)<br>
<br>
        mtx_assert(&amp;qpair-&gt;recovery, MA_OWNED);<br>
<br>
+       /*<br>
+        * If the controller has failed, give up. We&#39;re never going to change<br>
+        * state from a failed controller: no further transactions are possible.<br>
+        * We go ahead and let the timeout expire in many cases for simplicity.<br>
+        */<br>
+       if (qpair-&gt;ctrlr-&gt;is_failed) {<br>
+               nvme_printf(ctrlr, &quot;Controller failed, giving up\n&quot;);<br>
+               qpair-&gt;timer_armed = false;<br>
+               return;<br>
+       }<br>
+<br>
        switch (qpair-&gt;recovery_state) {<br>
        case RECOVERY_NONE:<br>
                /*<br>
@@ -1094,8 +1105,8 @@ nvme_qpair_timeout(void *arg)<br>
                idle = false;                   /* We want to keep polling */<br>
                break;<br>
        case RECOVERY_WAITING:<br>
-               nvme_printf(ctrlr, &quot;waiting for reset to complete\n&quot;);<br>
-               idle = false;                   /* We want to keep polling */<br>
+               nvme_printf(ctrlr, &quot;Waiting for reset to complete\n&quot;);<br>
+               idle = false;           /* We want to keep polling */<br>
                break;<br>
        }<br>
<br>
</blockquote></div></div>
help

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CANCZdfrAfFF3QEXDj6gkih0o2t1y%2Bt5Tc-V%2B3XzvO9ZrYFepAQ>