Date: Sun, 17 Sep 2023 16:21:28 +0100 From: Warner Losh <imp@bsdimp.com> To: Warner Losh <imp@freebsd.org> Cc: src-committers@freebsd.org, dev-commits-src-all@freebsd.org, dev-commits-src-main@freebsd.org Subject: Re: git: d95431624f93 - main - nvme: Give up when we've failed Message-ID: <CANCZdfrAfFF3QEXDj6gkih0o2t1y%2Bt5Tc-V%2B3XzvO9ZrYFepAQ@mail.gmail.com> In-Reply-To: <202309171516.38HFGT2g089933@gitrepo.freebsd.org> References: <202309171516.38HFGT2g089933@gitrepo.freebsd.org>
index | next in thread | previous in thread | raw e-mail
[-- Attachment #1 --] On Sun, Sep 17, 2023 at 4:16 PM Warner Losh <imp@freebsd.org> wrote: > The branch main has been updated by imp: > > URL: > https://cgit.FreeBSD.org/src/commit/?id=d95431624f934fe4740211738fc787808005b14e > > commit d95431624f934fe4740211738fc787808005b14e > Author: Warner Losh <imp@FreeBSD.org> > AuthorDate: 2023-09-15 16:02:32 +0000 > Commit: Warner Losh <imp@FreeBSD.org> > CommitDate: 2023-09-17 15:11:56 +0000 > > nvme: Give up when we've failed > > Normally, we poll the device every so often to see if commands have > timed out. However, we'll go into the recovery state as part of failing > the drive. To account for all possibilties, if we're failed when we get > into the polling function, just stop polling: Party is over. > > Sponsored by: Netflix > Reviewed by: jtl@ Differential Revision: https://reviews.freebsd.org/D41877 > --- > sys/dev/nvme/nvme_qpair.c | 15 +++++++++++++-- > 1 file changed, 13 insertions(+), 2 deletions(-) > > diff --git a/sys/dev/nvme/nvme_qpair.c b/sys/dev/nvme/nvme_qpair.c > index b256c4713c8d..4e37aa0e1020 100644 > --- a/sys/dev/nvme/nvme_qpair.c > +++ b/sys/dev/nvme/nvme_qpair.c > @@ -1011,6 +1011,17 @@ nvme_qpair_timeout(void *arg) > > mtx_assert(&qpair->recovery, MA_OWNED); > > + /* > + * If the controller has failed, give up. We're never going to > change > + * state from a failed controller: no further transactions are > possible. > + * We go ahead and let the timeout expire in many cases for > simplicity. > + */ > + if (qpair->ctrlr->is_failed) { > + nvme_printf(ctrlr, "Controller failed, giving up\n"); > + qpair->timer_armed = false; > + return; > + } > + > switch (qpair->recovery_state) { > case RECOVERY_NONE: > /* > @@ -1094,8 +1105,8 @@ nvme_qpair_timeout(void *arg) > idle = false; /* We want to keep polling > */ > break; > case RECOVERY_WAITING: > - nvme_printf(ctrlr, "waiting for reset to complete\n"); > - idle = false; /* We want to keep polling > */ > + nvme_printf(ctrlr, "Waiting for reset to complete\n"); > + idle = false; /* We want to keep polling */ > break; > } > > [-- Attachment #2 --] <div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, Sep 17, 2023 at 4:16 PM Warner Losh <<a href="mailto:imp@freebsd.org">imp@freebsd.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">The branch main has been updated by imp:<br> <br> URL: <a href="https://cgit.FreeBSD.org/src/commit/?id=d95431624f934fe4740211738fc787808005b14e" rel="noreferrer" target="_blank">https://cgit.FreeBSD.org/src/commit/?id=d95431624f934fe4740211738fc787808005b14e</a><br> <br> commit d95431624f934fe4740211738fc787808005b14e<br> Author: Warner Losh <imp@FreeBSD.org><br> AuthorDate: 2023-09-15 16:02:32 +0000<br> Commit: Warner Losh <imp@FreeBSD.org><br> CommitDate: 2023-09-17 15:11:56 +0000<br> <br> nvme: Give up when we've failed<br> <br> Normally, we poll the device every so often to see if commands have<br> timed out. However, we'll go into the recovery state as part of failing<br> the drive. To account for all possibilties, if we're failed when we get<br> into the polling function, just stop polling: Party is over.<br> <br> Sponsored by: Netflix<br></blockquote><div><br></div><div>Reviewed by: jtl@</div><div>Differential Revision: <a href="https://reviews.freebsd.org/D41877">https://reviews.freebsd.org/D41877</a></div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> ---<br> sys/dev/nvme/nvme_qpair.c | 15 +++++++++++++--<br> 1 file changed, 13 insertions(+), 2 deletions(-)<br> <br> diff --git a/sys/dev/nvme/nvme_qpair.c b/sys/dev/nvme/nvme_qpair.c<br> index b256c4713c8d..4e37aa0e1020 100644<br> --- a/sys/dev/nvme/nvme_qpair.c<br> +++ b/sys/dev/nvme/nvme_qpair.c<br> @@ -1011,6 +1011,17 @@ nvme_qpair_timeout(void *arg)<br> <br> mtx_assert(&qpair->recovery, MA_OWNED);<br> <br> + /*<br> + * If the controller has failed, give up. We're never going to change<br> + * state from a failed controller: no further transactions are possible.<br> + * We go ahead and let the timeout expire in many cases for simplicity.<br> + */<br> + if (qpair->ctrlr->is_failed) {<br> + nvme_printf(ctrlr, "Controller failed, giving up\n");<br> + qpair->timer_armed = false;<br> + return;<br> + }<br> +<br> switch (qpair->recovery_state) {<br> case RECOVERY_NONE:<br> /*<br> @@ -1094,8 +1105,8 @@ nvme_qpair_timeout(void *arg)<br> idle = false; /* We want to keep polling */<br> break;<br> case RECOVERY_WAITING:<br> - nvme_printf(ctrlr, "waiting for reset to complete\n");<br> - idle = false; /* We want to keep polling */<br> + nvme_printf(ctrlr, "Waiting for reset to complete\n");<br> + idle = false; /* We want to keep polling */<br> break;<br> }<br> <br> </blockquote></div></div>help
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CANCZdfrAfFF3QEXDj6gkih0o2t1y%2Bt5Tc-V%2B3XzvO9ZrYFepAQ>
