Date: Sun, 17 Sep 2023 16:21:28 +0100 From: Warner Losh <imp@bsdimp.com> To: Warner Losh <imp@freebsd.org> Cc: src-committers@freebsd.org, dev-commits-src-all@freebsd.org, dev-commits-src-main@freebsd.org Subject: Re: git: d95431624f93 - main - nvme: Give up when we've failed Message-ID: <CANCZdfrAfFF3QEXDj6gkih0o2t1y%2Bt5Tc-V%2B3XzvO9ZrYFepAQ@mail.gmail.com> In-Reply-To: <202309171516.38HFGT2g089933@gitrepo.freebsd.org> References: <202309171516.38HFGT2g089933@gitrepo.freebsd.org>
next in thread | previous in thread | raw e-mail | index | archive | help
--0000000000001030d506058f954c Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Sun, Sep 17, 2023 at 4:16=E2=80=AFPM Warner Losh <imp@freebsd.org> wrote= : > The branch main has been updated by imp: > > URL: > https://cgit.FreeBSD.org/src/commit/?id=3Dd95431624f934fe4740211738fc7878= 08005b14e > > commit d95431624f934fe4740211738fc787808005b14e > Author: Warner Losh <imp@FreeBSD.org> > AuthorDate: 2023-09-15 16:02:32 +0000 > Commit: Warner Losh <imp@FreeBSD.org> > CommitDate: 2023-09-17 15:11:56 +0000 > > nvme: Give up when we've failed > > Normally, we poll the device every so often to see if commands have > timed out. However, we'll go into the recovery state as part of faili= ng > the drive. To account for all possibilties, if we're failed when we g= et > into the polling function, just stop polling: Party is over. > > Sponsored by: Netflix > Reviewed by: jtl@ Differential Revision: https://reviews.freebsd.org/D41877 > --- > sys/dev/nvme/nvme_qpair.c | 15 +++++++++++++-- > 1 file changed, 13 insertions(+), 2 deletions(-) > > diff --git a/sys/dev/nvme/nvme_qpair.c b/sys/dev/nvme/nvme_qpair.c > index b256c4713c8d..4e37aa0e1020 100644 > --- a/sys/dev/nvme/nvme_qpair.c > +++ b/sys/dev/nvme/nvme_qpair.c > @@ -1011,6 +1011,17 @@ nvme_qpair_timeout(void *arg) > > mtx_assert(&qpair->recovery, MA_OWNED); > > + /* > + * If the controller has failed, give up. We're never going to > change > + * state from a failed controller: no further transactions are > possible. > + * We go ahead and let the timeout expire in many cases for > simplicity. > + */ > + if (qpair->ctrlr->is_failed) { > + nvme_printf(ctrlr, "Controller failed, giving up\n"); > + qpair->timer_armed =3D false; > + return; > + } > + > switch (qpair->recovery_state) { > case RECOVERY_NONE: > /* > @@ -1094,8 +1105,8 @@ nvme_qpair_timeout(void *arg) > idle =3D false; /* We want to keep poll= ing > */ > break; > case RECOVERY_WAITING: > - nvme_printf(ctrlr, "waiting for reset to complete\n"); > - idle =3D false; /* We want to keep poll= ing > */ > + nvme_printf(ctrlr, "Waiting for reset to complete\n"); > + idle =3D false; /* We want to keep polling */ > break; > } > > --0000000000001030d506058f954c Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable <div dir=3D"ltr"><div dir=3D"ltr"><br></div><br><div class=3D"gmail_quote">= <div dir=3D"ltr" class=3D"gmail_attr">On Sun, Sep 17, 2023 at 4:16=E2=80=AF= PM Warner Losh <<a href=3D"mailto:imp@freebsd.org">imp@freebsd.org</a>&g= t; wrote:<br></div><blockquote class=3D"gmail_quote" style=3D"margin:0px 0p= x 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">The br= anch main has been updated by imp:<br> <br> URL: <a href=3D"https://cgit.FreeBSD.org/src/commit/?id=3Dd95431624f934fe47= 40211738fc787808005b14e" rel=3D"noreferrer" target=3D"_blank">https://cgit.= FreeBSD.org/src/commit/?id=3Dd95431624f934fe4740211738fc787808005b14e</a><b= r> <br> commit d95431624f934fe4740211738fc787808005b14e<br> Author:=C2=A0 =C2=A0 =C2=A0Warner Losh <imp@FreeBSD.org><br> AuthorDate: 2023-09-15 16:02:32 +0000<br> Commit:=C2=A0 =C2=A0 =C2=A0Warner Losh <imp@FreeBSD.org><br> CommitDate: 2023-09-17 15:11:56 +0000<br> <br> =C2=A0 =C2=A0 nvme: Give up when we've failed<br> <br> =C2=A0 =C2=A0 Normally, we poll the device every so often to see if command= s have<br> =C2=A0 =C2=A0 timed out. However, we'll go into the recovery state as p= art of failing<br> =C2=A0 =C2=A0 the drive. To account for all possibilties, if we're fail= ed when we get<br> =C2=A0 =C2=A0 into the polling function, just stop polling: Party is over.<= br> <br> =C2=A0 =C2=A0 Sponsored by:=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Netflix= <br></blockquote><div><br></div><div>Reviewed by: jtl@</div><div>Differenti= al Revision:=C2=A0<a href=3D"https://reviews.freebsd.org/D41877">https://re= views.freebsd.org/D41877</a></div><div>=C2=A0<br></div><blockquote class=3D= "gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(2= 04,204,204);padding-left:1ex"> ---<br> =C2=A0sys/dev/nvme/nvme_qpair.c | 15 +++++++++++++--<br> =C2=A01 file changed, 13 insertions(+), 2 deletions(-)<br> <br> diff --git a/sys/dev/nvme/nvme_qpair.c b/sys/dev/nvme/nvme_qpair.c<br> index b256c4713c8d..4e37aa0e1020 100644<br> --- a/sys/dev/nvme/nvme_qpair.c<br> +++ b/sys/dev/nvme/nvme_qpair.c<br> @@ -1011,6 +1011,17 @@ nvme_qpair_timeout(void *arg)<br> <br> =C2=A0 =C2=A0 =C2=A0 =C2=A0 mtx_assert(&qpair->recovery, MA_OWNED);<= br> <br> +=C2=A0 =C2=A0 =C2=A0 =C2=A0/*<br> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 * If the controller has failed, give up. We= 9;re never going to change<br> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 * state from a failed controller: no further t= ransactions are possible.<br> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 * We go ahead and let the timeout expire in ma= ny cases for simplicity.<br> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 */<br> +=C2=A0 =C2=A0 =C2=A0 =C2=A0if (qpair->ctrlr->is_failed) {<br> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0nvme_printf(ctrlr, = "Controller failed, giving up\n");<br> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0qpair->timer_arm= ed =3D false;<br> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0return;<br> +=C2=A0 =C2=A0 =C2=A0 =C2=A0}<br> +<br> =C2=A0 =C2=A0 =C2=A0 =C2=A0 switch (qpair->recovery_state) {<br> =C2=A0 =C2=A0 =C2=A0 =C2=A0 case RECOVERY_NONE:<br> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 /*<br> @@ -1094,8 +1105,8 @@ nvme_qpair_timeout(void *arg)<br> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 idle =3D false;=C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0/* We wan= t to keep polling */<br> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 break;<br> =C2=A0 =C2=A0 =C2=A0 =C2=A0 case RECOVERY_WAITING:<br> -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0nvme_printf(ctrlr, = "waiting for reset to complete\n");<br> -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0idle =3D false;=C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0/* We wan= t to keep polling */<br> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0nvme_printf(ctrlr, = "Waiting for reset to complete\n");<br> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0idle =3D false;=C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0/* We want to keep polling */<br> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 break;<br> =C2=A0 =C2=A0 =C2=A0 =C2=A0 }<br> <br> </blockquote></div></div> --0000000000001030d506058f954c--
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CANCZdfrAfFF3QEXDj6gkih0o2t1y%2Bt5Tc-V%2B3XzvO9ZrYFepAQ>