Date: Tue, 14 Jan 2025 21:37:08 -0700 From: Warner Losh <imp@bsdimp.com> To: Zhenlei Huang <zlei@freebsd.org> Cc: Warner Losh <imp@freebsd.org>, FreeBSD Stable ML <stable@freebsd.org>, Edward Tomasz Napierala <trasz@freebsd.org> Subject: Re: MFC fixes for uninitialized kernel stack variables in sys/cam or do direct fix for pvscsi driver Message-ID: <CANCZdfqCy=TdPRGrTKj=7DJz9nSyPFZgBnsPuf4FqgurN4oYiw@mail.gmail.com> In-Reply-To: <C12BB5CB-ED16-4856-ADE1-6D788FB9799E@FreeBSD.org> References: <0DDE1B66-B794-472D-A901-54FA2FF1E853@FreeBSD.org> <CANCZdfpKGMFyPAVqxPcahZXgSL=tFBin3ratjoQTtJrDeM2NUg@mail.gmail.com> <3CB8230B-7938-4503-AADD-7F691482908C@FreeBSD.org> <E39D3960-42CE-4290-AF31-EA385D8512AF@FreeBSD.org> <C12BB5CB-ED16-4856-ADE1-6D788FB9799E@FreeBSD.org>
next in thread | previous in thread | raw e-mail | index | archive | help
--000000000000a28822062bb73b88 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Great. This looks good. Warner On Tue, Jan 14, 2025, 9:36=E2=80=AFPM Zhenlei Huang <zlei@freebsd.org> wrot= e: > > > On Jan 13, 2025, at 5:06 PM, Zhenlei Huang <zlei@FreeBSD.org> wrote: > > > > On Jan 13, 2025, at 4:22 PM, Zhenlei Huang <zlei@FreeBSD.org> wrote: > > > > On Dec 28, 2024, at 6:03 AM, Warner Losh <imp@bsdimp.com> wrote: > > > > On Mon, Dec 2, 2024 at 2:41=E2=80=AFAM Zhenlei Huang <zlei@freebsd.org> w= rote: > >> Hi Warner, >> >> Recently I upgraded some ESXi vms from 13.3 to 13.4 and noticed weird >> report for sas speed. >> The boot console has the following, >> >> ``` >> da0 at pvscsi0 bus 0 scbus2 target 0 lun 0 >> da0: <VMware Virtual disk 2.0> Fixed Direct Access SPC-4 SCSI device >> da0: 4294967.295MB/s transfers >> ``` >> But camcontrol report the correct value, >> ``` >> # camcontrol inquiry da0 -R >> pass1: 750.000MB/s transfers, Command Queueing Enabled >> ``` >> >> The `4294967.295MB` is actually 0xffff_ffff or -1 but I do not see any >> logic set those values. >> >> Finally I managed to get the stack trace, >> ``` >> _scsi_announce_periph >> scsi_announce_periph_sbuf >> xpt_announce_periph_sbuf >> dadone_proberc >> xpt_done_process >> xpt_done_td >> fork_exit >> fork_trampoline >> ``` >> >> and noticed that the last param `cts` of `_scsi_announce_periph(struct >> cam_periph *periph, u_int *speed, u_int *freq, struct ccb_trans_settings >> *cts)` >> is from kernel stack and is not properly initialized, latter I found som= e >> commits related to this, >> >> 076686fe0703 cam: make sure to clear CCBs allocated on the stack >> ec5325dbca62 cam: make sure to clear even more CCBs allocated on the sta= ck >> 0f206cc91279 cam: add missing zeroing of a stack-allocated CCB. >> 616a676a0535 cam: clear stack-allocated CCB in the target layer >> >> I applied them to stable/13, rebuild and reboot, now the speed of da0 is >> reported correctly. I also tried to patch the pvscsi driver with few lin= es >> and >> it also works as intended. >> >> ``` >> --- a/sys/dev/vmware/pvscsi/pvscsi.c >> +++ b/sys/dev/vmware/pvscsi/pvscsi.c >> @@ -1444,6 +1444,10 @@ pvscsi_action(struct cam_sim *sim, union ccb *ccb= ) >> cts->proto_specific.scsi.flags =3D CTS_SCSI_FLAGS_TAG_EN= B; >> cts->proto_specific.scsi.valid =3D CTS_SCSI_VALID_TQ; >> >> + /* Prefer connection speed over sas port speed */ >> + /* cts->xport_specific.sas.bitrate =3D 0; */ >> + cts->xport_specific.sas.valid =3D 0; >> + >> ccb_h->status =3D CAM_REQ_CMP; >> xpt_done(ccb); >> ``` >> >> Things come clear and I know why this weird speed happens, now it is tim= e >> to decide how to fix it. >> >> Fixing the consumer of cam, aka pvscsi driver, is quite simple and >> promising. I did a quick search it appears other consumers set >> `cts->xport_specific.sas.valid` correctly. It does not convince me as I'= m >> quite new to cam subsystem. >> > > Yes. sas.valid is set when the sas.bitrate and other data has been set > correctly. Setting it to 0 like this ensures it's ignored. So if you kno= w > the speed, set sas.bitrate to that speed and sas.valid to 1. > > > That is clear. > > > I'm not sure I answered the question right, but if not, please clarify or > point out what I missed and I'll try again. > > >> Which one do you prefer, MFC commits to stable/13, or do direct fix for >> pvscsi driver to stable/13 ? >> > > [[ Sorry for the delay, I missed this all month ]] > > I generally prefer a MFC, unless the code is no longer in -current. > > > The code live in -current and all supported stable branches. > > Even if there's two different fixes for this logical bug, fixing it in > current, then MFCing that (with the current hash) is fine, even if the > stable/13 changes are completely different. > > > The bug does not exist in current and stable/14 but only in stable/13, du= e > to Edward's work ( commits those zero stack-allocated CCBs ) were not MFC= ed > into stable/13 branch. > > For stable/13 I guess it matters a bit less than stable/14 since I'll be > merging to it less, but if it's a commit from -current that doesn't need = to > be made to -stable because of the new commit on stable, I tend to include > the MFC hash text. > > > Do you mean the `cherry picked from commit` commit message ? > > > Warner > > > I'm preparing and testing the MFCing. Bless me to not make things messed > up :) > > > And the individual fix for pvscsi is posted to > https://reviews.freebsd.org/D48438 . > > > Landed in -current. Will be MFCed to stable/13 after a few days. > > > > Best regards, > Zhenlei > > > > > --000000000000a28822062bb73b88 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable <div dir=3D"auto">Great. This looks good.<div dir=3D"auto"><br></div><div d= ir=3D"auto">Warner</div></div><br><div class=3D"gmail_quote gmail_quote_con= tainer"><div dir=3D"ltr" class=3D"gmail_attr">On Tue, Jan 14, 2025, 9:36=E2= =80=AFPM Zhenlei Huang <<a href=3D"mailto:zlei@freebsd.org">zlei@freebsd= .org</a>> wrote:<br></div><blockquote class=3D"gmail_quote" style=3D"mar= gin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style=3D"w= ord-wrap:break-word;line-break:after-white-space"><br><div><br><blockquote = type=3D"cite"><div>On Jan 13, 2025, at 5:06 PM, Zhenlei Huang <<a href= =3D"mailto:zlei@FreeBSD.org" target=3D"_blank" rel=3D"noreferrer">zlei@Free= BSD.org</a>> wrote:</div><br><div><br><br style=3D"font-family:Helvetica= ;font-size:13px;font-style:normal;font-variant-caps:normal;font-weight:400;= letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;= white-space:normal;word-spacing:0px;text-decoration:none"><blockquote type= =3D"cite" style=3D"font-family:Helvetica;font-size:13px;font-style:normal;f= ont-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:st= art;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px= ;text-decoration:none"><div>On Jan 13, 2025, at 4:22 PM, Zhenlei Huang <= <a href=3D"mailto:zlei@FreeBSD.org" target=3D"_blank" rel=3D"noreferrer">zl= ei@FreeBSD.org</a>> wrote:</div><br><div><div style=3D"font-family:Helve= tica;font-size:13px;font-style:normal;font-variant-caps:normal;font-weight:= 400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:n= one;white-space:normal;word-spacing:0px;text-decoration:none"><br><br><bloc= kquote type=3D"cite"><div>On Dec 28, 2024, at 6:03 AM, Warner Losh <<a h= ref=3D"mailto:imp@bsdimp.com" target=3D"_blank" rel=3D"noreferrer">imp@bsdi= mp.com</a>> wrote:</div><br><div><br><br style=3D"font-family:Helvetica;= font-size:13px;font-style:normal;font-variant-caps:normal;font-weight:400;l= etter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;w= hite-space:normal;word-spacing:0px;text-decoration:none"><div class=3D"gmai= l_quote" style=3D"font-family:Helvetica;font-size:13px;font-style:normal;fo= nt-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:sta= rt;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;= text-decoration:none"><div dir=3D"ltr" class=3D"gmail_attr">On Mon, Dec 2, = 2024 at 2:41=E2=80=AFAM Zhenlei Huang <<a href=3D"mailto:zlei@freebsd.or= g" target=3D"_blank" rel=3D"noreferrer">zlei@freebsd.org</a>> wrote:<br>= </div><blockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;b= order-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,= 204);padding-left:1ex">Hi Warner,<br><br>Recently I upgraded some ESXi vms = from 13.3 to 13.4 and noticed weird report for sas speed.<br>The boot conso= le has the following,<br><br>```<br>da0 at pvscsi0 bus 0 scbus2 target 0 lu= n 0<br>da0: <VMware Virtual disk 2.0> Fixed Direct Access SPC-4 SCSI = device<br>da0: 4294967.295MB/s transfers<br>```<br>But camcontrol report th= e correct value,<br>```<br># camcontrol inquiry da0 -R<br>pass1: 750.000MB/= s transfers, Command Queueing Enabled<br>```<br><br>The `4294967.295MB` is = actually 0xffff_ffff or -1 but I do not see any logic set those values.<br>= <br>Finally I managed to get the stack trace,<br>```<br>_scsi_announce_peri= ph<br>scsi_announce_periph_sbuf<br>xpt_announce_periph_sbuf<br>dadone_probe= rc<br>xpt_done_process<br>xpt_done_td<br>fork_exit<br>fork_trampoline<br>``= `<br><br>and noticed that the last param `cts` of `_scsi_announce_periph(st= ruct cam_periph *periph, u_int *speed, u_int *freq, struct ccb_trans_settin= gs *cts)`<br>is from kernel stack and is not properly initialized, latter I= found some commits related to this,<br><br>076686fe0703 cam: make sure to = clear CCBs allocated on the stack<br>ec5325dbca62 cam: make sure to clear e= ven more CCBs allocated on the stack<br>0f206cc91279 cam: add missing zeroi= ng of a stack-allocated CCB.<br>616a676a0535 cam: clear stack-allocated CCB= in the target layer<br><br>I applied them to stable/13, rebuild and reboot= , now the speed of da0 is reported correctly. I also tried to patch the pvs= csi driver with few lines and<br>it also works as intended.<br><br>```<br>-= -- a/sys/dev/vmware/pvscsi/pvscsi.c<br>+++ b/sys/dev/vmware/pvscsi/pvscsi.c= <br>@@ -1444,6 +1444,10 @@ pvscsi_action(struct cam_sim *sim, union ccb *cc= b)<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0<span>=C2=A0</= span>cts->proto_specific.scsi.flags =3D CTS_SCSI_FLAGS_TAG_ENB;<br>=C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0<span>=C2=A0</span>cts-= >proto_specific.scsi.valid =3D CTS_SCSI_VALID_TQ;<br><br>+=C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0/* Prefer connection speed over sa= s port speed */<br>+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= /* cts->xport_specific.sas.bitrate =3D 0; */<br>+=C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0cts->xport_specific.sas.valid =3D 0;<b= r>+<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0<span>=C2=A0<= /span>ccb_h->status =3D CAM_REQ_CMP;<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0<span>=C2=A0</span>xpt_done(ccb);<br>```<br><br>Thi= ngs come clear and I know why this weird speed happens, now it is time to d= ecide how to fix it.<br><br>Fixing the consumer of cam, aka pvscsi driver, = is quite simple and promising. I did a quick search it appears other consum= ers set `cts->xport_specific.sas.valid` correctly. It does not convince = me as I'm quite new to cam subsystem.<br></blockquote><div><br></div><d= iv>Yes. sas.valid is set when the sas.bitrate and other data has been set c= orrectly. Setting it to 0 like this ensures it's ignored.=C2=A0 So if y= ou know the speed, set sas.bitrate to that speed and sas.valid to 1.</div><= /div></div></blockquote><div><br></div><div>That is clear.</div><br><blockq= uote type=3D"cite"><div><div class=3D"gmail_quote" style=3D"font-family:Hel= vetica;font-size:13px;font-style:normal;font-variant-caps:normal;font-weigh= t:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform= :none;white-space:normal;word-spacing:0px;text-decoration:none"><div><br></= div><div>I'm not sure I answered the question right, but if not, please= clarify or point out what I missed and I'll try again.</div><div>=C2= =A0</div><blockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8e= x;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,2= 04,204);padding-left:1ex">Which one do you prefer, MFC commits to stable/13= , or do direct fix for pvscsi driver to stable/13 ?<br></blockquote><div><b= r></div><div>[[ Sorry for the delay, I missed this all month ]]</div><div><= br></div><div>I generally prefer a MFC, unless the code is no longer in -cu= rrent.<span>=C2=A0</span></div></div></div></blockquote><div><br></div><div= >The code live in -current and all supported stable branches.</div><br><blo= ckquote type=3D"cite"><div><div class=3D"gmail_quote" style=3D"font-family:= Helvetica;font-size:13px;font-style:normal;font-variant-caps:normal;font-we= ight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transf= orm:none;white-space:normal;word-spacing:0px;text-decoration:none"><div>Eve= n if there's two different fixes for this logical bug, fixing it in cur= rent, then MFCing that (with the current hash) is fine, even if the stable/= 13 changes are completely different.<span>=C2=A0</span></div></div></div></= blockquote><div><br></div><div>The bug does not exist in current and stable= /14 but only in stable/13, due to Edward's work ( commits those zero st= ack-allocated CCBs ) were not MFCed into stable/13 branch.</div><div><br></= div><blockquote type=3D"cite"><div><div class=3D"gmail_quote" style=3D"font= -family:Helvetica;font-size:13px;font-style:normal;font-variant-caps:normal= ;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;tex= t-transform:none;white-space:normal;word-spacing:0px;text-decoration:none">= <div>For stable/13 I guess it matters a bit less than stable/14 since I'= ;ll be merging to it less, but if it's a commit from -current that does= n't need to be made to -stable because of the new commit on stable, I t= end to include the MFC hash text.</div></div></div></blockquote><div><br></= div><div>Do you mean the `cherry picked from commit` commit message ?</div>= <br><blockquote type=3D"cite"><div><div class=3D"gmail_quote" style=3D"font= -family:Helvetica;font-size:13px;font-style:normal;font-variant-caps:normal= ;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;tex= t-transform:none;white-space:normal;word-spacing:0px;text-decoration:none">= <div><br></div><div>Warner</div></div></div></blockquote></div><div style= =3D"font-family:Helvetica;font-size:13px;font-style:normal;font-variant-cap= s:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent= :0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoratio= n:none"><br></div><div style=3D"font-family:Helvetica;font-size:13px;font-s= tyle:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;= text-align:start;text-indent:0px;text-transform:none;white-space:normal;wor= d-spacing:0px;text-decoration:none">I'm preparing and testing the MFCin= g. Bless me to not make things messed up :)</div></div></blockquote><div st= yle=3D"font-family:Helvetica;font-size:13px;font-style:normal;font-variant-= caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-ind= ent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decora= tion:none"><br></div><div style=3D"font-family:Helvetica;font-size:13px;fon= t-style:normal;font-variant-caps:normal;font-weight:400;letter-spacing:norm= al;text-align:start;text-indent:0px;text-transform:none;white-space:normal;= word-spacing:0px;text-decoration:none">And the individual fix for pvscsi is= posted to=C2=A0<a href=3D"https://reviews.freebsd.org/D48438" target=3D"_b= lank" rel=3D"noreferrer">https://reviews.freebsd.org/D48438</a>=C2=A0.</div= ></div></blockquote><div><br></div><div>Landed in -current. Will be MFCed t= o stable/13 after a few days.</div><br><blockquote type=3D"cite"><div><br s= tyle=3D"font-family:Helvetica;font-size:13px;font-style:normal;font-variant= -caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-in= dent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decor= ation:none"><blockquote type=3D"cite" style=3D"font-family:Helvetica;font-s= ize:13px;font-style:normal;font-variant-caps:normal;font-weight:400;letter-= spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-s= pace:normal;word-spacing:0px;text-decoration:none"><div><br style=3D"font-f= amily:Helvetica;font-size:13px;font-style:normal;font-variant-caps:normal;f= ont-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-= transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><d= iv style=3D"font-family:Helvetica;font-size:13px;font-style:normal;font-var= iant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;tex= t-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-d= ecoration:none"><div>Best regards,</div><div>Zhenlei</div></div></div></blo= ckquote></div></blockquote></div><br><div> <div><br></div> </div> <br></div></blockquote></div> --000000000000a28822062bb73b88--
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CANCZdfqCy=TdPRGrTKj=7DJz9nSyPFZgBnsPuf4FqgurN4oYiw>