Date: Wed, 15 Jan 2025 12:35:57 +0800 From: Zhenlei Huang <zlei@FreeBSD.org> To: Warner Losh <imp@bsdimp.com> Cc: Warner Losh <imp@freebsd.org>, stable@freebsd.org, Edward Tomasz Napierala <trasz@freebsd.org> Subject: Re: MFC fixes for uninitialized kernel stack variables in sys/cam or do direct fix for pvscsi driver Message-ID: <C12BB5CB-ED16-4856-ADE1-6D788FB9799E@FreeBSD.org> In-Reply-To: <E39D3960-42CE-4290-AF31-EA385D8512AF@FreeBSD.org> References: <0DDE1B66-B794-472D-A901-54FA2FF1E853@FreeBSD.org> <CANCZdfpKGMFyPAVqxPcahZXgSL=tFBin3ratjoQTtJrDeM2NUg@mail.gmail.com> <3CB8230B-7938-4503-AADD-7F691482908C@FreeBSD.org> <E39D3960-42CE-4290-AF31-EA385D8512AF@FreeBSD.org>
next in thread | previous in thread | raw e-mail | index | archive | help
--Apple-Mail=_90451C18-3A2B-4A28-B5FE-5043B6553940 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=utf-8 > On Jan 13, 2025, at 5:06 PM, Zhenlei Huang <zlei@FreeBSD.org> wrote: >=20 >=20 >=20 >> On Jan 13, 2025, at 4:22 PM, Zhenlei Huang <zlei@FreeBSD.org = <mailto:zlei@FreeBSD.org>> wrote: >>=20 >>=20 >>=20 >>> On Dec 28, 2024, at 6:03 AM, Warner Losh <imp@bsdimp.com = <mailto:imp@bsdimp.com>> wrote: >>>=20 >>>=20 >>>=20 >>> On Mon, Dec 2, 2024 at 2:41=E2=80=AFAM Zhenlei Huang = <zlei@freebsd.org <mailto:zlei@freebsd.org>> wrote: >>> Hi Warner, >>>=20 >>> Recently I upgraded some ESXi vms from 13.3 to 13.4 and noticed = weird report for sas speed. >>> The boot console has the following, >>>=20 >>> ``` >>> da0 at pvscsi0 bus 0 scbus2 target 0 lun 0 >>> da0: <VMware Virtual disk 2.0> Fixed Direct Access SPC-4 SCSI device >>> da0: 4294967.295MB/s transfers >>> ``` >>> But camcontrol report the correct value, >>> ``` >>> # camcontrol inquiry da0 -R >>> pass1: 750.000MB/s transfers, Command Queueing Enabled >>> ``` >>>=20 >>> The `4294967.295MB` is actually 0xffff_ffff or -1 but I do not see = any logic set those values. >>>=20 >>> Finally I managed to get the stack trace, >>> ``` >>> _scsi_announce_periph >>> scsi_announce_periph_sbuf >>> xpt_announce_periph_sbuf >>> dadone_proberc >>> xpt_done_process >>> xpt_done_td >>> fork_exit >>> fork_trampoline >>> ``` >>>=20 >>> and noticed that the last param `cts` of = `_scsi_announce_periph(struct cam_periph *periph, u_int *speed, u_int = *freq, struct ccb_trans_settings *cts)` >>> is from kernel stack and is not properly initialized, latter I found = some commits related to this, >>>=20 >>> 076686fe0703 cam: make sure to clear CCBs allocated on the stack >>> ec5325dbca62 cam: make sure to clear even more CCBs allocated on the = stack >>> 0f206cc91279 cam: add missing zeroing of a stack-allocated CCB. >>> 616a676a0535 cam: clear stack-allocated CCB in the target layer >>>=20 >>> I applied them to stable/13, rebuild and reboot, now the speed of = da0 is reported correctly. I also tried to patch the pvscsi driver with = few lines and >>> it also works as intended. >>>=20 >>> ``` >>> --- a/sys/dev/vmware/pvscsi/pvscsi.c >>> +++ b/sys/dev/vmware/pvscsi/pvscsi.c >>> @@ -1444,6 +1444,10 @@ pvscsi_action(struct cam_sim *sim, union ccb = *ccb) >>> cts->proto_specific.scsi.flags =3D = CTS_SCSI_FLAGS_TAG_ENB; >>> cts->proto_specific.scsi.valid =3D = CTS_SCSI_VALID_TQ; >>>=20 >>> + /* Prefer connection speed over sas port speed */ >>> + /* cts->xport_specific.sas.bitrate =3D 0; */ >>> + cts->xport_specific.sas.valid =3D 0; >>> + >>> ccb_h->status =3D CAM_REQ_CMP; >>> xpt_done(ccb); >>> ``` >>>=20 >>> Things come clear and I know why this weird speed happens, now it is = time to decide how to fix it. >>>=20 >>> Fixing the consumer of cam, aka pvscsi driver, is quite simple and = promising. I did a quick search it appears other consumers set = `cts->xport_specific.sas.valid` correctly. It does not convince me as = I'm quite new to cam subsystem. >>>=20 >>> Yes. sas.valid is set when the sas.bitrate and other data has been = set correctly. Setting it to 0 like this ensures it's ignored. So if = you know the speed, set sas.bitrate to that speed and sas.valid to 1. >>=20 >> That is clear. >>=20 >>>=20 >>> I'm not sure I answered the question right, but if not, please = clarify or point out what I missed and I'll try again. >>> =20 >>> Which one do you prefer, MFC commits to stable/13, or do direct fix = for pvscsi driver to stable/13 ? >>>=20 >>> [[ Sorry for the delay, I missed this all month ]] >>>=20 >>> I generally prefer a MFC, unless the code is no longer in -current.=20= >>=20 >> The code live in -current and all supported stable branches. >>=20 >>> Even if there's two different fixes for this logical bug, fixing it = in current, then MFCing that (with the current hash) is fine, even if = the stable/13 changes are completely different.=20 >>=20 >> The bug does not exist in current and stable/14 but only in = stable/13, due to Edward's work ( commits those zero stack-allocated = CCBs ) were not MFCed into stable/13 branch. >>=20 >>> For stable/13 I guess it matters a bit less than stable/14 since = I'll be merging to it less, but if it's a commit from -current that = doesn't need to be made to -stable because of the new commit on stable, = I tend to include the MFC hash text. >>=20 >> Do you mean the `cherry picked from commit` commit message ? >>=20 >>>=20 >>> Warner >>=20 >>=20 >> I'm preparing and testing the MFCing. Bless me to not make things = messed up :) >=20 > And the individual fix for pvscsi is posted to = https://reviews.freebsd.org/D48438 <https://reviews.freebsd.org/D48438> = . Landed in -current. Will be MFCed to stable/13 after a few days. >=20 >>=20 >> Best regards, >> Zhenlei --Apple-Mail=_90451C18-3A2B-4A28-B5FE-5043B6553940 Content-Transfer-Encoding: quoted-printable Content-Type: text/html; charset=utf-8 <html><head><meta http-equiv=3D"Content-Type" content=3D"text/html; = charset=3Dutf-8"></head><body style=3D"word-wrap: break-word; = -webkit-nbsp-mode: space; line-break: after-white-space;" class=3D""><br = class=3D""><div><br class=3D""><blockquote type=3D"cite" class=3D""><div = class=3D"">On Jan 13, 2025, at 5:06 PM, Zhenlei Huang <<a = href=3D"mailto:zlei@FreeBSD.org" class=3D"">zlei@FreeBSD.org</a>> = wrote:</div><br class=3D"Apple-interchange-newline"><div class=3D""><meta = charset=3D"UTF-8" class=3D""><br class=3D"Apple-interchange-newline"><br = class=3D"" style=3D"caret-color: rgb(0, 0, 0); font-family: Helvetica; = font-size: 13px; font-style: normal; font-variant-caps: normal; = font-weight: 400; letter-spacing: normal; text-align: start; = text-indent: 0px; text-transform: none; white-space: normal; = word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: = none;"><blockquote type=3D"cite" class=3D"" style=3D"font-family: = Helvetica; font-size: 13px; font-style: normal; font-variant-caps: = normal; font-weight: 400; letter-spacing: normal; orphans: auto; = text-align: start; text-indent: 0px; text-transform: none; white-space: = normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; = -webkit-text-stroke-width: 0px; text-decoration: none;"><div class=3D"">On= Jan 13, 2025, at 4:22 PM, Zhenlei Huang <<a = href=3D"mailto:zlei@FreeBSD.org" class=3D"">zlei@FreeBSD.org</a>> = wrote:</div><br class=3D"Apple-interchange-newline"><div class=3D""><div = class=3D"" style=3D"caret-color: rgb(0, 0, 0); font-family: Helvetica; = font-size: 13px; font-style: normal; font-variant-caps: normal; = font-weight: 400; letter-spacing: normal; text-align: start; = text-indent: 0px; text-transform: none; white-space: normal; = word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: = none;"><br class=3D"Apple-interchange-newline"><br class=3D""><blockquote = type=3D"cite" class=3D""><div class=3D"">On Dec 28, 2024, at 6:03 AM, = Warner Losh <<a href=3D"mailto:imp@bsdimp.com" = class=3D"">imp@bsdimp.com</a>> wrote:</div><br = class=3D"Apple-interchange-newline"><div class=3D""><br = class=3D"Apple-interchange-newline"><br class=3D"" style=3D"caret-color: = rgb(0, 0, 0); font-family: Helvetica; font-size: 13px; font-style: = normal; font-variant-caps: normal; font-weight: 400; letter-spacing: = normal; text-align: start; text-indent: 0px; text-transform: none; = white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; = text-decoration: none;"><div class=3D"gmail_quote gmail_quote_container" = style=3D"caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: = 13px; font-style: normal; font-variant-caps: normal; font-weight: 400; = letter-spacing: normal; text-align: start; text-indent: 0px; = text-transform: none; white-space: normal; word-spacing: 0px; = -webkit-text-stroke-width: 0px; text-decoration: none;"><div dir=3D"ltr" = class=3D"gmail_attr">On Mon, Dec 2, 2024 at 2:41=E2=80=AFAM Zhenlei = Huang <<a href=3D"mailto:zlei@freebsd.org" = class=3D"">zlei@freebsd.org</a>> wrote:<br class=3D""></div><blockquote= class=3D"gmail_quote" style=3D"margin: 0px 0px 0px 0.8ex; = border-left-width: 1px; border-left-style: solid; border-left-color: = rgb(204, 204, 204); padding-left: 1ex;">Hi Warner,<br class=3D""><br = class=3D"">Recently I upgraded some ESXi vms from 13.3 to 13.4 and = noticed weird report for sas speed.<br class=3D"">The boot console has = the following,<br class=3D""><br class=3D"">```<br class=3D"">da0 at = pvscsi0 bus 0 scbus2 target 0 lun 0<br class=3D"">da0: <VMware = Virtual disk 2.0> Fixed Direct Access SPC-4 SCSI device<br = class=3D"">da0: 4294967.295MB/s transfers<br class=3D"">```<br = class=3D"">But camcontrol report the correct value,<br class=3D"">```<br = class=3D""># camcontrol inquiry da0 -R<br class=3D"">pass1: 750.000MB/s = transfers, Command Queueing Enabled<br class=3D"">```<br class=3D""><br = class=3D"">The `4294967.295MB` is actually 0xffff_ffff or -1 but I do = not see any logic set those values.<br class=3D""><br class=3D"">Finally = I managed to get the stack trace,<br class=3D"">```<br = class=3D"">_scsi_announce_periph<br = class=3D"">scsi_announce_periph_sbuf<br = class=3D"">xpt_announce_periph_sbuf<br class=3D"">dadone_proberc<br = class=3D"">xpt_done_process<br class=3D"">xpt_done_td<br = class=3D"">fork_exit<br class=3D"">fork_trampoline<br class=3D"">```<br = class=3D""><br class=3D"">and noticed that the last param `cts` of = `_scsi_announce_periph(struct cam_periph *periph, u_int *speed, u_int = *freq, struct ccb_trans_settings *cts)`<br class=3D"">is from kernel = stack and is not properly initialized, latter I found some commits = related to this,<br class=3D""><br class=3D"">076686fe0703 cam: make = sure to clear CCBs allocated on the stack<br class=3D"">ec5325dbca62 = cam: make sure to clear even more CCBs allocated on the stack<br = class=3D"">0f206cc91279 cam: add missing zeroing of a stack-allocated = CCB.<br class=3D"">616a676a0535 cam: clear stack-allocated CCB in the = target layer<br class=3D""><br class=3D"">I applied them to stable/13, = rebuild and reboot, now the speed of da0 is reported correctly. I also = tried to patch the pvscsi driver with few lines and<br class=3D"">it = also works as intended.<br class=3D""><br class=3D"">```<br class=3D"">---= a/sys/dev/vmware/pvscsi/pvscsi.c<br class=3D"">+++ = b/sys/dev/vmware/pvscsi/pvscsi.c<br class=3D"">@@ -1444,6 +1444,10 @@ = pvscsi_action(struct cam_sim *sim, union ccb *ccb)<br class=3D""> = <span = class=3D"Apple-converted-space"> </span>cts->proto_specific.scsi.f= lags =3D CTS_SCSI_FLAGS_TAG_ENB;<br class=3D""> = <span = class=3D"Apple-converted-space"> </span>cts->proto_specific.scsi.v= alid =3D CTS_SCSI_VALID_TQ;<br class=3D""><br class=3D"">+ = /* Prefer connection speed over = sas port speed */<br class=3D"">+ = /* cts->xport_specific.sas.bitrate =3D 0; */<br = class=3D"">+ = cts->xport_specific.sas.valid =3D 0;<br class=3D"">+<br = class=3D""> <span = class=3D"Apple-converted-space"> </span>ccb_h->status =3D = CAM_REQ_CMP;<br class=3D""> = <span = class=3D"Apple-converted-space"> </span>xpt_done(ccb);<br = class=3D"">```<br class=3D""><br class=3D"">Things come clear and I know = why this weird speed happens, now it is time to decide how to fix it.<br = class=3D""><br class=3D"">Fixing the consumer of cam, aka pvscsi driver, = is quite simple and promising. I did a quick search it appears other = consumers set `cts->xport_specific.sas.valid` correctly. It does not = convince me as I'm quite new to cam subsystem.<br = class=3D""></blockquote><div class=3D""><br class=3D""></div><div = class=3D"">Yes. sas.valid is set when the sas.bitrate and other data has = been set correctly. Setting it to 0 like this ensures it's = ignored. So if you know the speed, set sas.bitrate to that speed = and sas.valid to 1.</div></div></div></blockquote><div class=3D""><br = class=3D""></div><div class=3D"">That is clear.</div><br = class=3D""><blockquote type=3D"cite" class=3D""><div class=3D""><div = class=3D"gmail_quote gmail_quote_container" style=3D"caret-color: rgb(0, = 0, 0); font-family: Helvetica; font-size: 13px; font-style: normal; = font-variant-caps: normal; font-weight: 400; letter-spacing: normal; = text-align: start; text-indent: 0px; text-transform: none; white-space: = normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; = text-decoration: none;"><div class=3D""><br class=3D""></div><div = class=3D"">I'm not sure I answered the question right, but if not, = please clarify or point out what I missed and I'll try again.</div><div = class=3D""> </div><blockquote class=3D"gmail_quote" style=3D"margin: = 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-style: solid; = border-left-color: rgb(204, 204, 204); padding-left: 1ex;">Which one do = you prefer, MFC commits to stable/13, or do direct fix for pvscsi driver = to stable/13 ?<br class=3D""></blockquote><div class=3D""><br = class=3D""></div><div class=3D"">[[ Sorry for the delay, I missed this = all month ]]</div><div class=3D""><br class=3D""></div><div class=3D"">I = generally prefer a MFC, unless the code is no longer in -current.<span = class=3D"Apple-converted-space"> </span></div></div></div></blockquot= e><div class=3D""><br class=3D""></div><div class=3D"">The code live in = -current and all supported stable branches.</div><br = class=3D""><blockquote type=3D"cite" class=3D""><div class=3D""><div = class=3D"gmail_quote gmail_quote_container" style=3D"caret-color: rgb(0, = 0, 0); font-family: Helvetica; font-size: 13px; font-style: normal; = font-variant-caps: normal; font-weight: 400; letter-spacing: normal; = text-align: start; text-indent: 0px; text-transform: none; white-space: = normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; = text-decoration: none;"><div class=3D"">Even if there's two different = fixes for this logical bug, fixing it in current, then MFCing that (with = the current hash) is fine, even if the stable/13 changes are completely = different.<span = class=3D"Apple-converted-space"> </span></div></div></div></blockquot= e><div class=3D""><br class=3D""></div><div class=3D"">The bug does not = exist in current and stable/14 but only in stable/13, due to Edward's = work ( commits those zero stack-allocated CCBs ) were not MFCed into = stable/13 branch.</div><div class=3D""><br class=3D""></div><blockquote = type=3D"cite" class=3D""><div class=3D""><div class=3D"gmail_quote = gmail_quote_container" style=3D"caret-color: rgb(0, 0, 0); font-family: = Helvetica; font-size: 13px; font-style: normal; font-variant-caps: = normal; font-weight: 400; letter-spacing: normal; text-align: start; = text-indent: 0px; text-transform: none; white-space: normal; = word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: = none;"><div class=3D"">For stable/13 I guess it matters a bit less than = stable/14 since I'll be merging to it less, but if it's a commit from = -current that doesn't need to be made to -stable because of the new = commit on stable, I tend to include the MFC hash = text.</div></div></div></blockquote><div class=3D""><br = class=3D""></div><div class=3D"">Do you mean the `cherry picked from = commit` commit message ?</div><br class=3D""><blockquote type=3D"cite" = class=3D""><div class=3D""><div class=3D"gmail_quote = gmail_quote_container" style=3D"caret-color: rgb(0, 0, 0); font-family: = Helvetica; font-size: 13px; font-style: normal; font-variant-caps: = normal; font-weight: 400; letter-spacing: normal; text-align: start; = text-indent: 0px; text-transform: none; white-space: normal; = word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: = none;"><div class=3D""><br class=3D""></div><div = class=3D"">Warner</div></div></div></blockquote></div><div class=3D"" = style=3D"caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: = 13px; font-style: normal; font-variant-caps: normal; font-weight: 400; = letter-spacing: normal; text-align: start; text-indent: 0px; = text-transform: none; white-space: normal; word-spacing: 0px; = -webkit-text-stroke-width: 0px; text-decoration: none;"><br = class=3D""></div><div class=3D"" style=3D"caret-color: rgb(0, 0, 0); = font-family: Helvetica; font-size: 13px; font-style: normal; = font-variant-caps: normal; font-weight: 400; letter-spacing: normal; = text-align: start; text-indent: 0px; text-transform: none; white-space: = normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; = text-decoration: none;">I'm preparing and testing the MFCing. Bless me = to not make things messed up :)</div></div></blockquote><div = style=3D"caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: = 13px; font-style: normal; font-variant-caps: normal; font-weight: 400; = letter-spacing: normal; text-align: start; text-indent: 0px; = text-transform: none; white-space: normal; word-spacing: 0px; = -webkit-text-stroke-width: 0px; text-decoration: none;" class=3D""><br = class=3D""></div><div style=3D"caret-color: rgb(0, 0, 0); font-family: = Helvetica; font-size: 13px; font-style: normal; font-variant-caps: = normal; font-weight: 400; letter-spacing: normal; text-align: start; = text-indent: 0px; text-transform: none; white-space: normal; = word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: = none;" class=3D"">And the individual fix for pvscsi is posted to <a = href=3D"https://reviews.freebsd.org/D48438" = class=3D"">https://reviews.freebsd.org/D48438</a> .</div></div></bloc= kquote><div><br class=3D""></div><div>Landed in -current. Will be MFCed = to stable/13 after a few days.</div><br class=3D""><blockquote = type=3D"cite" class=3D""><div class=3D""><br class=3D"" = style=3D"caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: = 13px; font-style: normal; font-variant-caps: normal; font-weight: 400; = letter-spacing: normal; text-align: start; text-indent: 0px; = text-transform: none; white-space: normal; word-spacing: 0px; = -webkit-text-stroke-width: 0px; text-decoration: none;"><blockquote = type=3D"cite" class=3D"" style=3D"font-family: Helvetica; font-size: = 13px; font-style: normal; font-variant-caps: normal; font-weight: 400; = letter-spacing: normal; orphans: auto; text-align: start; text-indent: = 0px; text-transform: none; white-space: normal; widows: auto; = word-spacing: 0px; -webkit-text-size-adjust: auto; = -webkit-text-stroke-width: 0px; text-decoration: none;"><div = class=3D""><br class=3D"" style=3D"caret-color: rgb(0, 0, 0); = font-family: Helvetica; font-size: 13px; font-style: normal; = font-variant-caps: normal; font-weight: 400; letter-spacing: normal; = text-align: start; text-indent: 0px; text-transform: none; white-space: = normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; = text-decoration: none;"><div class=3D"" style=3D"caret-color: rgb(0, 0, = 0); font-family: Helvetica; font-size: 13px; font-style: normal; = font-variant-caps: normal; font-weight: 400; letter-spacing: normal; = text-align: start; text-indent: 0px; text-transform: none; white-space: = normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; = text-decoration: none;"><div class=3D"">Best regards,</div><div = class=3D"">Zhenlei</div></div></div></blockquote></div></blockquote></div>= <br class=3D""><div class=3D""> <div><br class=3D""></div> </div> <br class=3D""></body></html>= --Apple-Mail=_90451C18-3A2B-4A28-B5FE-5043B6553940--
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?C12BB5CB-ED16-4856-ADE1-6D788FB9799E>