Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 13 Jan 2025 16:22:51 +0800
From:      Zhenlei Huang <zlei@FreeBSD.org>
To:        Warner Losh <imp@bsdimp.com>
Cc:        Warner Losh <imp@freebsd.org>, stable@freebsd.org, Edward Tomasz Napierala <trasz@freebsd.org>
Subject:   Re: MFC fixes for uninitialized kernel stack variables in sys/cam or do direct fix for pvscsi driver
Message-ID:  <3CB8230B-7938-4503-AADD-7F691482908C@FreeBSD.org>
In-Reply-To: <CANCZdfpKGMFyPAVqxPcahZXgSL=tFBin3ratjoQTtJrDeM2NUg@mail.gmail.com>
References:  <0DDE1B66-B794-472D-A901-54FA2FF1E853@FreeBSD.org> <CANCZdfpKGMFyPAVqxPcahZXgSL=tFBin3ratjoQTtJrDeM2NUg@mail.gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help

--Apple-Mail=_E704F548-0365-47C9-9E73-361C3FC8B029
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=utf-8



> On Dec 28, 2024, at 6:03 AM, Warner Losh <imp@bsdimp.com> wrote:
>=20
>=20
>=20
> On Mon, Dec 2, 2024 at 2:41=E2=80=AFAM Zhenlei Huang <zlei@freebsd.org =
<mailto:zlei@freebsd.org>> wrote:
> Hi Warner,
>=20
> Recently I upgraded some ESXi vms from 13.3 to 13.4 and noticed weird =
report for sas speed.
> The boot console has the following,
>=20
> ```
> da0 at pvscsi0 bus 0 scbus2 target 0 lun 0
> da0: <VMware Virtual disk 2.0> Fixed Direct Access SPC-4 SCSI device
> da0: 4294967.295MB/s transfers
> ```
> But camcontrol report the correct value,
> ```
> # camcontrol inquiry da0 -R
> pass1: 750.000MB/s transfers, Command Queueing Enabled
> ```
>=20
> The `4294967.295MB` is actually 0xffff_ffff or -1 but I do not see any =
logic set those values.
>=20
> Finally I managed to get the stack trace,
> ```
> _scsi_announce_periph
> scsi_announce_periph_sbuf
> xpt_announce_periph_sbuf
> dadone_proberc
> xpt_done_process
> xpt_done_td
> fork_exit
> fork_trampoline
> ```
>=20
> and noticed that the last param `cts` of `_scsi_announce_periph(struct =
cam_periph *periph, u_int *speed, u_int *freq, struct ccb_trans_settings =
*cts)`
> is from kernel stack and is not properly initialized, latter I found =
some commits related to this,
>=20
> 076686fe0703 cam: make sure to clear CCBs allocated on the stack
> ec5325dbca62 cam: make sure to clear even more CCBs allocated on the =
stack
> 0f206cc91279 cam: add missing zeroing of a stack-allocated CCB.
> 616a676a0535 cam: clear stack-allocated CCB in the target layer
>=20
> I applied them to stable/13, rebuild and reboot, now the speed of da0 =
is reported correctly. I also tried to patch the pvscsi driver with few =
lines and
> it also works as intended.
>=20
> ```
> --- a/sys/dev/vmware/pvscsi/pvscsi.c
> +++ b/sys/dev/vmware/pvscsi/pvscsi.c
> @@ -1444,6 +1444,10 @@ pvscsi_action(struct cam_sim *sim, union ccb =
*ccb)
>                 cts->proto_specific.scsi.flags =3D =
CTS_SCSI_FLAGS_TAG_ENB;
>                 cts->proto_specific.scsi.valid =3D CTS_SCSI_VALID_TQ;
>=20
> +               /* Prefer connection speed over sas port speed */
> +               /* cts->xport_specific.sas.bitrate =3D 0; */
> +               cts->xport_specific.sas.valid =3D 0;
> +
>                 ccb_h->status =3D CAM_REQ_CMP;
>                 xpt_done(ccb);
> ```
>=20
> Things come clear and I know why this weird speed happens, now it is =
time to decide how to fix it.
>=20
> Fixing the consumer of cam, aka pvscsi driver, is quite simple and =
promising. I did a quick search it appears other consumers set =
`cts->xport_specific.sas.valid` correctly. It does not convince me as =
I'm quite new to cam subsystem.
>=20
> Yes. sas.valid is set when the sas.bitrate and other data has been set =
correctly. Setting it to 0 like this ensures it's ignored.  So if you =
know the speed, set sas.bitrate to that speed and sas.valid to 1.

That is clear.

>=20
> I'm not sure I answered the question right, but if not, please clarify =
or point out what I missed and I'll try again.
> =20
> Which one do you prefer, MFC commits to stable/13, or do direct fix =
for pvscsi driver to stable/13 ?
>=20
> [[ Sorry for the delay, I missed this all month ]]
>=20
> I generally prefer a MFC, unless the code is no longer in -current.

The code live in -current and all supported stable branches.

> Even if there's two different fixes for this logical bug, fixing it in =
current, then MFCing that (with the current hash) is fine, even if the =
stable/13 changes are completely different.

The bug does not exist in current and stable/14 but only in stable/13, =
due to Edward's work ( commits those zero stack-allocated CCBs ) were =
not MFCed into stable/13 branch.

> For stable/13 I guess it matters a bit less than stable/14 since I'll =
be merging to it less, but if it's a commit from -current that doesn't =
need to be made to -stable because of the new commit on stable, I tend =
to include the MFC hash text.

Do you mean the `cherry picked from commit` commit message ?

>=20
> Warner


I'm preparing and testing the MFCing. Bless me to not make things messed =
up :)

Best regards,
Zhenlei


--Apple-Mail=_E704F548-0365-47C9-9E73-361C3FC8B029
Content-Transfer-Encoding: quoted-printable
Content-Type: text/html;
	charset=utf-8

<html><head><meta http-equiv=3D"Content-Type" content=3D"text/html; =
charset=3Dutf-8"></head><body style=3D"word-wrap: break-word; =
-webkit-nbsp-mode: space; line-break: after-white-space;" class=3D""><br =
class=3D""><div><br class=3D""><blockquote type=3D"cite" class=3D""><div =
class=3D"">On Dec 28, 2024, at 6:03 AM, Warner Losh &lt;<a =
href=3D"mailto:imp@bsdimp.com" class=3D"">imp@bsdimp.com</a>&gt; =
wrote:</div><br class=3D"Apple-interchange-newline"><div class=3D""><meta =
charset=3D"UTF-8" class=3D""><br class=3D"Apple-interchange-newline"><br =
style=3D"caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: =
13px; font-style: normal; font-variant-caps: normal; font-weight: 400; =
letter-spacing: normal; text-align: start; text-indent: 0px; =
text-transform: none; white-space: normal; word-spacing: 0px; =
-webkit-text-stroke-width: 0px; text-decoration: none;" class=3D""><div =
class=3D"gmail_quote gmail_quote_container" style=3D"caret-color: rgb(0, =
0, 0); font-family: Helvetica; font-size: 13px; font-style: normal; =
font-variant-caps: normal; font-weight: 400; letter-spacing: normal; =
text-align: start; text-indent: 0px; text-transform: none; white-space: =
normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; =
text-decoration: none;"><div dir=3D"ltr" class=3D"gmail_attr">On Mon, =
Dec 2, 2024 at 2:41=E2=80=AFAM Zhenlei Huang &lt;<a =
href=3D"mailto:zlei@freebsd.org" class=3D"">zlei@freebsd.org</a>&gt; =
wrote:<br class=3D""></div><blockquote class=3D"gmail_quote" =
style=3D"margin: 0px 0px 0px 0.8ex; border-left-width: 1px; =
border-left-style: solid; border-left-color: rgb(204, 204, 204); =
padding-left: 1ex;">Hi Warner,<br class=3D""><br class=3D"">Recently I =
upgraded some ESXi vms from 13.3 to 13.4 and noticed weird report for =
sas speed.<br class=3D"">The boot console has the following,<br =
class=3D""><br class=3D"">```<br class=3D"">da0 at pvscsi0 bus 0 scbus2 =
target 0 lun 0<br class=3D"">da0: &lt;VMware Virtual disk 2.0&gt; Fixed =
Direct Access SPC-4 SCSI device<br class=3D"">da0: 4294967.295MB/s =
transfers<br class=3D"">```<br class=3D"">But camcontrol report the =
correct value,<br class=3D"">```<br class=3D""># camcontrol inquiry da0 =
-R<br class=3D"">pass1: 750.000MB/s transfers, Command Queueing =
Enabled<br class=3D"">```<br class=3D""><br class=3D"">The =
`4294967.295MB` is actually 0xffff_ffff or -1 but I do not see any logic =
set those values.<br class=3D""><br class=3D"">Finally I managed to get =
the stack trace,<br class=3D"">```<br class=3D"">_scsi_announce_periph<br =
class=3D"">scsi_announce_periph_sbuf<br =
class=3D"">xpt_announce_periph_sbuf<br class=3D"">dadone_proberc<br =
class=3D"">xpt_done_process<br class=3D"">xpt_done_td<br =
class=3D"">fork_exit<br class=3D"">fork_trampoline<br class=3D"">```<br =
class=3D""><br class=3D"">and noticed that the last param `cts` of =
`_scsi_announce_periph(struct cam_periph *periph, u_int *speed, u_int =
*freq, struct ccb_trans_settings *cts)`<br class=3D"">is from kernel =
stack and is not properly initialized, latter I found some commits =
related to this,<br class=3D""><br class=3D"">076686fe0703 cam: make =
sure to clear CCBs allocated on the stack<br class=3D"">ec5325dbca62 =
cam: make sure to clear even more CCBs allocated on the stack<br =
class=3D"">0f206cc91279 cam: add missing zeroing of a stack-allocated =
CCB.<br class=3D"">616a676a0535 cam: clear stack-allocated CCB in the =
target layer<br class=3D""><br class=3D"">I applied them to stable/13, =
rebuild and reboot, now the speed of da0 is reported correctly. I also =
tried to patch the pvscsi driver with few lines and<br class=3D"">it =
also works as intended.<br class=3D""><br class=3D"">```<br class=3D"">---=
 a/sys/dev/vmware/pvscsi/pvscsi.c<br class=3D"">+++ =
b/sys/dev/vmware/pvscsi/pvscsi.c<br class=3D"">@@ -1444,6 +1444,10 @@ =
pvscsi_action(struct cam_sim *sim, union ccb *ccb)<br class=3D"">&nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span =
class=3D"Apple-converted-space">&nbsp;</span>cts-&gt;proto_specific.scsi.f=
lags =3D CTS_SCSI_FLAGS_TAG_ENB;<br class=3D"">&nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span =
class=3D"Apple-converted-space">&nbsp;</span>cts-&gt;proto_specific.scsi.v=
alid =3D CTS_SCSI_VALID_TQ;<br class=3D""><br class=3D"">+&nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;/* Prefer connection speed over =
sas port speed */<br class=3D"">+&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp;/* cts-&gt;xport_specific.sas.bitrate =3D 0; */<br =
class=3D"">+&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp;cts-&gt;xport_specific.sas.valid =3D 0;<br class=3D"">+<br =
class=3D"">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span =
class=3D"Apple-converted-space">&nbsp;</span>ccb_h-&gt;status =3D =
CAM_REQ_CMP;<br class=3D"">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp;<span =
class=3D"Apple-converted-space">&nbsp;</span>xpt_done(ccb);<br =
class=3D"">```<br class=3D""><br class=3D"">Things come clear and I know =
why this weird speed happens, now it is time to decide how to fix it.<br =
class=3D""><br class=3D"">Fixing the consumer of cam, aka pvscsi driver, =
is quite simple and promising. I did a quick search it appears other =
consumers set `cts-&gt;xport_specific.sas.valid` correctly. It does not =
convince me as I'm quite new to cam subsystem.<br =
class=3D""></blockquote><div class=3D""><br class=3D""></div><div =
class=3D"">Yes. sas.valid is set when the sas.bitrate and other data has =
been set correctly. Setting it to 0 like this ensures it's =
ignored.&nbsp; So if you know the speed, set sas.bitrate to that speed =
and sas.valid to 1.</div></div></div></blockquote><div><br =
class=3D""></div><div>That is clear.</div><br class=3D""><blockquote =
type=3D"cite" class=3D""><div class=3D""><div class=3D"gmail_quote =
gmail_quote_container" style=3D"caret-color: rgb(0, 0, 0); font-family: =
Helvetica; font-size: 13px; font-style: normal; font-variant-caps: =
normal; font-weight: 400; letter-spacing: normal; text-align: start; =
text-indent: 0px; text-transform: none; white-space: normal; =
word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: =
none;"><div class=3D""><br class=3D""></div><div class=3D"">I'm not sure =
I answered the question right, but if not, please clarify or point out =
what I missed and I'll try again.</div><div =
class=3D"">&nbsp;</div><blockquote class=3D"gmail_quote" style=3D"margin: =
0px 0px 0px 0.8ex; border-left-width: 1px; border-left-style: solid; =
border-left-color: rgb(204, 204, 204); padding-left: 1ex;">Which one do =
you prefer, MFC commits to stable/13, or do direct fix for pvscsi driver =
to stable/13 ?<br class=3D""></blockquote><div class=3D""><br =
class=3D""></div><div class=3D"">[[ Sorry for the delay, I missed this =
all month ]]</div><div class=3D""><br class=3D""></div><div class=3D"">I =
generally prefer a MFC, unless the code is no longer in -current. =
</div></div></div></blockquote><div><br class=3D""></div><div>The code =
live in -current and all supported stable branches.</div><br =
class=3D""><blockquote type=3D"cite" class=3D""><div class=3D""><div =
class=3D"gmail_quote gmail_quote_container" style=3D"caret-color: rgb(0, =
0, 0); font-family: Helvetica; font-size: 13px; font-style: normal; =
font-variant-caps: normal; font-weight: 400; letter-spacing: normal; =
text-align: start; text-indent: 0px; text-transform: none; white-space: =
normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; =
text-decoration: none;"><div class=3D"">Even if there's two different =
fixes for this logical bug, fixing it in current, then MFCing that (with =
the current hash) is fine, even if the stable/13 changes are completely =
different. </div></div></div></blockquote><div><br =
class=3D""></div><div>The bug does not exist in current and stable/14 =
but only in stable/13, due to Edward's work ( commits those zero =
stack-allocated CCBs ) were not MFCed into stable/13 =
branch.</div><div><br class=3D""></div><blockquote type=3D"cite" =
class=3D""><div class=3D""><div class=3D"gmail_quote =
gmail_quote_container" style=3D"caret-color: rgb(0, 0, 0); font-family: =
Helvetica; font-size: 13px; font-style: normal; font-variant-caps: =
normal; font-weight: 400; letter-spacing: normal; text-align: start; =
text-indent: 0px; text-transform: none; white-space: normal; =
word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: =
none;"><div class=3D"">For stable/13 I guess it matters a bit less than =
stable/14 since I'll be merging to it less, but if it's a commit from =
-current that doesn't need to be made to -stable because of the new =
commit on stable, I tend to include the MFC hash =
text.</div></div></div></blockquote><div><br class=3D""></div><div>Do =
you mean the `cherry picked from commit` commit message ?</div><br =
class=3D""><blockquote type=3D"cite" class=3D""><div class=3D""><div =
class=3D"gmail_quote gmail_quote_container" style=3D"caret-color: rgb(0, =
0, 0); font-family: Helvetica; font-size: 13px; font-style: normal; =
font-variant-caps: normal; font-weight: 400; letter-spacing: normal; =
text-align: start; text-indent: 0px; text-transform: none; white-space: =
normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; =
text-decoration: none;"><div class=3D""><br class=3D""></div><div =
class=3D"">Warner</div></div></div></blockquote></div><div class=3D""><br =
class=3D""></div><div class=3D"">I'm preparing and testing the MFCing. =
Bless me to not make things messed up :)</div><br class=3D""><div =
class=3D"">
<div>Best regards,</div><div>Zhenlei</div>

</div>
<br class=3D""></body></html>=

--Apple-Mail=_E704F548-0365-47C9-9E73-361C3FC8B029--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?3CB8230B-7938-4503-AADD-7F691482908C>