Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 14 Jan 2025 21:37:08 -0700
From:      Warner Losh <imp@bsdimp.com>
To:        Zhenlei Huang <zlei@freebsd.org>
Cc:        Warner Losh <imp@freebsd.org>, FreeBSD Stable ML <stable@freebsd.org>,  Edward Tomasz Napierala <trasz@freebsd.org>
Subject:   Re: MFC fixes for uninitialized kernel stack variables in sys/cam or do direct fix for pvscsi driver
Message-ID:  <CANCZdfqCy=TdPRGrTKj=7DJz9nSyPFZgBnsPuf4FqgurN4oYiw@mail.gmail.com>
In-Reply-To: <C12BB5CB-ED16-4856-ADE1-6D788FB9799E@FreeBSD.org>
References:  <0DDE1B66-B794-472D-A901-54FA2FF1E853@FreeBSD.org> <CANCZdfpKGMFyPAVqxPcahZXgSL=tFBin3ratjoQTtJrDeM2NUg@mail.gmail.com> <3CB8230B-7938-4503-AADD-7F691482908C@FreeBSD.org> <E39D3960-42CE-4290-AF31-EA385D8512AF@FreeBSD.org> <C12BB5CB-ED16-4856-ADE1-6D788FB9799E@FreeBSD.org>

next in thread | previous in thread | raw e-mail | index | archive | help
--000000000000a28822062bb73b88
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Great. This looks good.

Warner

On Tue, Jan 14, 2025, 9:36=E2=80=AFPM Zhenlei Huang <zlei@freebsd.org> wrot=
e:

>
>
> On Jan 13, 2025, at 5:06 PM, Zhenlei Huang <zlei@FreeBSD.org> wrote:
>
>
>
> On Jan 13, 2025, at 4:22 PM, Zhenlei Huang <zlei@FreeBSD.org> wrote:
>
>
>
> On Dec 28, 2024, at 6:03 AM, Warner Losh <imp@bsdimp.com> wrote:
>
>
>
> On Mon, Dec 2, 2024 at 2:41=E2=80=AFAM Zhenlei Huang <zlei@freebsd.org> w=
rote:
>
>> Hi Warner,
>>
>> Recently I upgraded some ESXi vms from 13.3 to 13.4 and noticed weird
>> report for sas speed.
>> The boot console has the following,
>>
>> ```
>> da0 at pvscsi0 bus 0 scbus2 target 0 lun 0
>> da0: <VMware Virtual disk 2.0> Fixed Direct Access SPC-4 SCSI device
>> da0: 4294967.295MB/s transfers
>> ```
>> But camcontrol report the correct value,
>> ```
>> # camcontrol inquiry da0 -R
>> pass1: 750.000MB/s transfers, Command Queueing Enabled
>> ```
>>
>> The `4294967.295MB` is actually 0xffff_ffff or -1 but I do not see any
>> logic set those values.
>>
>> Finally I managed to get the stack trace,
>> ```
>> _scsi_announce_periph
>> scsi_announce_periph_sbuf
>> xpt_announce_periph_sbuf
>> dadone_proberc
>> xpt_done_process
>> xpt_done_td
>> fork_exit
>> fork_trampoline
>> ```
>>
>> and noticed that the last param `cts` of `_scsi_announce_periph(struct
>> cam_periph *periph, u_int *speed, u_int *freq, struct ccb_trans_settings
>> *cts)`
>> is from kernel stack and is not properly initialized, latter I found som=
e
>> commits related to this,
>>
>> 076686fe0703 cam: make sure to clear CCBs allocated on the stack
>> ec5325dbca62 cam: make sure to clear even more CCBs allocated on the sta=
ck
>> 0f206cc91279 cam: add missing zeroing of a stack-allocated CCB.
>> 616a676a0535 cam: clear stack-allocated CCB in the target layer
>>
>> I applied them to stable/13, rebuild and reboot, now the speed of da0 is
>> reported correctly. I also tried to patch the pvscsi driver with few lin=
es
>> and
>> it also works as intended.
>>
>> ```
>> --- a/sys/dev/vmware/pvscsi/pvscsi.c
>> +++ b/sys/dev/vmware/pvscsi/pvscsi.c
>> @@ -1444,6 +1444,10 @@ pvscsi_action(struct cam_sim *sim, union ccb *ccb=
)
>>                 cts->proto_specific.scsi.flags =3D CTS_SCSI_FLAGS_TAG_EN=
B;
>>                 cts->proto_specific.scsi.valid =3D CTS_SCSI_VALID_TQ;
>>
>> +               /* Prefer connection speed over sas port speed */
>> +               /* cts->xport_specific.sas.bitrate =3D 0; */
>> +               cts->xport_specific.sas.valid =3D 0;
>> +
>>                 ccb_h->status =3D CAM_REQ_CMP;
>>                 xpt_done(ccb);
>> ```
>>
>> Things come clear and I know why this weird speed happens, now it is tim=
e
>> to decide how to fix it.
>>
>> Fixing the consumer of cam, aka pvscsi driver, is quite simple and
>> promising. I did a quick search it appears other consumers set
>> `cts->xport_specific.sas.valid` correctly. It does not convince me as I'=
m
>> quite new to cam subsystem.
>>
>
> Yes. sas.valid is set when the sas.bitrate and other data has been set
> correctly. Setting it to 0 like this ensures it's ignored.  So if you kno=
w
> the speed, set sas.bitrate to that speed and sas.valid to 1.
>
>
> That is clear.
>
>
> I'm not sure I answered the question right, but if not, please clarify or
> point out what I missed and I'll try again.
>
>
>> Which one do you prefer, MFC commits to stable/13, or do direct fix for
>> pvscsi driver to stable/13 ?
>>
>
> [[ Sorry for the delay, I missed this all month ]]
>
> I generally prefer a MFC, unless the code is no longer in -current.
>
>
> The code live in -current and all supported stable branches.
>
> Even if there's two different fixes for this logical bug, fixing it in
> current, then MFCing that (with the current hash) is fine, even if the
> stable/13 changes are completely different.
>
>
> The bug does not exist in current and stable/14 but only in stable/13, du=
e
> to Edward's work ( commits those zero stack-allocated CCBs ) were not MFC=
ed
> into stable/13 branch.
>
> For stable/13 I guess it matters a bit less than stable/14 since I'll be
> merging to it less, but if it's a commit from -current that doesn't need =
to
> be made to -stable because of the new commit on stable, I tend to include
> the MFC hash text.
>
>
> Do you mean the `cherry picked from commit` commit message ?
>
>
> Warner
>
>
> I'm preparing and testing the MFCing. Bless me to not make things messed
> up :)
>
>
> And the individual fix for pvscsi is posted to
> https://reviews.freebsd.org/D48438 .
>
>
> Landed in -current. Will be MFCed to stable/13 after a few days.
>
>
>
> Best regards,
> Zhenlei
>
>
>
>
>

--000000000000a28822062bb73b88
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"auto">Great. This looks good.<div dir=3D"auto"><br></div><div d=
ir=3D"auto">Warner</div></div><br><div class=3D"gmail_quote gmail_quote_con=
tainer"><div dir=3D"ltr" class=3D"gmail_attr">On Tue, Jan 14, 2025, 9:36=E2=
=80=AFPM Zhenlei Huang &lt;<a href=3D"mailto:zlei@freebsd.org">zlei@freebsd=
.org</a>&gt; wrote:<br></div><blockquote class=3D"gmail_quote" style=3D"mar=
gin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style=3D"w=
ord-wrap:break-word;line-break:after-white-space"><br><div><br><blockquote =
type=3D"cite"><div>On Jan 13, 2025, at 5:06 PM, Zhenlei Huang &lt;<a href=
=3D"mailto:zlei@FreeBSD.org" target=3D"_blank" rel=3D"noreferrer">zlei@Free=
BSD.org</a>&gt; wrote:</div><br><div><br><br style=3D"font-family:Helvetica=
;font-size:13px;font-style:normal;font-variant-caps:normal;font-weight:400;=
letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;=
white-space:normal;word-spacing:0px;text-decoration:none"><blockquote type=
=3D"cite" style=3D"font-family:Helvetica;font-size:13px;font-style:normal;f=
ont-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:st=
art;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px=
;text-decoration:none"><div>On Jan 13, 2025, at 4:22 PM, Zhenlei Huang &lt;=
<a href=3D"mailto:zlei@FreeBSD.org" target=3D"_blank" rel=3D"noreferrer">zl=
ei@FreeBSD.org</a>&gt; wrote:</div><br><div><div style=3D"font-family:Helve=
tica;font-size:13px;font-style:normal;font-variant-caps:normal;font-weight:=
400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:n=
one;white-space:normal;word-spacing:0px;text-decoration:none"><br><br><bloc=
kquote type=3D"cite"><div>On Dec 28, 2024, at 6:03 AM, Warner Losh &lt;<a h=
ref=3D"mailto:imp@bsdimp.com" target=3D"_blank" rel=3D"noreferrer">imp@bsdi=
mp.com</a>&gt; wrote:</div><br><div><br><br style=3D"font-family:Helvetica;=
font-size:13px;font-style:normal;font-variant-caps:normal;font-weight:400;l=
etter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;w=
hite-space:normal;word-spacing:0px;text-decoration:none"><div class=3D"gmai=
l_quote" style=3D"font-family:Helvetica;font-size:13px;font-style:normal;fo=
nt-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:sta=
rt;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;=
text-decoration:none"><div dir=3D"ltr" class=3D"gmail_attr">On Mon, Dec 2, =
2024 at 2:41=E2=80=AFAM Zhenlei Huang &lt;<a href=3D"mailto:zlei@freebsd.or=
g" target=3D"_blank" rel=3D"noreferrer">zlei@freebsd.org</a>&gt; wrote:<br>=
</div><blockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;b=
order-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,=
204);padding-left:1ex">Hi Warner,<br><br>Recently I upgraded some ESXi vms =
from 13.3 to 13.4 and noticed weird report for sas speed.<br>The boot conso=
le has the following,<br><br>```<br>da0 at pvscsi0 bus 0 scbus2 target 0 lu=
n 0<br>da0: &lt;VMware Virtual disk 2.0&gt; Fixed Direct Access SPC-4 SCSI =
device<br>da0: 4294967.295MB/s transfers<br>```<br>But camcontrol report th=
e correct value,<br>```<br># camcontrol inquiry da0 -R<br>pass1: 750.000MB/=
s transfers, Command Queueing Enabled<br>```<br><br>The `4294967.295MB` is =
actually 0xffff_ffff or -1 but I do not see any logic set those values.<br>=
<br>Finally I managed to get the stack trace,<br>```<br>_scsi_announce_peri=
ph<br>scsi_announce_periph_sbuf<br>xpt_announce_periph_sbuf<br>dadone_probe=
rc<br>xpt_done_process<br>xpt_done_td<br>fork_exit<br>fork_trampoline<br>``=
`<br><br>and noticed that the last param `cts` of `_scsi_announce_periph(st=
ruct cam_periph *periph, u_int *speed, u_int *freq, struct ccb_trans_settin=
gs *cts)`<br>is from kernel stack and is not properly initialized, latter I=
 found some commits related to this,<br><br>076686fe0703 cam: make sure to =
clear CCBs allocated on the stack<br>ec5325dbca62 cam: make sure to clear e=
ven more CCBs allocated on the stack<br>0f206cc91279 cam: add missing zeroi=
ng of a stack-allocated CCB.<br>616a676a0535 cam: clear stack-allocated CCB=
 in the target layer<br><br>I applied them to stable/13, rebuild and reboot=
, now the speed of da0 is reported correctly. I also tried to patch the pvs=
csi driver with few lines and<br>it also works as intended.<br><br>```<br>-=
-- a/sys/dev/vmware/pvscsi/pvscsi.c<br>+++ b/sys/dev/vmware/pvscsi/pvscsi.c=
<br>@@ -1444,6 +1444,10 @@ pvscsi_action(struct cam_sim *sim, union ccb *cc=
b)<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0<span>=C2=A0</=
span>cts-&gt;proto_specific.scsi.flags =3D CTS_SCSI_FLAGS_TAG_ENB;<br>=C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0<span>=C2=A0</span>cts-=
&gt;proto_specific.scsi.valid =3D CTS_SCSI_VALID_TQ;<br><br>+=C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0/* Prefer connection speed over sa=
s port speed */<br>+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
/* cts-&gt;xport_specific.sas.bitrate =3D 0; */<br>+=C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0cts-&gt;xport_specific.sas.valid =3D 0;<b=
r>+<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0<span>=C2=A0<=
/span>ccb_h-&gt;status =3D CAM_REQ_CMP;<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0<span>=C2=A0</span>xpt_done(ccb);<br>```<br><br>Thi=
ngs come clear and I know why this weird speed happens, now it is time to d=
ecide how to fix it.<br><br>Fixing the consumer of cam, aka pvscsi driver, =
is quite simple and promising. I did a quick search it appears other consum=
ers set `cts-&gt;xport_specific.sas.valid` correctly. It does not convince =
me as I&#39;m quite new to cam subsystem.<br></blockquote><div><br></div><d=
iv>Yes. sas.valid is set when the sas.bitrate and other data has been set c=
orrectly. Setting it to 0 like this ensures it&#39;s ignored.=C2=A0 So if y=
ou know the speed, set sas.bitrate to that speed and sas.valid to 1.</div><=
/div></div></blockquote><div><br></div><div>That is clear.</div><br><blockq=
uote type=3D"cite"><div><div class=3D"gmail_quote" style=3D"font-family:Hel=
vetica;font-size:13px;font-style:normal;font-variant-caps:normal;font-weigh=
t:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform=
:none;white-space:normal;word-spacing:0px;text-decoration:none"><div><br></=
div><div>I&#39;m not sure I answered the question right, but if not, please=
 clarify or point out what I missed and I&#39;ll try again.</div><div>=C2=
=A0</div><blockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8e=
x;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,2=
04,204);padding-left:1ex">Which one do you prefer, MFC commits to stable/13=
, or do direct fix for pvscsi driver to stable/13 ?<br></blockquote><div><b=
r></div><div>[[ Sorry for the delay, I missed this all month ]]</div><div><=
br></div><div>I generally prefer a MFC, unless the code is no longer in -cu=
rrent.<span>=C2=A0</span></div></div></div></blockquote><div><br></div><div=
>The code live in -current and all supported stable branches.</div><br><blo=
ckquote type=3D"cite"><div><div class=3D"gmail_quote" style=3D"font-family:=
Helvetica;font-size:13px;font-style:normal;font-variant-caps:normal;font-we=
ight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transf=
orm:none;white-space:normal;word-spacing:0px;text-decoration:none"><div>Eve=
n if there&#39;s two different fixes for this logical bug, fixing it in cur=
rent, then MFCing that (with the current hash) is fine, even if the stable/=
13 changes are completely different.<span>=C2=A0</span></div></div></div></=
blockquote><div><br></div><div>The bug does not exist in current and stable=
/14 but only in stable/13, due to Edward&#39;s work ( commits those zero st=
ack-allocated CCBs ) were not MFCed into stable/13 branch.</div><div><br></=
div><blockquote type=3D"cite"><div><div class=3D"gmail_quote" style=3D"font=
-family:Helvetica;font-size:13px;font-style:normal;font-variant-caps:normal=
;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;tex=
t-transform:none;white-space:normal;word-spacing:0px;text-decoration:none">=
<div>For stable/13 I guess it matters a bit less than stable/14 since I&#39=
;ll be merging to it less, but if it&#39;s a commit from -current that does=
n&#39;t need to be made to -stable because of the new commit on stable, I t=
end to include the MFC hash text.</div></div></div></blockquote><div><br></=
div><div>Do you mean the `cherry picked from commit` commit message ?</div>=
<br><blockquote type=3D"cite"><div><div class=3D"gmail_quote" style=3D"font=
-family:Helvetica;font-size:13px;font-style:normal;font-variant-caps:normal=
;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;tex=
t-transform:none;white-space:normal;word-spacing:0px;text-decoration:none">=
<div><br></div><div>Warner</div></div></div></blockquote></div><div style=
=3D"font-family:Helvetica;font-size:13px;font-style:normal;font-variant-cap=
s:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent=
:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoratio=
n:none"><br></div><div style=3D"font-family:Helvetica;font-size:13px;font-s=
tyle:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;=
text-align:start;text-indent:0px;text-transform:none;white-space:normal;wor=
d-spacing:0px;text-decoration:none">I&#39;m preparing and testing the MFCin=
g. Bless me to not make things messed up :)</div></div></blockquote><div st=
yle=3D"font-family:Helvetica;font-size:13px;font-style:normal;font-variant-=
caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-ind=
ent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decora=
tion:none"><br></div><div style=3D"font-family:Helvetica;font-size:13px;fon=
t-style:normal;font-variant-caps:normal;font-weight:400;letter-spacing:norm=
al;text-align:start;text-indent:0px;text-transform:none;white-space:normal;=
word-spacing:0px;text-decoration:none">And the individual fix for pvscsi is=
 posted to=C2=A0<a href=3D"https://reviews.freebsd.org/D48438" target=3D"_b=
lank" rel=3D"noreferrer">https://reviews.freebsd.org/D48438</a>=C2=A0.</div=
></div></blockquote><div><br></div><div>Landed in -current. Will be MFCed t=
o stable/13 after a few days.</div><br><blockquote type=3D"cite"><div><br s=
tyle=3D"font-family:Helvetica;font-size:13px;font-style:normal;font-variant=
-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-in=
dent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decor=
ation:none"><blockquote type=3D"cite" style=3D"font-family:Helvetica;font-s=
ize:13px;font-style:normal;font-variant-caps:normal;font-weight:400;letter-=
spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-s=
pace:normal;word-spacing:0px;text-decoration:none"><div><br style=3D"font-f=
amily:Helvetica;font-size:13px;font-style:normal;font-variant-caps:normal;f=
ont-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-=
transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><d=
iv style=3D"font-family:Helvetica;font-size:13px;font-style:normal;font-var=
iant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;tex=
t-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-d=
ecoration:none"><div>Best regards,</div><div>Zhenlei</div></div></div></blo=
ckquote></div></blockquote></div><br><div>
<div><br></div>

</div>
<br></div></blockquote></div>

--000000000000a28822062bb73b88--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CANCZdfqCy=TdPRGrTKj=7DJz9nSyPFZgBnsPuf4FqgurN4oYiw>