Date:      Mon, 20 Jan 2025 20:48:47 +0800
From:      Zhenlei Huang <zlei@FreeBSD.org>
To:        Warner Losh <imp@bsdimp.com>
Cc:        Warner Losh <imp@freebsd.org>, FreeBSD Stable ML <stable@freebsd.org>, Edward Tomasz Napierala <trasz@freebsd.org>
Subject:   Re: MFC fixes for uninitialized kernel stack variables in sys/cam or do direct fix for pvscsi driver
Message-ID:  <56C24758-E34A-4A68-9E1D-8F7D27ADD0DF@FreeBSD.org>
In-Reply-To: <CANCZdfqCy=TdPRGrTKj=7DJz9nSyPFZgBnsPuf4FqgurN4oYiw@mail.gmail.com>
References:  <0DDE1B66-B794-472D-A901-54FA2FF1E853@FreeBSD.org> <CANCZdfpKGMFyPAVqxPcahZXgSL=tFBin3ratjoQTtJrDeM2NUg@mail.gmail.com> <3CB8230B-7938-4503-AADD-7F691482908C@FreeBSD.org> <E39D3960-42CE-4290-AF31-EA385D8512AF@FreeBSD.org> <C12BB5CB-ED16-4856-ADE1-6D788FB9799E@FreeBSD.org> <CANCZdfqCy=TdPRGrTKj=7DJz9nSyPFZgBnsPuf4FqgurN4oYiw@mail.gmail.com>




> On Jan 15, 2025, at 12:37 PM, Warner Losh <imp@bsdimp.com> wrote:
> 
> Great. This looks good.

MFCing is done, along with the following commits that clear the stack-allocated CCBs.

076686fe0703 cam: make sure to clear CCBs allocated on the stack
ec5325dbca62 cam: make sure to clear even more CCBs allocated on the stack
0f206cc91279 cam: add missing zeroing of a stack-allocated CCB.
616a676a0535 cam: clear stack-allocated CCB in the target layer

Everything looks good so far.

Best regards,
Zhenlei

> 
> Warner
> 
> On Tue, Jan 14, 2025, 9:36 PM Zhenlei Huang <zlei@freebsd.org> wrote:
> 
> 
>> On Jan 13, 2025, at 5:06 PM, Zhenlei Huang <zlei@FreeBSD.org> wrote:
>> 
>> 
>> 
>>> On Jan 13, 2025, at 4:22 PM, Zhenlei Huang <zlei@FreeBSD.org> wrote:
>>> 
>>> 
>>> 
>>>> On Dec 28, 2024, at 6:03 AM, Warner Losh <imp@bsdimp.com> wrote:
>>>> 
>>>> 
>>>> 
>>>> On Mon, Dec 2, 2024 at 2:41 AM Zhenlei Huang <zlei@freebsd.org> wrote:
>>>> Hi Warner,
>>>> 
>>>> Recently I upgraded some ESXi VMs from 13.3 to 13.4 and noticed a weird report for the SAS speed.
>>>> The boot console shows the following:
>>>> 
>>>> ```
>>>> da0 at pvscsi0 bus 0 scbus2 target 0 lun 0
>>>> da0: <VMware Virtual disk 2.0> Fixed Direct Access SPC-4 SCSI device
>>>> da0: 4294967.295MB/s transfers
>>>> ```
>>>> But camcontrol reports the correct value,
>>>> ```
>>>> # camcontrol inquiry da0 -R
>>>> pass1: 750.000MB/s transfers, Command Queueing Enabled
>>>> ```
>>>> 
>>>> The `4294967.295MB` is actually 0xffff_ffff, or -1, but I do not see any logic that sets such a value.
>>>> 
>>>> Finally I managed to get the stack trace,
>>>> ```
>>>> _scsi_announce_periph
>>>> scsi_announce_periph_sbuf
>>>> xpt_announce_periph_sbuf
>>>> dadone_proberc
>>>> xpt_done_process
>>>> xpt_done_td
>>>> fork_exit
>>>> fork_trampoline
>>>> ```
>>>> 
>>>> and noticed that the last param `cts` of `_scsi_announce_periph(struct cam_periph *periph, u_int *speed, u_int *freq, struct ccb_trans_settings *cts)`
>>>> is from the kernel stack and is not properly initialized. Later I found some commits related to this,
>>>> 
>>>> 076686fe0703 cam: make sure to clear CCBs allocated on the stack
>>>> ec5325dbca62 cam: make sure to clear even more CCBs allocated on the stack
>>>> 0f206cc91279 cam: add missing zeroing of a stack-allocated CCB.
>>>> 616a676a0535 cam: clear stack-allocated CCB in the target layer
>>>> 
>>>> I applied them to stable/13, rebuilt, and rebooted; now the speed of da0 is reported correctly. I also tried patching the pvscsi driver with a few lines, and
>>>> that also works as intended.
>>>> 
>>>> ```
>>>> --- a/sys/dev/vmware/pvscsi/pvscsi.c
>>>> +++ b/sys/dev/vmware/pvscsi/pvscsi.c
>>>> @@ -1444,6 +1444,10 @@ pvscsi_action(struct cam_sim *sim, union ccb *ccb)
>>>>                 cts->proto_specific.scsi.flags = CTS_SCSI_FLAGS_TAG_ENB;
>>>>                 cts->proto_specific.scsi.valid = CTS_SCSI_VALID_TQ;
>>>> 
>>>> +               /* Prefer connection speed over sas port speed */
>>>> +               /* cts->xport_specific.sas.bitrate = 0; */
>>>> +               cts->xport_specific.sas.valid = 0;
>>>> +
>>>>                 ccb_h->status = CAM_REQ_CMP;
>>>>                 xpt_done(ccb);
>>>> ```
>>>> 
>>>> Things are clear now and I know why this weird speed happens; it is time to decide how to fix it.
>>>> 
>>>> Fixing the consumer of CAM, i.e. the pvscsi driver, is quite simple and promising. I did a quick search, and it appears other consumers set `cts->xport_specific.sas.valid` correctly. Still, that does not fully convince me, as I'm quite new to the CAM subsystem.
>>>> 
>>>> Yes. sas.valid is set when sas.bitrate and the other data have been set correctly. Setting it to 0 like this ensures it's ignored. So if you know the speed, set sas.bitrate to that speed and sas.valid to 1.
>>> 
>>> That is clear.
>>> 
>>>> 
>>>> I'm not sure I answered the question right, but if not, please clarify or point out what I missed and I'll try again.
>>>>  
>>>> Which one do you prefer: MFC the commits to stable/13, or apply a direct fix for the pvscsi driver to stable/13?
>>>> 
>>>> [[ Sorry for the delay, I missed this all month ]]
>>>> 
>>>> I generally prefer a MFC, unless the code is no longer in -current. 
>>> 
>>> The code lives in -current and all supported stable branches.
>>> 
>>>> Even if there are two different fixes for this logical bug, fixing it in current, then MFCing that (with the current hash) is fine, even if the stable/13 changes are completely different. 
>>> 
>>> The bug does not exist in -current or stable/14, only in stable/13, because Edward's work (the commits that zero stack-allocated CCBs) was not MFCed to the stable/13 branch.
>>> 
>>>> For stable/13 I guess it matters a bit less than stable/14 since I'll be merging to it less, but if it's a commit from -current that doesn't need to be made to -stable because of the new commit on stable, I tend to include the MFC hash text.
>>> 
>>> Do you mean the `cherry picked from commit` line in the commit message?
>>> 
>>>> 
>>>> Warner
>>> 
>>> 
>>> I'm preparing and testing the MFC. Wish me luck in not messing things up :)
>> 
>> And the individual fix for pvscsi is posted at https://reviews.freebsd.org/D48438 .
> 
> Landed in -current. Will be MFCed to stable/13 after a few days.
> 
>> 
>>> 
>>> Best regards,
>>> Zhenlei
> 
> 
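The `4294967.295MB/s` figure quoted in the thread is what you would expect if an all-ones 32-bit speed in KB/s were printed as whole megabytes plus a three-digit remainder. The sketch below only illustrates that arithmetic. `format_speed` is a hypothetical helper, and the assumption that the speed is held in KB/s in an unsigned 32-bit value is inferred from the boot message, not taken from the kernel source.

```c
#include <inttypes.h>
#include <stdio.h>

/*
 * Hypothetical helper (not a kernel function): format a transfer speed
 * given in KB/s as "<MB>.<KB remainder>MB/s transfers", falling back to
 * KB/s when the speed is below 1 MB/s.
 */
static int
format_speed(uint32_t speed_kbps, char *buf, size_t len)
{
	uint32_t mb = speed_kbps / 1000;

	if (mb > 0)
		return (snprintf(buf, len,
		    "%" PRIu32 ".%03" PRIu32 "MB/s transfers",
		    mb, speed_kbps % 1000));
	return (snprintf(buf, len, "%" PRIu32 "KB/s transfers", speed_kbps));
}
```

Under these assumptions, passing `UINT32_MAX` (0xffffffff) produces the bogus boot-console string, while 750000 produces the `750.000MB/s` value that camcontrol reports.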
> 
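The interaction described in the thread, where a "valid" flag guards an optional field and the caller's structure lives on the stack, can be sketched outside the kernel. Everything prefixed `demo_`/`DEMO_` below is a simplified, hypothetical stand-in and not the real CAM definitions. Stack garbage is simulated by filling the structure with 0xff bytes rather than reading truly uninitialized memory.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the CAM transfer-settings structures. */
#define DEMO_SAS_VALID_SPEED	0x01u

struct demo_sas_settings {
	uint32_t valid;		/* bitmask: which fields below are meaningful */
	uint32_t bitrate;	/* KB/s, only meaningful if flagged valid */
};

struct demo_trans_settings {
	uint32_t base_speed;	/* KB/s, fallback connection speed */
	struct demo_sas_settings sas;
};

/* Consumer: use the SAS bitrate only when the producer marked it valid. */
static uint32_t
demo_pick_speed(const struct demo_trans_settings *cts)
{
	if (cts->sas.valid & DEMO_SAS_VALID_SPEED)
		return (cts->sas.bitrate);
	return (cts->base_speed);
}

/* Producer that, like the unpatched driver, never touches the sas fields. */
static void
demo_get_settings(struct demo_trans_settings *cts)
{
	cts->base_speed = 750000;
}

/* Caller that zeroes its stack-allocated structure before use. */
static uint32_t
demo_query_zeroed(void)
{
	struct demo_trans_settings cts;

	memset(&cts, 0, sizeof(cts));
	demo_get_settings(&cts);
	return (demo_pick_speed(&cts));
}

/* Caller whose structure holds leftover bytes (simulated with 0xff). */
static uint32_t
demo_query_dirty(void)
{
	struct demo_trans_settings cts;

	memset(&cts, 0xff, sizeof(cts));
	demo_get_settings(&cts);
	return (demo_pick_speed(&cts));
}
```

In this sketch `demo_query_zeroed()` returns 750000, while `demo_query_dirty()` returns 0xffffffff because the leftover `valid` bits make the consumer trust a `bitrate` nobody set. Either fix discussed in the thread breaks that chain: zeroing the structure in the caller, or having the producer explicitly clear `sas.valid`.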





Want to link to this message? Use this
URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?56C24758-E34A-4A68-9E1D-8F7D27ADD0DF>