Date: Mon, 13 Jan 2025 16:22:51 +0800 From: Zhenlei Huang <zlei@FreeBSD.org> To: Warner Losh <imp@bsdimp.com> Cc: Warner Losh <imp@freebsd.org>, stable@freebsd.org, Edward Tomasz Napierala <trasz@freebsd.org> Subject: Re: MFC fixes for uninitialized kernel stack variables in sys/cam or do direct fix for pvscsi driver Message-ID: <3CB8230B-7938-4503-AADD-7F691482908C@FreeBSD.org> In-Reply-To: <CANCZdfpKGMFyPAVqxPcahZXgSL=tFBin3ratjoQTtJrDeM2NUg@mail.gmail.com> References: <0DDE1B66-B794-472D-A901-54FA2FF1E853@FreeBSD.org> <CANCZdfpKGMFyPAVqxPcahZXgSL=tFBin3ratjoQTtJrDeM2NUg@mail.gmail.com>
index | next in thread | previous in thread | raw e-mail
[-- Attachment #1 --] > On Dec 28, 2024, at 6:03 AM, Warner Losh <imp@bsdimp.com> wrote: > > > > On Mon, Dec 2, 2024 at 2:41 AM Zhenlei Huang <zlei@freebsd.org <mailto:zlei@freebsd.org>> wrote: > Hi Warner, > > Recently I upgraded some ESXi vms from 13.3 to 13.4 and noticed weird report for sas speed. > The boot console has the following, > > ``` > da0 at pvscsi0 bus 0 scbus2 target 0 lun 0 > da0: <VMware Virtual disk 2.0> Fixed Direct Access SPC-4 SCSI device > da0: 4294967.295MB/s transfers > ``` > But camcontrol report the correct value, > ``` > # camcontrol inquiry da0 -R > pass1: 750.000MB/s transfers, Command Queueing Enabled > ``` > > The `4294967.295MB` is actually 0xffff_ffff or -1 but I do not see any logic set those values. > > Finally I managed to get the stack trace, > ``` > _scsi_announce_periph > scsi_announce_periph_sbuf > xpt_announce_periph_sbuf > dadone_proberc > xpt_done_process > xpt_done_td > fork_exit > fork_trampoline > ``` > > and noticed that the last param `cts` of `_scsi_announce_periph(struct cam_periph *periph, u_int *speed, u_int *freq, struct ccb_trans_settings *cts)` > is from kernel stack and is not properly initialized, latter I found some commits related to this, > > 076686fe0703 cam: make sure to clear CCBs allocated on the stack > ec5325dbca62 cam: make sure to clear even more CCBs allocated on the stack > 0f206cc91279 cam: add missing zeroing of a stack-allocated CCB. > 616a676a0535 cam: clear stack-allocated CCB in the target layer > > I applied them to stable/13, rebuild and reboot, now the speed of da0 is reported correctly. I also tried to patch the pvscsi driver with few lines and > it also works as intended. > > ``` > --- a/sys/dev/vmware/pvscsi/pvscsi.c > +++ b/sys/dev/vmware/pvscsi/pvscsi.c > @@ -1444,6 +1444,10 @@ pvscsi_action(struct cam_sim *sim, union ccb *ccb) > cts->proto_specific.scsi.flags = CTS_SCSI_FLAGS_TAG_ENB; > cts->proto_specific.scsi.valid = CTS_SCSI_VALID_TQ; > > + /* Prefer connection speed over sas port speed */ > + /* cts->xport_specific.sas.bitrate = 0; */ > + cts->xport_specific.sas.valid = 0; > + > ccb_h->status = CAM_REQ_CMP; > xpt_done(ccb); > ``` > > Things come clear and I know why this weird speed happens, now it is time to decide how to fix it. > > Fixing the consumer of cam, aka pvscsi driver, is quite simple and promising. I did a quick search it appears other consumers set `cts->xport_specific.sas.valid` correctly. It does not convince me as I'm quite new to cam subsystem. > > Yes. sas.valid is set when the sas.bitrate and other data has been set correctly. Setting it to 0 like this ensures it's ignored. So if you know the speed, set sas.bitrate to that speed and sas.valid to 1. That is clear. > > I'm not sure I answered the question right, but if not, please clarify or point out what I missed and I'll try again. > > Which one do you prefer, MFC commits to stable/13, or do direct fix for pvscsi driver to stable/13 ? > > [[ Sorry for the delay, I missed this all month ]] > > I generally prefer a MFC, unless the code is no longer in -current. The code live in -current and all supported stable branches. > Even if there's two different fixes for this logical bug, fixing it in current, then MFCing that (with the current hash) is fine, even if the stable/13 changes are completely different. The bug does not exist in current and stable/14 but only in stable/13, due to Edward's work ( commits those zero stack-allocated CCBs ) were not MFCed into stable/13 branch. > For stable/13 I guess it matters a bit less than stable/14 since I'll be merging to it less, but if it's a commit from -current that doesn't need to be made to -stable because of the new commit on stable, I tend to include the MFC hash text. Do you mean the `cherry picked from commit` commit message ? > > Warner I'm preparing and testing the MFCing. Bless me to not make things messed up :) Best regards, Zhenlei [-- Attachment #2 --] <html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><br class=""><div><br class=""><blockquote type="cite" class=""><div class="">On Dec 28, 2024, at 6:03 AM, Warner Losh <<a href="mailto:imp@bsdimp.com" class="">imp@bsdimp.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><meta charset="UTF-8" class=""><br class="Apple-interchange-newline"><br style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 13px; font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><div class="gmail_quote gmail_quote_container" style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 13px; font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;"><div dir="ltr" class="gmail_attr">On Mon, Dec 2, 2024 at 2:41 AM Zhenlei Huang <<a href="mailto:zlei@freebsd.org" class="">zlei@freebsd.org</a>> wrote:<br class=""></div><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-style: solid; border-left-color: rgb(204, 204, 204); padding-left: 1ex;">Hi Warner,<br class=""><br class="">Recently I upgraded some ESXi vms from 13.3 to 13.4 and noticed weird report for sas speed.<br class="">The boot console has the following,<br class=""><br class="">```<br class="">da0 at pvscsi0 bus 0 scbus2 target 0 lun 0<br class="">da0: <VMware Virtual disk 2.0> Fixed Direct Access SPC-4 SCSI device<br class="">da0: 4294967.295MB/s transfers<br class="">```<br class="">But camcontrol report the correct value,<br class="">```<br class=""># camcontrol inquiry da0 -R<br class="">pass1: 750.000MB/s transfers, Command Queueing Enabled<br class="">```<br class=""><br class="">The `4294967.295MB` is actually 0xffff_ffff or -1 but I do not see any logic set those values.<br class=""><br class="">Finally I managed to get the stack trace,<br class="">```<br class="">_scsi_announce_periph<br class="">scsi_announce_periph_sbuf<br class="">xpt_announce_periph_sbuf<br class="">dadone_proberc<br class="">xpt_done_process<br class="">xpt_done_td<br class="">fork_exit<br class="">fork_trampoline<br class="">```<br class=""><br class="">and noticed that the last param `cts` of `_scsi_announce_periph(struct cam_periph *periph, u_int *speed, u_int *freq, struct ccb_trans_settings *cts)`<br class="">is from kernel stack and is not properly initialized, latter I found some commits related to this,<br class=""><br class="">076686fe0703 cam: make sure to clear CCBs allocated on the stack<br class="">ec5325dbca62 cam: make sure to clear even more CCBs allocated on the stack<br class="">0f206cc91279 cam: add missing zeroing of a stack-allocated CCB.<br class="">616a676a0535 cam: clear stack-allocated CCB in the target layer<br class=""><br class="">I applied them to stable/13, rebuild and reboot, now the speed of da0 is reported correctly. I also tried to patch the pvscsi driver with few lines and<br class="">it also works as intended.<br class=""><br class="">```<br class="">--- a/sys/dev/vmware/pvscsi/pvscsi.c<br class="">+++ b/sys/dev/vmware/pvscsi/pvscsi.c<br class="">@@ -1444,6 +1444,10 @@ pvscsi_action(struct cam_sim *sim, union ccb *ccb)<br class=""> <span class="Apple-converted-space"> </span>cts->proto_specific.scsi.flags = CTS_SCSI_FLAGS_TAG_ENB;<br class=""> <span class="Apple-converted-space"> </span>cts->proto_specific.scsi.valid = CTS_SCSI_VALID_TQ;<br class=""><br class="">+ /* Prefer connection speed over sas port speed */<br class="">+ /* cts->xport_specific.sas.bitrate = 0; */<br class="">+ cts->xport_specific.sas.valid = 0;<br class="">+<br class=""> <span class="Apple-converted-space"> </span>ccb_h->status = CAM_REQ_CMP;<br class=""> <span class="Apple-converted-space"> </span>xpt_done(ccb);<br class="">```<br class=""><br class="">Things come clear and I know why this weird speed happens, now it is time to decide how to fix it.<br class=""><br class="">Fixing the consumer of cam, aka pvscsi driver, is quite simple and promising. I did a quick search it appears other consumers set `cts->xport_specific.sas.valid` correctly. It does not convince me as I'm quite new to cam subsystem.<br class=""></blockquote><div class=""><br class=""></div><div class="">Yes. sas.valid is set when the sas.bitrate and other data has been set correctly. Setting it to 0 like this ensures it's ignored. So if you know the speed, set sas.bitrate to that speed and sas.valid to 1.</div></div></div></blockquote><div><br class=""></div><div>That is clear.</div><br class=""><blockquote type="cite" class=""><div class=""><div class="gmail_quote gmail_quote_container" style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 13px; font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;"><div class=""><br class=""></div><div class="">I'm not sure I answered the question right, but if not, please clarify or point out what I missed and I'll try again.</div><div class=""> </div><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-style: solid; border-left-color: rgb(204, 204, 204); padding-left: 1ex;">Which one do you prefer, MFC commits to stable/13, or do direct fix for pvscsi driver to stable/13 ?<br class=""></blockquote><div class=""><br class=""></div><div class="">[[ Sorry for the delay, I missed this all month ]]</div><div class=""><br class=""></div><div class="">I generally prefer a MFC, unless the code is no longer in -current. </div></div></div></blockquote><div><br class=""></div><div>The code live in -current and all supported stable branches.</div><br class=""><blockquote type="cite" class=""><div class=""><div class="gmail_quote gmail_quote_container" style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 13px; font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;"><div class="">Even if there's two different fixes for this logical bug, fixing it in current, then MFCing that (with the current hash) is fine, even if the stable/13 changes are completely different. </div></div></div></blockquote><div><br class=""></div><div>The bug does not exist in current and stable/14 but only in stable/13, due to Edward's work ( commits those zero stack-allocated CCBs ) were not MFCed into stable/13 branch.</div><div><br class=""></div><blockquote type="cite" class=""><div class=""><div class="gmail_quote gmail_quote_container" style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 13px; font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;"><div class="">For stable/13 I guess it matters a bit less than stable/14 since I'll be merging to it less, but if it's a commit from -current that doesn't need to be made to -stable because of the new commit on stable, I tend to include the MFC hash text.</div></div></div></blockquote><div><br class=""></div><div>Do you mean the `cherry picked from commit` commit message ?</div><br class=""><blockquote type="cite" class=""><div class=""><div class="gmail_quote gmail_quote_container" style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 13px; font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;"><div class=""><br class=""></div><div class="">Warner</div></div></div></blockquote></div><div class=""><br class=""></div><div class="">I'm preparing and testing the MFCing. Bless me to not make things messed up :)</div><br class=""><div class=""> <div>Best regards,</div><div>Zhenlei</div> </div> <br class=""></body></html>home | help
Want to link to this message? Use this
URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?3CB8230B-7938-4503-AADD-7F691482908C>
