From nobody Mon Jan 20 12:48:47 2025 X-Original-To: stable@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4Yc9Bb1zyqz5lJbh for ; Mon, 20 Jan 2025 12:48:55 +0000 (UTC) (envelope-from zlei@FreeBSD.org) Received: from smtp.freebsd.org (smtp.freebsd.org [IPv6:2610:1c1:1:606c::24b:4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "smtp.freebsd.org", Issuer "R10" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4Yc9Bb1kqLz3wk9; Mon, 20 Jan 2025 12:48:55 +0000 (UTC) (envelope-from zlei@FreeBSD.org) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org; s=dkim; t=1737377335; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=xmlSJqVbfnkvyjSKJ16x0fEoOn4JBc093UzE00q70+k=; b=Kr2ljoxGy8Oh9NftflI2g97QqCeW+g2+BrDcWw2NcxAmGXi4cdrqTCnv85TKn+M984bfyV kst7helufUMYzmHi4CiA5EV5nhrC/1FtLjMWc04URzkHl7fMuBDqq3rSl4f+aiXOBSNIw/ 0xj5TUQWMV98t2FYMzKu9yd1EuAUBall7BUUVrm2nBZOHuNG7HOHZGEK65ctA7+7RBV13m c+hJKCLyZ9Q+FpPuhMnjTkUrvJwMhJxRFzN9GajqndEdq7V2HqipcZbCv0oxxMOi78yKjS mTrC0fRQlKGY++CBKjnPOqF32rdXyn6/ammeZsUVmJEcnxOz3QXRMf+nkZiXWg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org; s=dkim; t=1737377335; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=xmlSJqVbfnkvyjSKJ16x0fEoOn4JBc093UzE00q70+k=; b=PP6Dnkh4O4dJgphlre2TtWxIglzTe4T+pjTS43mo3ZJK5WPj9ojKhFQyvWYv05RYKa7Vw1 PD9YFftmaFGpzP+jfM5l+gD+8iKKJdptz7Rv+dpcKKykN3Q0dRZDEpwtiSt59arDrSAw/2 FB1eKlMIwY3Cb/jFNKXUJbMapfiJVDUuY2B//0yUiZKktapjvJwFhpqeVqa+ImE3ewweFL CbhmFt7Z1YHuAH0EOxxLZgvro2KUttq9e7OB++M0b3HboBUHPuc9yMi5xWmtX45jtv9uYh 3utJvo89e6XYXe8yeXraDFZCoTU0kGCf1EHoqx16Hpq0gUocO9yx8XwwT0FL6g== ARC-Seal: i=1; s=dkim; d=freebsd.org; t=1737377335; a=rsa-sha256; cv=none; b=VlULEATVblkotLwCcVCISXBo7Ln00nktOgno+IWUNKdc1fYaEiFkm0q26BIyZZXCFTmZq0 7JTlD+gZbLnS5CZSe9Y65NzV63VWh5JcAYrsrM+5SFgKGO2co7cul2ZM03Jh+R6Vx1nzEz /BsSamo9f+GsXz+2Dy7kD0fDhLgJH2iEpVcO/NhlinLPVtIsegyemtzJ79XHf0Cw/4Ad8/ iT4ANQ1t53a+Ez+SKalNvhwf4+N27rQbIus+XFrK4H0GaxfeFQXAyA3pKACGg5WuSbG5tk i2CdgkMGGIYKvgNIMg7JC4mEdHV5+2LQTs6ZJmGr7ldRRfi+qe48SkBx3cpSLA== ARC-Authentication-Results: i=1; mx1.freebsd.org; none Received: from smtpclient.apple (unknown [IPv6:2001:19f0:6001:9db:98f0:9fe0:3545:10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) (Authenticated sender: zlei/mail) by smtp.freebsd.org (Postfix) with ESMTPSA id 4Yc9BY4Wn8zwlJ; Mon, 20 Jan 2025 12:48:53 +0000 (UTC) (envelope-from zlei@FreeBSD.org) From: Zhenlei Huang Message-Id: <56C24758-E34A-4A68-9E1D-8F7D27ADD0DF@FreeBSD.org> Content-Type: multipart/alternative; boundary="Apple-Mail=_C93059E6-FA1F-4D0E-866B-7EA4A7E58A8D" List-Id: Production branch of FreeBSD source code List-Archive: https://lists.freebsd.org/archives/freebsd-stable List-Help: List-Post: List-Subscribe: List-Unsubscribe: X-BeenThere: freebsd-stable@freebsd.org Sender: owner-freebsd-stable@FreeBSD.org Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3696.120.41.1.10\)) Subject: Re: MFC fixes for uninitialized kernel stack variables in sys/cam or do direct fix for pvscsi driver Date: Mon, 20 Jan 2025 20:48:47 +0800 In-Reply-To: Cc: Warner Losh , FreeBSD Stable ML , Edward Tomasz Napierala To: Warner Losh References: <0DDE1B66-B794-472D-A901-54FA2FF1E853@FreeBSD.org> <3CB8230B-7938-4503-AADD-7F691482908C@FreeBSD.org> X-Mailer: Apple Mail (2.3696.120.41.1.10) --Apple-Mail=_C93059E6-FA1F-4D0E-866B-7EA4A7E58A8D Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=utf-8 > On Jan 15, 2025, at 12:37 PM, Warner Losh wrote: >=20 > Great. This looks good. MFSing is done, along with the following commits those clear the stack = allocated CCBs. 076686fe0703 cam: make sure to clear CCBs allocated on the stack ec5325dbca62 cam: make sure to clear even more CCBs allocated on the = stack 0f206cc91279 cam: add missing zeroing of a stack-allocated CCB. 616a676a0535 cam: clear stack-allocated CCB in the target layer Everything looks good so far. Best regards, Zhenlei >=20 > Warner >=20 > On Tue, Jan 14, 2025, 9:36=E2=80=AFPM Zhenlei Huang > wrote: >=20 >=20 >> On Jan 13, 2025, at 5:06 PM, Zhenlei Huang > wrote: >>=20 >>=20 >>=20 >>> On Jan 13, 2025, at 4:22 PM, Zhenlei Huang > wrote: >>>=20 >>>=20 >>>=20 >>>> On Dec 28, 2024, at 6:03 AM, Warner Losh > wrote: >>>>=20 >>>>=20 >>>>=20 >>>> On Mon, Dec 2, 2024 at 2:41=E2=80=AFAM Zhenlei Huang = > wrote: >>>> Hi Warner, >>>>=20 >>>> Recently I upgraded some ESXi vms from 13.3 to 13.4 and noticed = weird report for sas speed. >>>> The boot console has the following, >>>>=20 >>>> ``` >>>> da0 at pvscsi0 bus 0 scbus2 target 0 lun 0 >>>> da0: Fixed Direct Access SPC-4 SCSI = device >>>> da0: 4294967.295MB/s transfers >>>> ``` >>>> But camcontrol report the correct value, >>>> ``` >>>> # camcontrol inquiry da0 -R >>>> pass1: 750.000MB/s transfers, Command Queueing Enabled >>>> ``` >>>>=20 >>>> The `4294967.295MB` is actually 0xffff_ffff or -1 but I do not see = any logic set those values. >>>>=20 >>>> Finally I managed to get the stack trace, >>>> ``` >>>> _scsi_announce_periph >>>> scsi_announce_periph_sbuf >>>> xpt_announce_periph_sbuf >>>> dadone_proberc >>>> xpt_done_process >>>> xpt_done_td >>>> fork_exit >>>> fork_trampoline >>>> ``` >>>>=20 >>>> and noticed that the last param `cts` of = `_scsi_announce_periph(struct cam_periph *periph, u_int *speed, u_int = *freq, struct ccb_trans_settings *cts)` >>>> is from kernel stack and is not properly initialized, latter I = found some commits related to this, >>>>=20 >>>> 076686fe0703 cam: make sure to clear CCBs allocated on the stack >>>> ec5325dbca62 cam: make sure to clear even more CCBs allocated on = the stack >>>> 0f206cc91279 cam: add missing zeroing of a stack-allocated CCB. >>>> 616a676a0535 cam: clear stack-allocated CCB in the target layer >>>>=20 >>>> I applied them to stable/13, rebuild and reboot, now the speed of = da0 is reported correctly. I also tried to patch the pvscsi driver with = few lines and >>>> it also works as intended. >>>>=20 >>>> ``` >>>> --- a/sys/dev/vmware/pvscsi/pvscsi.c >>>> +++ b/sys/dev/vmware/pvscsi/pvscsi.c >>>> @@ -1444,6 +1444,10 @@ pvscsi_action(struct cam_sim *sim, union ccb = *ccb) >>>> cts->proto_specific.scsi.flags =3D = CTS_SCSI_FLAGS_TAG_ENB; >>>> cts->proto_specific.scsi.valid =3D = CTS_SCSI_VALID_TQ; >>>>=20 >>>> + /* Prefer connection speed over sas port speed */ >>>> + /* cts->xport_specific.sas.bitrate =3D 0; */ >>>> + cts->xport_specific.sas.valid =3D 0; >>>> + >>>> ccb_h->status =3D CAM_REQ_CMP; >>>> xpt_done(ccb); >>>> ``` >>>>=20 >>>> Things come clear and I know why this weird speed happens, now it = is time to decide how to fix it. >>>>=20 >>>> Fixing the consumer of cam, aka pvscsi driver, is quite simple and = promising. I did a quick search it appears other consumers set = `cts->xport_specific.sas.valid` correctly. It does not convince me as = I'm quite new to cam subsystem. >>>>=20 >>>> Yes. sas.valid is set when the sas.bitrate and other data has been = set correctly. Setting it to 0 like this ensures it's ignored. So if = you know the speed, set sas.bitrate to that speed and sas.valid to 1. >>>=20 >>> That is clear. >>>=20 >>>>=20 >>>> I'm not sure I answered the question right, but if not, please = clarify or point out what I missed and I'll try again. >>>> =20 >>>> Which one do you prefer, MFC commits to stable/13, or do direct fix = for pvscsi driver to stable/13 ? >>>>=20 >>>> [[ Sorry for the delay, I missed this all month ]] >>>>=20 >>>> I generally prefer a MFC, unless the code is no longer in -current.=20= >>>=20 >>> The code live in -current and all supported stable branches. >>>=20 >>>> Even if there's two different fixes for this logical bug, fixing it = in current, then MFCing that (with the current hash) is fine, even if = the stable/13 changes are completely different.=20 >>>=20 >>> The bug does not exist in current and stable/14 but only in = stable/13, due to Edward's work ( commits those zero stack-allocated = CCBs ) were not MFCed into stable/13 branch. >>>=20 >>>> For stable/13 I guess it matters a bit less than stable/14 since = I'll be merging to it less, but if it's a commit from -current that = doesn't need to be made to -stable because of the new commit on stable, = I tend to include the MFC hash text. >>>=20 >>> Do you mean the `cherry picked from commit` commit message ? >>>=20 >>>>=20 >>>> Warner >>>=20 >>>=20 >>> I'm preparing and testing the MFCing. Bless me to not make things = messed up :) >>=20 >> And the individual fix for pvscsi is posted to = https://reviews.freebsd.org/D48438 = . >=20 > Landed in -current. Will be MFCed to stable/13 after a few days. >=20 >>=20 >>>=20 >>> Best regards, >>> Zhenlei >=20 >=20 >=20 --Apple-Mail=_C93059E6-FA1F-4D0E-866B-7EA4A7E58A8D Content-Transfer-Encoding: quoted-printable Content-Type: text/html; charset=utf-8

On Jan 15, 2025, at 12:37 PM, Warner Losh <imp@bsdimp.com> = wrote:

Great. This looks = good.

MFSing is = done, along with the following commits those clear the stack allocated = CCBs.

076686fe0703 cam: make = sure to clear CCBs allocated on the stack
ec5325dbca62 cam: = make sure to clear even more CCBs allocated on the = stack
0f206cc91279 cam: add missing zeroing of a = stack-allocated CCB.
616a676a0535 cam: clear stack-allocated = CCB in the target layer

Everything = looks good so far.

Best = regards,
Zhenlei


Warner

On Tue, Jan = 14, 2025, 9:36=E2=80=AFPM Zhenlei Huang <zlei@freebsd.org> = wrote:

On Jan 13, 2025, at 5:06 PM, Zhenlei Huang = <zlei@FreeBSD.org> wrote:



On Jan 13, 2025, at 4:22 PM, = Zhenlei Huang <zlei@FreeBSD.org> wrote:



On Dec 28, 2024, at 6:03 AM, = Warner Losh <imp@bsdimp.com> wrote:



On Mon, Dec 2, = 2024 at 2:41=E2=80=AFAM Zhenlei Huang <zlei@freebsd.org> wrote:
Hi Warner,

Recently I upgraded some ESXi vms from 13.3 to 13.4 and = noticed weird report for sas speed.
The boot console has = the following,

```
da0 at = pvscsi0 bus 0 scbus2 target 0 lun 0
da0: <VMware = Virtual disk 2.0> Fixed Direct Access SPC-4 SCSI device
da0: 4294967.295MB/s transfers
```
But camcontrol report the correct value,
```
# camcontrol inquiry da0 -R
pass1: 750.000MB/s = transfers, Command Queueing Enabled
```

The `4294967.295MB` is actually 0xffff_ffff or -1 but I do = not see any logic set those values.

Finally = I managed to get the stack trace,
```
_scsi_announce_periph
scsi_announce_periph_sbuf
xpt_announce_periph_sbuf
dadone_proberc
xpt_done_process
xpt_done_td
fork_exit
fork_trampoline
```

and noticed that the last param `cts` of = `_scsi_announce_periph(struct cam_periph *periph, u_int *speed, u_int = *freq, struct ccb_trans_settings *cts)`
is from kernel = stack and is not properly initialized, latter I found some commits = related to this,

076686fe0703 cam: make = sure to clear CCBs allocated on the stack
ec5325dbca62 = cam: make sure to clear even more CCBs allocated on the stack
0f206cc91279 cam: add missing zeroing of a stack-allocated = CCB.
616a676a0535 cam: clear stack-allocated CCB in the = target layer

I applied them to stable/13, = rebuild and reboot, now the speed of da0 is reported correctly. I also = tried to patch the pvscsi driver with few lines and
it = also works as intended.

```
---= a/sys/dev/vmware/pvscsi/pvscsi.c
+++ = b/sys/dev/vmware/pvscsi/pvscsi.c
@@ -1444,6 +1444,10 @@ = pvscsi_action(struct cam_sim *sim, union ccb *ccb)
  =               cts->proto_specific.scsi.flags =3D = CTS_SCSI_FLAGS_TAG_ENB;
          =       cts->proto_specific.scsi.valid =3D = CTS_SCSI_VALID_TQ;

+      =          /* Prefer connection speed over sas = port speed */
+            =    /* cts->xport_specific.sas.bitrate =3D 0; */
+              =  cts->xport_specific.sas.valid =3D 0;
+
                ccb_h->status =3D CAM_REQ_CMP;
                xpt_done(ccb);
```
Things come clear and I know why this weird speed happens, = now it is time to decide how to fix it.

Fixing the consumer of cam, aka pvscsi driver, is quite = simple and promising. I did a quick search it appears other consumers = set `cts->xport_specific.sas.valid` correctly. It does not convince = me as I'm quite new to cam subsystem.

Yes. sas.valid is set = when the sas.bitrate and other data has been set correctly. Setting it = to 0 like this ensures it's ignored.  So if you know the speed, set = sas.bitrate to that speed and sas.valid to = 1.

That is clear.


I'm = not sure I answered the question right, but if not, please clarify or = point out what I missed and I'll try again.
 
Which one do you prefer, MFC commits to = stable/13, or do direct fix for pvscsi driver to stable/13 ?

[[ Sorry for the delay, I missed this all month ]]

I generally prefer a = MFC, unless the code is no longer in -current. 
The code live in -current and all = supported stable branches.

Even if there's two different fixes for = this logical bug, fixing it in current, then MFCing that (with the = current hash) is fine, even if the stable/13 changes are completely = different. 
The bug does not exist in current and = stable/14 but only in stable/13, due to Edward's work ( commits those = zero stack-allocated CCBs ) were not MFCed into stable/13 = branch.

For stable/13 I guess it matters a bit = less than stable/14 since I'll be merging to it less, but if it's a = commit from -current that doesn't need to be made to -stable because of = the new commit on stable, I tend to include the MFC hash = text.

Do you mean the `cherry picked from = commit` commit message ?


Warner

I'm preparing and testing the MFCing. Bless = me to not make things messed up :)

And the individual fix for pvscsi is posted = to https://reviews.freebsd.org/D48438 .

Landed in = -current. Will be MFCed to stable/13 after a few days.



Best regards,
Zhenlei
=





= --Apple-Mail=_C93059E6-FA1F-4D0E-866B-7EA4A7E58A8D--