Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 2 May 2023 00:11:40 +0200
From:      Dimitry Andric <dim@FreeBSD.org>
To:        "Jason A. Harmening" <jah@FreeBSD.org>
Cc:        John Baldwin <jhb@freebsd.org>, Konstantin Belousov <kostikbel@gmail.com>, src-committers@freebsd.org, dev-commits-src-all@freebsd.org, dev-commits-src-branches@freebsd.org
Subject:   Re: git: 060699e91369 - stable/13 - Merge llvm-project release/15.x llvmorg-15.0.7-0-g8dfdcc7b7bf6
Message-ID:  <331B50E9-A7F9-4BCF-90BD-EA1E69A8F1ED@FreeBSD.org>
In-Reply-To: <ZFAUJNGjhfrDQ-Hf@corona>
References:  <202304092135.339LZMeJ081640@gitrepo.freebsd.org> <ZE1jEcA7iL3QrbOP@corona> <76DD2CB9-986B-4349-8F46-3B7BF63EB315@FreeBSD.org> <ZE1vtvghEV_RsD2b@corona> <ZE33_AWw0DO8-EjM@kib.kiev.ua> <ZE7-AcTXtCkSzsHb@corona> <ZE8JJQizER2WrTr2@corona> <ZE8yqfOkrrMddua1@corona> <df22667e-831f-deaa-3cb0-7b9797caa833@FreeBSD.org> <E0E68380-908D-4939-B009-844BA005DB64@FreeBSD.org> <ZFAUJNGjhfrDQ-Hf@corona>

next in thread | previous in thread | raw e-mail | index | archive | help

--Apple-Mail=_8AAB2DF7-0578-4C2A-A09B-84677789D39C
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=us-ascii

On 1 May 2023, at 21:33, Jason A. Harmening <jah@FreeBSD.org> wrote:
>=20
> On Mon, May 01, 2023 at 08:41:32PM +0200, Dimitry Andric wrote:
>> On 1 May 2023, at 18:14, John Baldwin <jhb@freebsd.org> wrote:
>>>=20
>>> On 4/30/23 8:31 PM, Jason A. Harmening wrote:
>>>> On Sun, Apr 30, 2023 at 07:34:45PM -0500, Jason A. Harmening wrote:
>>>>> On Sun, Apr 30, 2023 at 06:47:13PM -0500, Jason A. Harmening =
wrote:
>>>>>> On Sun, Apr 30, 2023 at 08:09:16AM +0300, Konstantin Belousov =
wrote:
>>>>>>> On Sat, Apr 29, 2023 at 02:27:50PM -0500, Jason A. Harmening =
wrote:
>>>>>>>> On Sat, Apr 29, 2023 at 08:49:28PM +0200, Dimitry Andric wrote:
>>>>>>>>> On 29 Apr 2023, at 20:33, Jason A. Harmening <jah@FreeBSD.org> =
wrote:
>>>>>>>>>>=20
>>>>>>>>>> On Sun, Apr 09, 2023 at 09:35:22PM +0000, Dimitry Andric =
wrote:
>>>>>>>>>>> The branch stable/13 has been updated by dim:
>>>>>>>>>>>=20
>>>>>>>>>>> URL: =
https://cgit.FreeBSD.org/src/commit/?id=3D060699e9136975d51d3f726b9785bdba=
c9a62ba6
>>>>>>>>>>>=20
>>>>>>>>>>> commit 060699e9136975d51d3f726b9785bdbac9a62ba6
>>>>>>>>>>> Author:     Dimitry Andric <dim@FreeBSD.org>
>>>>>>>>>>> AuthorDate: 2023-01-14 16:33:24 +0000
>>>>>>>>>>> Commit:     Dimitry Andric <dim@FreeBSD.org>
>>>>>>>>>>> CommitDate: 2023-04-09 14:54:52 +0000
>>>>>>>>>>>=20
>>>>>>>>>>>   Merge llvm-project release/15.x =
llvmorg-15.0.7-0-g8dfdcc7b7bf6
>>>>>>>>>>>=20
>>>>>>>>>>>   This updates llvm, clang, compiler-rt, libc++, libunwind, =
lld, lldb and
>>>>>>>>>>>   openmp to llvmorg-15.0.7-0-g8dfdcc7b7bf6.
>>>>>>>>>>>=20
>>>>>>>>>>>   PR:             265425
>>>>>>>>>>>   MFC after:      2 weeks
>>>>>>>>>>=20
>>>>>>>>>> This MFC of llvm15 appears to have completely broken the =
Intel IOMMU
>>>>>>>>>> driver on my stable/13 machine.  After this series of =
commits, any
>>>>>>>>>> downstream DMA seems to produce an IOMMU translation fault, =
which
>>>>>>>>>> renders the machine completely unusable: no nvme boot disk, =
no usb
>>>>>>>>>> keyboard, etc.
>>>>>>>>>>=20
>>>>>>>>>> The faults I see look something like this:
>>>>>>>>>>=20
>>>>>>>>>> DMAR4: ahci0: pci0:17:5 sid 8d fault acc 0 adt 0x0 reason 0x3 =
addr 26000
>>>>>>>>>>=20
>>>>>>>>>> It's a bit surprising to see a toolchain upgrade produce =
breakage like
>>>>>>>>>> this, but that's what git bisect clearly tells me.  I wonder =
if some of
>>>>>>>>>> the IOMMU control structures might be defined as C bitfields =
and the new
>>>>>>>>>> compiler is emitting them differently?  Also, was any =
breakage like this
>>>>>>>>>> observed when -current was upgraded to llvm15 several months =
ago?
>>>>>>>>>=20
>>>>>>>>> I haven't heard anything about such breakage, no.
>>>>>>>>>=20
>>>>>>>>>=20
>>>>>>>>>> More generally, this is the second time in as many months =
I've had to
>>>>>>>>>> deal with IOMMU breakage on -stable.  I can't imagine I'm the =
only
>>>>>>>>>> person who sees value in running with DMA remapping enabled; =
do we need
>>>>>>>>>> a dedicated DMAR-enabled machine in the cluster to smoke-test =
changes
>>>>>>>>>> like this?  More generally, should we avoid MFCing high-risk =
changes
>>>>>>>>>> like this?
>>>>>>>>>=20
>>>>>>>>> Since there were very few bug reports, it was not deemed high =
risk.
>>>>>>>>>=20
>>>>>>>>> In any case, it would be good to get the bottom of what is =
causing the
>>>>>>>>> problem, so is there any way you can isolate which code seems =
to be
>>>>>>>>> going "bad"?
>>>>>>>>>=20
>>>>>>>>> For example, if this problem affects code in sys/dev/iommu, is =
there
>>>>>>>>> some way you can compile that part with -O1, or with an older =
version
>>>>>>>>> of clang (from ports), to see if the problem goes away?
>>>>>>>>=20
>>>>>>>> I did try removing all custom make.conf settings (previously I =
just had
>>>>>>>> CPUTYPE?=3Dicelake-server), but that didn't change the =
behavior.
>>>>>>>>=20
>>>>>>>> Before I try further build tweaks, I'd like to ask if the IOMMU =
fault
>>>>>>>> report can provide guidance here?  AFAICT all the faults I'm =
getting
>>>>>>>> show "reason 0x3".  If I'm reading the VT-d spec correctly, =
FR=3D0x3
>>>>>>>> indicates an invalid context entry, in other words there's =
something the
>>>>>>>> hardware doesn't like in the way the address width or pagetable =
base is
>>>>>>>> configured for the PCIe requestor.
>>>>>>>=20
>>>>>>> I would start looking at the other direction: might be, there =
are still some
>>>>>>> left shifts for int32 values with the shift count > 30, or =
uint32 with the
>>>>>>> count > 31.
>>>>>>>=20
>>>>>>> Also might be useful to dump each context entry on creation, it =
is kept
>>>>>>> constant after.
>>>>>>=20
>>>>>> I did look over the constants in intel_reg.h, and didn't see =
anything
>>>>>> that looked as though it would be susceptible to sign-extension =
or
>>>>>> truncation bugs.  In the failing case it's much easier for me to =
catch
>>>>>> the fault messages than any initialization message, so I =
instrumented
>>>>>> the fault handler to get the context entry from the dmar_ctx =
object
>>>>>> using the same logic as dmar_map_ctx_entry(), and then dump out =
the ctx1
>>>>>> and ctx2 fields.  What I see are messages like:
>>>>>>=20
>>>>>> ... ctx1 0x10013b001 ctx2 0x103
>>>>>>=20
>>>>>> At first glance these "look right": the P bit is set in ctx1, and =
the
>>>>>> rest of the field looks like a valid physical address.  ctx2 also
>>>>>> doesn't have any of the reserved bits set, but in all cases it =
does have
>>>>>> AW=3D3, which would indicate 57-bit AGAW.  But when I boot the =
last
>>>>>> working kernel, from the revision prior to the llvm15 MFC, I see =
this in
>>>>>> dmesg:
>>>>>>=20
>>>>>> ahci0: dmar4 pci0:0:17:5 rid 8d domain 1 mgaw 48 agaw 48 =
re-mapped
>>>>>>=20
>>>>>> ...all reported devices show 48-bit MGAW/AGAW, so I would expect =
ctx2 to
>>>>>> have AW=3D2.  I suspect this may be the source of the fault, but =
I'm not
>>>>>> sure how it's getting configured that way, whether it's an issue =
with
>>>>>> reading the capability register or something else.
>>>>>>=20
>>>>>=20
>>>>> I can confirm that hacking domain_set_agaw() to always use the =
settings
>>>>> from sagaw_bits[2] eliminates the faults and at least allows the =
machine
>>>>> to boot to single-user mode.
>>>> I see what's happening now.  When I added the hack to always set
>>>> sagaw_bits[2], I noted that the passed-in MGAW was still 57, while
>>>> unit->hw_cap had the correct value of 0x4 (=3D> 4-level paging, =
48-bit AW)
>>>> in bits 12:8.  The problem is that sagaw_bits has agaw=3D64 in its =
last
>>>> entry.  This results in dmar_maxaddr2mgaw() attempting a comparison
>>>> against 1ULL << 64 in the final iteration of its first loop.  I =
suspect
>>>> the new compiler probably determines that last iteration is =
meaningless
>>>> and simply omits it from the (probably unrolled) loop.  Since the =
"loop"
>>>> terminates with i < nitems(sagaw_bits), the subsequent "allow_less =
..."
>>>> case doesn't execute and we end up erroneously selecting a 57-bit
>>>> address width.  Just commenting out that last entry in sagaw_bits =
fixes
>>>> the problem.
>>>> So, two questions:
>>>> 1) Does any VT-d hardware actually support 6-level paging?  The ca. =
2021
>>>> VT-d spec I'm looking at indicates 5-level is the greatest depth
>>>> supported, with everything above that being reserved.
>>>> 2) I'd expect clang to try very hard to error out in a situation =
like
>>>> this, but I see that sys/conf/kern.mk sets =
-Wno-shift-count-overflow
>>>> among other things, and more of them were added for clang 15.  This
>>>> seems like a really bad idea, regardless of how much of a PITA it =
may be
>>>> to fix these warnings.
>>>=20
>>> FWIW, I've been working on trying to re-enable some of the warnings =
that
>>> were disabled for clang 15 in main.  I'll move that one higher up on =
my
>>> todo list.
>>=20
>> In this particular case, it doesn't warn about it though. I think =
KASAN
>> might be a better 'catcher' for this kind of error, or a KUBSAN, if =
we
>> had one...
>=20
> If you've tried turning all the relevant warnings/errors back on for
> this code, and clang truly doesn't warn about it, then wouldn't this
> warrant a bug against upstream clang?  This is a situation in which =
the
> compiler detects a left-shift overflow, by itself at least a warning
> condition, and uses it to materially alter program flow.

You could try, but it will probably be classified as undefined behavior.
I guess the warning will only show up in a more full-blown analyzer. I
haven't tried that.

-Dimitry


--Apple-Mail=_8AAB2DF7-0578-4C2A-A09B-84677789D39C
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
	filename=signature.asc
Content-Type: application/pgp-signature;
	name=signature.asc
Content-Description: Message signed with OpenPGP

-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.2

iF0EARECAB0WIQR6tGLSzjX8bUI5T82wXqMKLiCWowUCZFA5HAAKCRCwXqMKLiCW
o+hGAJ9WQT+6SV3pPcdi8c/OZWlFdkQDmQCgiybouLUiZxuGlvEul8mPjb6TzgI=
=DLM8
-----END PGP SIGNATURE-----

--Apple-Mail=_8AAB2DF7-0578-4C2A-A09B-84677789D39C--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?331B50E9-A7F9-4BCF-90BD-EA1E69A8F1ED>