Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 4 Jan 2022 17:58:13 -0500
From:      Mark Johnston <markj@freebsd.org>
To:        Peter <pmc@citylink.dinoex.sub.org>
Cc:        freebsd-stable@freebsd.org, jtl@freebsd.org
Subject:   Re: dtrace bitfields failure (was: 12.3-RC1 fails ...)
Message-ID:  <YdTRBU2EoMORjPOw@nuc>
In-Reply-To: <YdTC9WzqT2p2NgxX@gate.intra.daemon.contact>
References:  <YdRiUXviQK8hCBhC@gate.intra.daemon.contact> <YdSLk72bft8ed%2BdH@nuc> <YdTC9WzqT2p2NgxX@gate.intra.daemon.contact>

next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, Jan 04, 2022 at 10:58:13PM +0100, Peter wrote:
> On Tue, Jan 04, 2022 at 01:01:55PM -0500, Mark Johnston wrote:
> ! On Tue, Jan 04, 2022 at 04:05:53PM +0100, Peter wrote:
> ! > 
> ! > Hija,
> ! > 
> ! >   sadly, I was too early in agreeing that the two patches
> ! >      22082f15f9
> ! >      68396709e7
> ! > together do solve the issue. They only do on a certain assumption,
> ! > which does not hold true in all cases.
> ! > 
> ! > 
> ! > Let's look at https://reviews.freebsd.org/D27213
> ! > 
> ! > This is the code in question that will trigger the action:
> ! > 
> ! >      if (dst_type == CTF_ERR && name[0] != '\0' &&
> ! >              (hep = ctf_hash_lookup(&src_fp->ctf_names, src_fp, name,
> ! >              strlen(name))) != NULL &&
> ! >              src_type != (ctf_id_t)hep->h_type) {
> ! > 
> ! > What happens here: in the case of a bitfield type we need to also
> ! > copy the corresponding intrinsic type. This condition here checks for
> ! > the case and also should deliver that respective intrinsic type
> ! > into the "hep" variable.
> ! > 
> ! > But this depends on the assumption that the intrinsic type appears
> ! > first in the "src_fp" container, so that the hash will point to it.
> ! > And that is not necessarily true; it depends on what options you have
> ! > in your kernel config.
> ! > 
> ! > 
> ! > For instance, with my custom kernel, things look like this:
> ! > 
> ! > $ ctfdump -t kernel.full
> ! > 
> ! > - Types ----------------------------------------------------------------------
> ! > 
> ! >   [1] STRUCT (anon) (8 bytes)
> ! >         sle_next type=262 off=0
> ! > 
> ! >   [2] STRUCT (anon) (8 bytes)
> ! >         stqe_next type=262 off=0
> ! > 
> ! >   [3] UNION (anon) (8 bytes)
> ! >         m_next type=262 off=0
> ! >         m_slist type=1 off=0
> ! >         m_stailq type=2 off=0
> ! > 
> ! >   [4] UNION (anon) (8 bytes)
> ! >         m_nextpkt type=262 off=0
> ! >         m_slistpkt type=1 off=0
> ! >         m_stailqpkt type=2 off=0
> ! > 
> ! >   <5> INTEGER char encoding=SIGNED CHAR offset=0 bits=8
> ! >   <6> POINTER (anon) refers to 5
> ! >   <7> TYPEDEF caddr_t refers to 6
> ! >   <8> INTEGER int encoding=SIGNED offset=0 bits=32
> ! >   <9> TYPEDEF __int32_t refers to 8
> ! >   <10> TYPEDEF int32_t refers to 9
> ! >   [11] INTEGER unsigned int encoding=0x0 offset=0 bits=8
> ! >   [12] INTEGER unsigned int encoding=0x0 offset=0 bits=24
> ! >   [13] STRUCT (anon) (8 bytes)
> ! >         cstqe_next type=229 off=0
> ! > 
> ! >   <14> POINTER (anon) refers to 229
> ! >   [15] STRUCT (anon) (16 bytes)
> ! >         le_next type=229 off=0
> ! >         le_prev type=14 off=64
> ! > 
> ! >   <16> INTEGER long encoding=SIGNED offset=0 bits=64
> ! >   <17> ARRAY (anon) content: 5 index: 16 nelems: 16
> ! > 
> ! >   <18> INTEGER unsigned int encoding=0x0 offset=0 bits=32
> ! >   <19> TYPEDEF u_int refers to 18
> ! > [etc.etc.]
> ! > 
> ! > 
> ! > As we can see, this one has the bitfield types as #11 and #12, and
> ! > the intrinsic type as #18. And consequentially things do fail.
> ! > 
> ! > 
> ! > I currently do not know what is the culprit. Has the linking stage of
> ! > the kernel a flaw? Or is the patch D27213 based on a wrong assumption?
> ! > 
> ! > I hope You guys can answer that. For now I changed the patch D27213
> ! > to cover the case, so that things do work.
> ! > Further details on request.
> ! 
> ! I'm not immediately sure where the problem is.  Could you please post
> ! the kernel configuration and src revision that you're using, so that I
> ! can try and reproduce this?
> 
> Oh, I feared that would come...
> Src revision is easy now: release/12.3.0 (70cb68e7a00)
> 
> Kernel config is difficult. I have compiled into the kernel
>  * ipfw (obviousely)
>  * dtraceall 
>  * drm2 & friends (that needs objects to be added to conf/files)
>  * khelp/h_ertt/etc. (that needs the files and fixing the SI_SUB
>    sequence to make it boot)
> So the kernel config itself doesn't help to reproduce. 

Can you show output of "ctfdump -S /path/to/your/kernel"?  Though you're
on a fairly old revision, with this set of extra modules linked into the
kernel you might be overflowing CTFv2's limit of 2^15 distinct type
definitions.  That is not the proximate cause of the problem, which as
you identified is that a bitfield type is appearing before the
corresponding intrinsic, but it might be the root cause if a type ID
overflow is causing ctfmerge to emit types in the wrong order.

> What I am currently looking for is only an educated statement, about
> if that types sequence (as quoted above) can possibly happen, or, should
> never happen at all.
> If it should not happen, then it's my fault and I might go and look why
> it happens.
> 
> ! How exactly does the bug manifest?
> 
> Exactly as is to be expected, with either of these two errors
> (depending on the native order of files in /usr/lib/dtrace);
> 
> [1]  dtrace: failed to establish error handler: "/usr/lib/dtrace/ipfw.d",
>   line 107: failed to copy type of 'inp': Conflicting type is already
>   defined
> [2]  dtrace: failed to establish error handler:
>   "/usr/lib/dtrace/psinfo.d", line 41: failed to copy type of
>   'pr_gid': Conflicting type is already defined 
> 
> 
> Then I single-stepped the libctf and it clearly showed the mismatch
> between type #11 and type #18 (and the patch 68396709e7 one time doing
> things where it shouldn't and the other time not doing things where
> it should).
> 
> So I am probably on track with understanding what happens, nevertheless
> I would greatly appreciate some input from You how it *is supposed to*
> work.

Reading libctf's init_types(), it seems pretty clear that the comment
added in D27213 is true: the second pass over the type graph inserts
definitions into various hash tables, and for integer types only the
first instance of a type with a particular name is inserted.  In your
case this means that a lookup by name of "unsigned int" will return a
bitfield type.

I can't immediately see how exactly ctfconvert/ctfmerge ensure that
bitfields are ordered after intrinsic types, if they really do at all.
It would be interesting to try running ctfconvert on each object file
for your kernel to see if the ordering you showed above exists in a
specific object file.  If not, then I think we should check for a type
ID overflow, then look more closely at ctfmerge.



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?YdTRBU2EoMORjPOw>