Date: Wed, 5 Jan 2022 21:00:51 +0100
From: Peter
To: Mark Johnston
Cc: freebsd-stable@freebsd.org, jtl@freebsd.org
Subject: Re: dtrace bitfields failure (was: 12.3-RC1 fails ...)
List-Archive: https://lists.freebsd.org/archives/freebsd-stable

On Tue, Jan 04, 2022 at 05:58:13PM -0500, Mark Johnston wrote:
! On Tue, Jan 04, 2022 at 10:58:13PM +0100, Peter wrote:
! > On Tue, Jan 04, 2022 at 01:01:55PM -0500, Mark Johnston wrote:
! > ! On Tue, Jan 04, 2022 at 04:05:53PM +0100, Peter wrote:
! > ! >
! > ! > Hija,
! > ! >
! > ! > sadly, I was too early in agreeing that the two patches
! > ! >   22082f15f9
! > ! >   68396709e7
! > ! > together do solve the issue. They only do on a certain assumption,
! > ! > which does not hold true in all cases.
! > ! >
! > ! >
! > ! > Let's look at https://reviews.freebsd.org/D27213
! > ! >
! > ! > This is the code in question that will trigger the action:
! > ! >
! > ! >     if (dst_type == CTF_ERR && name[0] != '\0' &&
! > ! >         (hep = ctf_hash_lookup(&src_fp->ctf_names, src_fp, name,
! > ! >         strlen(name))) != NULL &&
! > ! >         src_type != (ctf_id_t)hep->h_type) {
! > ! >
! > ! > What happens here: in the case of a bitfield type we need to also
! > ! > copy the corresponding intrinsic type. This condition here checks for
! > ! > the case and also should deliver that respective intrinsic type
! > ! > into the "hep" variable.
! > ! >
! > ! > But this depends on the assumption that the intrinsic type appears
! > ! > first in the "src_fp" container, so that the hash will point to it.
! > ! > And that is not necessarily true; it depends on what options you have
! > ! > in your kernel config.
! > ! >
! > ! >
! > ! > For instance, with my custom kernel, things look like this:
! > ! >
! > ! > $ ctfdump -t kernel.full
! > ! >
! > ! > - Types ----------------------------------------------------------------------
! > ! >
! > ! >   [1] STRUCT (anon) (8 bytes)
! > ! >         sle_next type=262 off=0
! > ! >
! > ! >   [2] STRUCT (anon) (8 bytes)
! > ! >         stqe_next type=262 off=0
! > ! >
! > ! >   [3] UNION (anon) (8 bytes)
! > ! >         m_next type=262 off=0
! > ! >         m_slist type=1 off=0
! > ! >         m_stailq type=2 off=0
! > ! >
! > ! >   [4] UNION (anon) (8 bytes)
! > ! >         m_nextpkt type=262 off=0
! > ! >         m_slistpkt type=1 off=0
! > ! >         m_stailqpkt type=2 off=0
! > ! >
! > ! >   <5> INTEGER char encoding=SIGNED CHAR offset=0 bits=8
! > ! >   <6> POINTER (anon) refers to 5
! > ! >   <7> TYPEDEF caddr_t refers to 6
! > ! >   <8> INTEGER int encoding=SIGNED offset=0 bits=32
! > ! >   <9> TYPEDEF __int32_t refers to 8
! > ! >   <10> TYPEDEF int32_t refers to 9
! > ! >   [11] INTEGER unsigned int encoding=0x0 offset=0 bits=8
! > ! >   [12] INTEGER unsigned int encoding=0x0 offset=0 bits=24
! > ! >   [13] STRUCT (anon) (8 bytes)
! > ! >         cstqe_next type=229 off=0
! > ! >
! > ! >   <14> POINTER (anon) refers to 229
! > ! >   [15] STRUCT (anon) (16 bytes)
! > ! >         le_next type=229 off=0
! > ! >         le_prev type=14 off=64
! > ! >
! > ! >   <16> INTEGER long encoding=SIGNED offset=0 bits=64
! > ! >   <17> ARRAY (anon) content: 5 index: 16 nelems: 16
! > ! >
! > ! >   <18> INTEGER unsigned int encoding=0x0 offset=0 bits=32
! > ! >   <19> TYPEDEF u_int refers to 18
! > ! >   [etc.etc.]
! > ! >
! > ! >
! > ! > As we can see, this one has the bitfield types as #11 and #12, and
! > ! > the intrinsic type as #18. And consequentially things do fail.
! > ! >
! > ! >
! > ! > I currently do not know what is the culprit. Has the linking stage of
! > ! > the kernel a flaw? Or is the patch D27213 based on a wrong assumption?
! > ! >
! > ! > I hope You guys can answer that. For now I changed the patch D27213
! > ! > to cover the case, so that things do work.
! > ! > Further details on request.
! > !
! > ! I'm not immediately sure where the problem is. Could you please post
! > ! the kernel configuration and src revision that you're using, so that I
! > ! can try and reproduce this?
! >
! > Oh, I feared that would come...
! > Src revision is easy now: release/12.3.0 (70cb68e7a00)
! >
! > Kernel config is difficult. I have compiled into the kernel
! >  * ipfw (obviously)
! >  * dtraceall
! >  * drm2 & friends (that needs objects to be added to conf/files)
! >  * khelp/h_ertt/etc. (that needs the files and fixing the SI_SUB
! >    sequence to make it boot)
! > So the kernel config itself doesn't help to reproduce.
!
! Can you show output of "ctfdump -S /path/to/your/kernel"? Though you're
! on a fairly old revision, with this set of extra modules linked into the
! kernel you might be overflowing CTFv2's limit of 2^15 distinct type
! definitions. That is not the proximate cause of the problem, which as
! you identified is that a bitfield type is appearing before the
! corresponding intrinsic, but it might be the root cause if a type ID
! overflow is causing ctfmerge to emit types in the wrong order.
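
For orientation on the overflow question: 2^15 is 32768. The following is only a minimal sketch of that comparison, assuming that the "total number of types" figure reported by ctfdump -S is the quantity that counts against the limit Mark quotes. The macro and function names are made up for illustration and are not part of libctf.

```c
#include <stdbool.h>

/* Hypothetical name; the value 2^15 is the CTFv2 limit quoted above. */
#define	TOY_CTFV2_TYPE_LIMIT	(1UL << 15)	/* 32768 */

/*
 * Returns true if a type count reaches or exceeds the quoted limit.
 * Whether the real boundary is inclusive is not stated in this thread,
 * so the comparison here is deliberately conservative.
 */
static bool
toy_type_count_at_limit(unsigned long ntypes)
{
	return (ntypes >= TOY_CTFV2_TYPE_LIMIT);
}
```

With the count from the statistics in this message, toy_type_count_at_limit(23838) is false, so under the stated assumption the type count alone would not overflow.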
# ctfdump -S kernel.full

- CTF Statistics -------------------------------------------------------------

  total number of data objects        = 27754

  total number of functions           = 27789
  total number of function arguments  = 66565
  maximum argument list length        = 22
  average argument list length        = 2.40

  total number of types               = 23838
  total number of integers            = 60
  total number of floats              = 1
  total number of pointers            = 6618
  total number of arrays              = 2079
  total number of func types          = 1910
  total number of structs             = 7474
  total number of unions              = 362
  total number of enums               = 622
  total number of forward tags        = 38
  total number of typedefs            = 3929
  total number of volatile types      = 44
  total number of const types         = 551
  total number of restrict types      = 0
  total number of unknowns (holes)    = 150

  total number of struct members      = 50431
  maximum number of struct members    = 248
  total size of all structs           = 6834343
  maximum size of a struct            = 1593440
  average number of struct members    = 6.75
  average size of a struct            = 914.42

  total number of union members       = 1234
  maximum number of union members     = 36
  total size of all unions            = 39472
  maximum size of a union             = 8208
  average number of union members     = 3.41
  average size of a union             = 109.04

  total number of enum members        = 6380
  maximum number of enum members      = 1023
  average number of enum members      = 10.26

  total number of unique strings      = 47418
  bytes of string data                = 630726
  maximum string length               = 69
  average string length               = 13.30

! > What I am currently looking for is only an educated statement, about
! > if that types sequence (as quoted above) can possibly happen, or, should
! > never happen at all.
! > If it should not happen, then it's my fault and I might go and look why
! > it happens.
! >
! > ! How exactly does the bug manifest?
! >
! > Exactly as is to be expected, with either of these two errors
! > (depending on the native order of files in /usr/lib/dtrace);
! >
! > [1] dtrace: failed to establish error handler: "/usr/lib/dtrace/ipfw.d",
! >     line 107: failed to copy type of 'inp': Conflicting type is already
! >     defined
! > [2] dtrace: failed to establish error handler:
! >     "/usr/lib/dtrace/psinfo.d", line 41: failed to copy type of
! >     'pr_gid': Conflicting type is already defined
! >
! >
! > Then I single-stepped the libctf and it clearly showed the mismatch
! > between type #11 and type #18 (and the patch 68396709e7 one time doing
! > things where it shouldn't and the other time not doing things where
! > it should).
! >
! > So I am probably on track with understanding what happens, nevertheless
! > I would greatly appreciate some input from You how it *is supposed to*
! > work.
!
! Reading libctf's init_types(), it seems pretty clear that the comment
! added in D27213 is true: the second pass over the type graph inserts
! definitions into various hash tables, and for integer types only the
! first instance of a type with a particular name is inserted. In your
! case this means that a lookup by name of "unsigned int" will return a
! bitfield type.

That is my understanding so far, as well.

So then, when I had figured that the hash lookup delivers the wrong
type, I just removed the hash lookup, and instead search through the
types sequentially. That seems to work. [3], see below

! I can't immediately see how exactly ctfconvert/ctfmerge ensure that
! bitfields are ordered after intrinsic types, if they really do at all.
! It would be interesting to try running ctfconvert on each object file
! for your kernel to see if the ordering you showed above exists in a
! specific object file. If not, then I think we should check for a type
! ID overflow, then look more closely at ctfmerge.

Let's do it, that will be fun. Because, I tried already by removing
files from the kernel config.

The funny thing is, if I remove any one of (dtraceall, drm2,
khelp/h_ertt), the problem goes away.
Also, if I remove any one of the h_ertt files (cc_vegas, cc_cdg), the
problem goes away.
And, if I *add* things to the kernel config, it also goes away.

Conclusion: it seems impossible to pinpoint a single file as the cause.
At that point I gave up and started to single-step the libctf instead.

So, if You're curious and want to figure it out, here is a git-am
patch that should apply right on top of release/12.3.0 (70cb68e7a00).
It should create a sys/amd64/conf/D6R12V1 kernel config and all my
source tree modifications, and should cleanly build the respective
kernel. [3] is included, so the flaw is already covered.

http://oper.dinoex.de/.well-known/acme-challenge/patch-for-mark-johnston.patch
(That one should be accessible)

Concerning [3]: this is just an utterly non-optimized proof-of-concept
patch.

cheerio,
PMc
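
The lookup problem discussed in this thread can be illustrated with a small toy model. This is only a sketch, not libctf code and not the content of patch [3]: the struct, the table and both function names are hypothetical, and the rule used to recognize the intrinsic type (bit width equal to an assumed storage width of 32) is an assumption made for illustration. Only the three "unsigned int" entries and their IDs are taken from the ctfdump -t output quoted in this message.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a CTF integer type entry; NOT libctf's real layout. */
struct toy_type {
	int		id;		/* type ID as printed by ctfdump -t */
	const char	*name;
	unsigned	bits;		/* encoded width in bits */
	unsigned	storage_bits;	/* assumed width of the underlying C type */
};

/* The three "unsigned int" entries from the dump, in container order. */
static const struct toy_type toy_types[] = {
	{ 11, "unsigned int",  8, 32 },	/* bitfield */
	{ 12, "unsigned int", 24, 32 },	/* bitfield */
	{ 18, "unsigned int", 32, 32 },	/* intrinsic */
};
#define	TOY_NTYPES	(sizeof(toy_types) / sizeof(toy_types[0]))

/* Mimics a name hash that keeps only the first entry seen per name. */
static int
toy_lookup_first(const char *name)
{
	for (size_t i = 0; i < TOY_NTYPES; i++)
		if (strcmp(toy_types[i].name, name) == 0)
			return (toy_types[i].id);
	return (-1);
}

/* Sequential scan that skips bitfield entries (assumed selection rule). */
static int
toy_lookup_intrinsic(const char *name)
{
	for (size_t i = 0; i < TOY_NTYPES; i++)
		if (strcmp(toy_types[i].name, name) == 0 &&
		    toy_types[i].bits == toy_types[i].storage_bits)
			return (toy_types[i].id);
	return (-1);
}
```

In this model toy_lookup_first("unsigned int") yields 11, the bitfield, while toy_lookup_intrinsic("unsigned int") yields 18. The scan is linear in the number of types, which matches the "non-optimized" caveat but is otherwise a design choice of the sketch.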