From nobody Tue Jan 4 22:58:13 2022 X-Original-To: freebsd-stable@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 9B9DA192CFA1 for ; Tue, 4 Jan 2022 22:58:17 +0000 (UTC) (envelope-from markjdb@gmail.com) Received: from mail-io1-xd2a.google.com (mail-io1-xd2a.google.com [IPv6:2607:f8b0:4864:20::d2a]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smtp.gmail.com", Issuer "GTS CA 1D4" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4JT7Lx3myvz3BvS; Tue, 4 Jan 2022 22:58:17 +0000 (UTC) (envelope-from markjdb@gmail.com) Received: by mail-io1-xd2a.google.com with SMTP id x6so45879980iol.13; Tue, 04 Jan 2022 14:58:17 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=eqQOK2lKrTTp+rePJAmRZjWrWHZyJuF0oBjR3dOu4IM=; b=Qx9OXGzYdi6JuPSH3nyFsmj/LxocZ2J0+lrLulJKrzHi62Hhg4FOc+yWiuPzFMj35W 1FjaCn1nLIuyQSKo7iaa0ZXEHsCi3zxbf6LVsUFhoEA67p93Y4tRWXXI8PSDnOKOyL3v Ctt558UAk6zGGkhz5jgpMJ7xfJboDQt29kT+rIvm7A6FS2JV0yUVkDu9lxBs9r+z7lM/ 4RiAxwiLCGaYZ79OAA23NuGw+/HIN3YDy/iCpCrwAh9q5XqO/jvlxylXlxTC7WQbXkdM 0A0JsySOPgWuMiah4pzNrtnkoAiaxE3aIt3KJSiuz0MFdBVoVln6l+T/ZzJ9nMvUbY/Z 2S4g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:sender:date:from:to:cc:subject:message-id :references:mime-version:content-disposition:in-reply-to; bh=eqQOK2lKrTTp+rePJAmRZjWrWHZyJuF0oBjR3dOu4IM=; b=UAoTaoEmfQUIUaw1avlykiThJZwbfqaIv+ao4I9TZUTx8G6LE+8XXcxLFOnBvDJsFA x8xSriqpdMjJRMaLqDPDGoQAGoNGpgaLaHGQ9rK3RisE73nH8Z+QOq8wu2R3o9ZpnTr4 1fc9+/2Kb2xlYy4KCnBkhG3r6THZl/2/Rsog3mYKntwNvWnp+qAmgbn/0IK9b0UuXhSA g31X+2Pl7jysH77zKRMjGTBBPO8GX3sPHqNBGy01ggw2HjrkC27qk72poT1YCUTlfE3a p5kUhU1ZzVQ1AbumuSSvJUs0udnB207oFkUI6gB7UHP1H51GB1VZhzn5IgXdBWbPwACN O/BQ== X-Gm-Message-State: AOAM531bSQ/PCU6BknGeHeNrKGyM9DEqmHHCI61vMv9mlt1dKckOGq5p GmJHilvb67M62BgpMApDW5+NxrZcAlw= X-Google-Smtp-Source: ABdhPJwId48eSrowjtDWyS20vO5IC+YbVhAjYj+HrOZGaVlSRGmpwxaB+URAVemxIrcXcW9wZiMG/w== X-Received: by 2002:a05:6638:3791:: with SMTP id w17mr19033790jal.310.1641337096748; Tue, 04 Jan 2022 14:58:16 -0800 (PST) Received: from nuc (198-84-189-58.cpe.teksavvy.com. [198.84.189.58]) by smtp.gmail.com with ESMTPSA id y1sm26043904ilv.10.2022.01.04.14.58.15 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 04 Jan 2022 14:58:16 -0800 (PST) Date: Tue, 4 Jan 2022 17:58:13 -0500 From: Mark Johnston To: Peter Cc: freebsd-stable@freebsd.org, jtl@freebsd.org Subject: Re: dtrace bitfields failure (was: 12.3-RC1 fails ...) Message-ID: References: List-Id: Production branch of FreeBSD source code List-Archive: https://lists.freebsd.org/archives/freebsd-stable List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-stable@freebsd.org X-BeenThere: freebsd-stable@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Rspamd-Queue-Id: 4JT7Lx3myvz3BvS X-Spamd-Bar: ---- Authentication-Results: mx1.freebsd.org; none X-Spamd-Result: default: False [-4.00 / 15.00]; REPLY(-4.00)[] X-ThisMailContainsUnwantedMimeParts: N On Tue, Jan 04, 2022 at 10:58:13PM +0100, Peter wrote: > On Tue, Jan 04, 2022 at 01:01:55PM -0500, Mark Johnston wrote: > ! On Tue, Jan 04, 2022 at 04:05:53PM +0100, Peter wrote: > ! > > ! > Hija, > ! > > ! > sadly, I was too early in agreeing that the two patches > ! > 22082f15f9 > ! > 68396709e7 > ! > together do solve the issue. They only do on a certain assumption, > ! > which does not hold true in all cases. > ! > > ! > > ! > Let's look at https://reviews.freebsd.org/D27213 > ! > > ! > This is the code in question that will trigger the action: > ! > > ! > if (dst_type == CTF_ERR && name[0] != '\0' && > ! > (hep = ctf_hash_lookup(&src_fp->ctf_names, src_fp, name, > ! > strlen(name))) != NULL && > ! > src_type != (ctf_id_t)hep->h_type) { > ! > > ! > What happens here: in the case of a bitfield type we need to also > ! > copy the corresponding intrinsic type. This condition here checks for > ! > the case and also should deliver that respective intrinsic type > ! > into the "hep" variable. > ! > > ! > But this depends on the assumption that the intrinsic type appears > ! > first in the "src_fp" container, so that the hash will point to it. > ! > And that is not necessarily true; it depends on what options you have > ! > in your kernel config. > ! > > ! > > ! > For instance, with my custom kernel, things look like this: > ! > > ! > $ ctfdump -t kernel.full > ! > > ! > - Types ---------------------------------------------------------------------- > ! > > ! > [1] STRUCT (anon) (8 bytes) > ! > sle_next type=262 off=0 > ! > > ! > [2] STRUCT (anon) (8 bytes) > ! > stqe_next type=262 off=0 > ! > > ! > [3] UNION (anon) (8 bytes) > ! > m_next type=262 off=0 > ! > m_slist type=1 off=0 > ! > m_stailq type=2 off=0 > ! > > ! > [4] UNION (anon) (8 bytes) > ! > m_nextpkt type=262 off=0 > ! > m_slistpkt type=1 off=0 > ! > m_stailqpkt type=2 off=0 > ! > > ! > <5> INTEGER char encoding=SIGNED CHAR offset=0 bits=8 > ! > <6> POINTER (anon) refers to 5 > ! > <7> TYPEDEF caddr_t refers to 6 > ! > <8> INTEGER int encoding=SIGNED offset=0 bits=32 > ! > <9> TYPEDEF __int32_t refers to 8 > ! > <10> TYPEDEF int32_t refers to 9 > ! > [11] INTEGER unsigned int encoding=0x0 offset=0 bits=8 > ! > [12] INTEGER unsigned int encoding=0x0 offset=0 bits=24 > ! > [13] STRUCT (anon) (8 bytes) > ! > cstqe_next type=229 off=0 > ! > > ! > <14> POINTER (anon) refers to 229 > ! > [15] STRUCT (anon) (16 bytes) > ! > le_next type=229 off=0 > ! > le_prev type=14 off=64 > ! > > ! > <16> INTEGER long encoding=SIGNED offset=0 bits=64 > ! > <17> ARRAY (anon) content: 5 index: 16 nelems: 16 > ! > > ! > <18> INTEGER unsigned int encoding=0x0 offset=0 bits=32 > ! > <19> TYPEDEF u_int refers to 18 > ! > [etc.etc.] > ! > > ! > > ! > As we can see, this one has the bitfield types as #11 and #12, and > ! > the intrinsic type as #18. And consequentially things do fail. > ! > > ! > > ! > I currently do not know what is the culprit. Has the linking stage of > ! > the kernel a flaw? Or is the patch D27213 based on a wrong assumption? > ! > > ! > I hope You guys can answer that. For now I changed the patch D27213 > ! > to cover the case, so that things do work. > ! > Further details on request. > ! > ! I'm not immediately sure where the problem is. Could you please post > ! the kernel configuration and src revision that you're using, so that I > ! can try and reproduce this? > > Oh, I feared that would come... > Src revision is easy now: release/12.3.0 (70cb68e7a00) > > Kernel config is difficult. I have compiled into the kernel > * ipfw (obviousely) > * dtraceall > * drm2 & friends (that needs objects to be added to conf/files) > * khelp/h_ertt/etc. (that needs the files and fixing the SI_SUB > sequence to make it boot) > So the kernel config itself doesn't help to reproduce. Can you show output of "ctfdump -S /path/to/your/kernel"? Though you're on a fairly old revision, with this set of extra modules linked into the kernel you might be overflowing CTFv2's limit of 2^15 distinct type definitions. That is not the proximate cause of the problem, which as you identified is that a bitfield type is appearing before the corresponding intrinsic, but it might be the root cause if a type ID overflow is causing ctfmerge to emit types in the wrong order. > What I am currently looking for is only an educated statement, about > if that types sequence (as quoted above) can possibly happen, or, should > never happen at all. > If it should not happen, then it's my fault and I might go and look why > it happens. > > ! How exactly does the bug manifest? > > Exactly as is to be expected, with either of these two errors > (depending on the native order of files in /usr/lib/dtrace); > > [1] dtrace: failed to establish error handler: "/usr/lib/dtrace/ipfw.d", > line 107: failed to copy type of 'inp': Conflicting type is already > defined > [2] dtrace: failed to establish error handler: > "/usr/lib/dtrace/psinfo.d", line 41: failed to copy type of > 'pr_gid': Conflicting type is already defined > > > Then I single-stepped the libctf and it clearly showed the mismatch > between type #11 and type #18 (and the patch 68396709e7 one time doing > things where it shouldn't and the other time not doing things where > it should). > > So I am probably on track with understanding what happens, nevertheless > I would greatly appreciate some input from You how it *is supposed to* > work. Reading libctf's init_types(), it seems pretty clear that the comment added in D27213 is true: the second pass over the type graph inserts definitions into various hash tables, and for integer types only the first instance of a type with a particular name is inserted. In your case this means that a lookup by name of "unsigned int" will return a bitfield type. I can't immediately see how exactly ctfconvert/ctfmerge ensure that bitfields are ordered after intrinsic types, if they really do at all. It would be interesting to try running ctfconvert on each object file for your kernel to see if the ordering you showed above exists in a specific object file. If not, then I think we should check for a type ID overflow, then look more closely at ctfmerge.