From nobody Mon May 22 01:19:53 2023 X-Original-To: freebsd-arch@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4QPfks6w90z4Ck6Z for ; Mon, 22 May 2023 01:20:05 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: from mail-ed1-x52c.google.com (mail-ed1-x52c.google.com [IPv6:2a00:1450:4864:20::52c]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smtp.gmail.com", Issuer "GTS CA 1D4" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4QPfks50D9z3QBH for ; Mon, 22 May 2023 01:20:05 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Authentication-Results: mx1.freebsd.org; none Received: by mail-ed1-x52c.google.com with SMTP id 4fb4d7f45d1cf-510d6b939bfso9375403a12.0 for ; Sun, 21 May 2023 18:20:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bsdimp-com.20221208.gappssmtp.com; s=20221208; t=1684718404; x=1687310404; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=G/WjTv8QsLDr/ouCXGqXrMS/kZGXt8b4ufmFCYgp78c=; b=LU2PWG4+GuDmJNKey2hgn9H/NYJiNs/hd/Q6XEUcoBGH+DurZBXyf0GEE57qckkfL5 y6BF4/LmqHeulG/lW+iyZOUHKi8n9mrwQUXGvnDXY4NSOh0Zv8+I/KwqSRP4gT6hezD9 h2aDVSh9z6OR5wqrc9bhz77V9vHmXst6fzBE47nBL3g9qCLl0md3jWoHqECnyik2NDAW XM77zRVMLdjrPTY05bZZ+9P0Ox6TXnkDwJHEINpuV9eai2VKQKZDFrlpPAcvokXKAGgp PgE9QOMYyzQzHxrn82qPsKjRCUIAIgQaYW4eTPdHO5RJTSxFvd2jf3SlDOljB8R+QDrD lrsA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684718404; x=1687310404; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=G/WjTv8QsLDr/ouCXGqXrMS/kZGXt8b4ufmFCYgp78c=; b=keBcVlFQeWTpjsTA0+Df8pbRPhYGsHHzYP2t9mITAMXaJh+QQo64v6ERAmHfmZ5uSo vLlue73YpohxXVvCOCQcFNTn78IlIVPtZC862+YFUvrVkiEum0gEN5znsnQO7F0a/iQu 8Vf7wMTWkR8wy632wrzj/b1W57EWJ9g1vbBPcY0CGrAZhuq0vSAOHRuhafsJB698t5l1 /il6V8YFPZXSvQZBpMGxWXa89mgqqT07m1zfRn/teUkPMSYUjU7+82azGhh5B6rKrPAE Kyi8N+q4APOi8SrC2+Jg2ct7MlKc5zn4btSEclOataacYHIsR7Sy3MtvKqN83rDMk7/v 1G9g== X-Gm-Message-State: AC+VfDxTquurtR7KI0AjyJqPbDsC2RQcl3egs7pnLVQV4pobQyEOPI07 ZBaEq4cBJmM0wCjKIStqCSuwQEApr6JLLoWwms2rfzRfrY0DxXNh9u0= X-Google-Smtp-Source: ACHHUZ5YwpxDKeH1SKT+3OaDDYkUyM/U2GOCvV3xruMM8sJTMV1Wcw79G0+7cyO8rvKR7j3Hnwsx7h57Mblf1ayP1yc= X-Received: by 2002:a05:6402:2cb:b0:50d:fcfb:861b with SMTP id b11-20020a05640202cb00b0050dfcfb861bmr7905551edx.0.1684718404369; Sun, 21 May 2023 18:20:04 -0700 (PDT) List-Id: Discussion related to FreeBSD architecture List-Archive: https://lists.freebsd.org/archives/freebsd-arch List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-arch@freebsd.org MIME-Version: 1.0 References: In-Reply-To: From: Warner Losh Date: Sun, 21 May 2023 21:19:53 -0400 Message-ID: Subject: Re: [RFC] An idea for general kernel post-processing automation in FreeBSD To: Konstantin Belousov Cc: Hans Petter Selasky , "freebsd-arch@freebsd.org" Content-Type: multipart/alternative; boundary="0000000000000a106905fc3e124e" X-Rspamd-Queue-Id: 4QPfks50D9z3QBH X-Spamd-Bar: ---- X-Spamd-Result: default: False [-4.00 / 15.00]; REPLY(-4.00)[]; ASN(0.00)[asn:15169, ipnet:2a00:1450::/32, country:US] X-Rspamd-Pre-Result: action=no action; module=replies; Message is reply to one we originated X-ThisMailContainsUnwantedMimeParts: N --0000000000000a106905fc3e124e Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Sun, May 21, 2023, 6:18 PM Konstantin Belousov wrote: > On Sun, May 21, 2023 at 05:34:26PM -0600, Warner Losh wrote: > > On Sun, May 21, 2023 at 2:13=E2=80=AFPM Hans Petter Selasky > wrote: > > > > > However, if the data in question is sorted at compile time, then zero > > > time will be spent sorting the data in the kernel. When a kernel modu= le > > > is loaded and the sysinit/constructors are already sorted by their > > > subsystem/order/priority, all you need to do is to merge two sorted > > > lists into another sorted list, and this is pretty much linear. > > > > > > The question is, who can sort the sysinits: > > > > > > > The bigger question is "Do we even want to do that?" > > > > Switching to a faster sort gets us from milliseconds to microseconds > > without adding a lot of hair. > > > > > > > 1) "ld" can sort symbols by name, but encoding the priority into the > > > symbol name is difficult, especially when arithmetic expressions are > > > used. The C-pre-processor can only concat stuff. No, character > > > arithmetic is possible. This solution is not possible. > > > > > > 2) The constructor attribute in "C" can take a priority argument, > > > probably 32-bit and we need 64-bits. Sounds great, but how can you ed= it > > > these lists and merge the constructors? What about destructors and > > > priority. Maybe possible. > > > > > > 3) The compiler can output strings to a magic file during compilation= , > > > like the name of global variables to collect and sort. The C-compiler > > > has #error and #warning, but there is no #write_to_file, simply > > > speaking. Another solution is to store the file output into a separat= e > > > section in the object files and use objcopy to extract the text later > > > on, and process it. > > > > > > > These are all fragile. I don't think the benefit makes the fragility > > worth it. > I agree. Linker tricks are cute but often depend on minor features of > linkers that break often. > > > > > > It may also be another way to fetch PCI/USB device ID information, > > > instead of using linker hints. Currently when probing USB devices, de= vd > > > has a linear list of hints it iterates. That means for every device > > > being plugged, you need to search all products supported for a match. > > > This is slow. Instead a tool could at least sort the entries, so a > > > binary search type function could be used, resulting in O(log2(N)) > > > searching time, instead of O(N). > > > > > > > Except that data is pathologically heterogeneous. There's nothing to so= rt > > in any meaningful way. And it's all about the runtime environment, whic= h > > is impossible to know at build time (today we do it at run time). The > linker > > hints file to know which of many things to load is almost optimal... Ea= ch > > file is different, especially for PCI, which is why we pre-process it > once > > and put it into the linker hints.... So it's pretty good once the syste= m > is > > running, but at startup we parse it multiple times and likely would do > more > > as we move more towards MINIMAL. It's been suggested that we move > BTW, if we are moving to MINIMAL (which I quite like), then with the > proposed approach we need to merge at least N + 1 lists, where N is the > number > of modules. Can the merge be done in place without incurring n**2 > behaviour? > Yes. Today it's a simple linked list. Given each node has different matching, many of the classic methods of having a tree are tricky to apply. And it's a bunch of integer compares... so I'm not sure it is worth doing more than moving the matching into devd... fork/exec is way more expensive as is the I/O to read the klds into memory. Warner > this into devd, so the data persists in its most useful form for a long > > time, > > and that's not a bad suggestion for dealing with the growing number of > > execs to make this all work... It's very Unix Tooly, but also there's > likely > > a good case to be made to optimize here... There's some ordering issues > > that would need to be worked out too... > > Overall, my feel is the same: if better sort can be used to speed this up= , > it is better to not create a monster build system. In kernel, we are not > limited by ABI or POSIX constraints. > --0000000000000a106905fc3e124e Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable


On Sun, May 21, 2023, 6:18 PM Konstantin Belousov <= kostikbel@gmail.com> wrote:
On Sun, May 21, 2023 at 05:34:26PM -0600, Warner Losh wrote:
> On Sun, May 21, 2023 at 2:13=E2=80=AFPM Hans Petter Selasky <hps@selasky.org> wrote:
>
> > However, if the data in question is sorted at compile time, then = zero
> > time will be spent sorting the data in the kernel. When a kernel = module
> > is loaded and the sysinit/constructors are already sorted by thei= r
> > subsystem/order/priority, all you need to do is to merge two sort= ed
> > lists into another sorted list, and this is pretty much linear. > >
> > The question is, who can sort the sysinits:
> >
>
> The bigger question is "Do we even want to do that?"
>
> Switching to a faster sort gets us from milliseconds to microseconds > without adding a lot of hair.
>
>
> > 1) "ld" can sort symbols by name, but encoding the prio= rity into the
> > symbol name is difficult, especially when arithmetic expressions = are
> > used. The C-pre-processor can only concat stuff. No, character > > arithmetic is possible. This solution is not possible.
> >
> > 2) The constructor attribute in "C" can take a priority= argument,
> > probably 32-bit and we need 64-bits. Sounds great, but how can yo= u edit
> > these lists and merge the constructors? What about destructors an= d
> > priority. Maybe possible.
> >
> > 3) The compiler can output strings to a magic file during compila= tion,
> > like the name of global variables to collect and sort. The C-comp= iler
> > has #error and #warning, but there is no #write_to_file, simply > > speaking. Another solution is to store the file output into a sep= arate
> > section in the object files and use objcopy to extract the text l= ater
> > on, and process it.
> >
>
> These are all fragile. I don't think the benefit makes the fragili= ty
> worth it.
I agree.=C2=A0 Linker tricks are cute but often depend on minor features of=
linkers that break often.

>
> > It may also be another way to fetch PCI/USB device ID information= ,
> > instead of using linker hints. Currently when probing USB devices= , devd
> > has a linear list of hints it iterates. That means for every devi= ce
> > being plugged, you need to search all products supported for a ma= tch.
> > This is slow. Instead a tool could at least sort the entries, so = a
> > binary search type function could be used, resulting in O(log2(N)= )
> > searching time, instead of O(N).
> >
>
> Except that data is pathologically heterogeneous. There's nothing = to sort
> in any meaningful way. And it's all about the runtime environment,= which
> is impossible to know at build time (today we do it at run time). The = linker
> hints file to know which of many things to load is almost optimal... E= ach
> file is different, especially for PCI, which is why we pre-process it = once
> and put it into the linker hints.... So it's pretty good once the = system is
> running, but at startup we parse it multiple times and likely would do= more
> as we move more towards MINIMAL. It's been suggested that we move<= br> BTW, if we are moving to MINIMAL (which I quite like), then with the
proposed approach we need to merge at least N + 1 lists, where N is the num= ber
of modules.=C2=A0 Can the merge be done in place without incurring n**2
behaviour?

Yes. Today it's a simple linked list. Given each node has dif= ferent matching, many of the classic methods of having a tree are tricky to= apply. And it's a bunch of integer compares... so I'm not sure it = is worth doing more than moving the matching into devd... fork/exec is way = more expensive as is the I/O to read the klds into memory.=C2=A0

Warner

=
> this into devd, so the data persists in its most useful form for a lon= g
> time,
> and that's not a bad suggestion for dealing with the growing numbe= r of
> execs to make this all work... It's very Unix Tooly, but also ther= e's likely
> a good case to be made to optimize here... There's some ordering i= ssues
> that would need to be worked out too...

Overall, my feel is the same: if better sort can be used to speed this up,<= br> it is better to not create a monster build system.=C2=A0 In kernel, we are = not
limited by ABI or POSIX constraints.
--0000000000000a106905fc3e124e--