From: Warner Losh
Date: Tue, 6 Sep 2022 09:07:21 -0600
Subject: Re: Header symbols that shouldn't be visible to ports?
To: Konstantin Belousov
Cc: Alan Somers, FreeBSD CURRENT
List-Id: Discussions about the use of FreeBSD-current


On Tue, Sep 6, 2022 at 7:34 AM Konstantin Belousov <kostikbel@gmail.com> wrote:
> On Mon, Sep 05, 2022 at 08:41:58AM -0600, Alan Somers wrote:
> > On Sat, Sep 3, 2022 at 11:10 PM Konstantin Belousov <kostikbel@gmail.com> wrote:
> > >
> > > On Sat, Sep 03, 2022 at 10:19:12AM -0600, Alan Somers wrote:
> > > > Our /usr/include headers define a lot of symbols that are used by
> > > > critical utilities in the base system like ps and ifconfig, but aren't
> > > > stable across major releases.  Since they aren't stable, utilities
> > > > built for older releases won't run correctly on newer ones.  Would it
> > > > make sense to guard these symbols so they can't be used by programs in
> > > > the ports tree?  There is some precedent for that, for example
> > > > _WANT_SOCKET and _WANT_MNTOPTNAMES.
> > > _WANT_SOCKET is clearly about exposing parts of the kernel definitions
> > > for userspace code that wants to dig into kernel structures.  Similarly
> > > for _WANT_MNTOPTNAMES, but in fact this thing is quite stable.  The
> > > definitions are guarded by additional defines not due to their instability,
> > > but because using them in userspace requires (much) more preparation from
> > > the userspace environment, which is either not trivial (_WANT_SOCKET) or
> > > contradicts the standardized use of the header (_WANT_MNTOPTNAMES +
> > > sys/mount.h).
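For readers unfamiliar with the convention, an opt-in guard of the _WANT_SOCKET kind looks roughly like the following. This is a self-contained sketch: the `_WANT_EXAMPLE_PRIV` macro, the structs and the helper are hypothetical stand-ins, not the real contents of <sys/socketvar.h> or <sys/mount.h>, and the "header" and its consumer are folded into one file.

```c
#include <assert.h>
#include <stddef.h>

/* Consumer side: opt in *before* the header is processed, the way a base
 * utility would write "#define _WANT_SOCKET" ahead of its #include. */
#define _WANT_EXAMPLE_PRIV

/* Header side (hypothetical): the public definition is always visible. */
struct example_public {
	int ep_flags;
};

/* Kernel-internal layout is exposed only on explicit request, because
 * using it needs extra preparation from the userspace environment. */
#ifdef _WANT_EXAMPLE_PRIV
struct example_priv {
	struct example_public xp_pub;
	void *xp_kernel_ptr;	/* a kernel address; not dereferenceable here */
};

/* Compiles only because _WANT_EXAMPLE_PRIV was defined above. */
int
example_priv_flags(const struct example_priv *p)
{
	assert(p != NULL);
	return (p->xp_pub.ep_flags);
}
#endif /* _WANT_EXAMPLE_PRIV */
```

Without the opt-in define, `struct example_priv` simply does not exist, so ordinary consumers cannot come to depend on it by accident.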
> > >
> > > >
> > > > In particular, I'm thinking about symbols like the following:
> > > > MINCORE_SUPER
> > > Why should this symbol be hidden?  It is implementation-defined and
> > > intended to be exposed to userspace.  All MINCORE_* symbols, not only
> > > MINCORE_SUPER, are under BSD_VISIBLE braces, because POSIX does not define them.
> >
> > Because it isn't stable.  It changed, for example, in rev 847ab36bf22
> > for 13.0.  Programs using the older value (including virtually every
> > Rust program) won't work on 13.0 and later.
> As Mark replied, older values still mostly work.  Care was taken not to
> make an unacceptable ABI change.

> >
> > >
> > > > TDF_*
> > > These symbols come from the non-standard header sys/proc.h.  If userspace
> > > includes the header, it is already outside any formal standard, and I
> > > do not see a reason to make the implementation more convoluted there.
> > >
> > > > PRI_MAX*
> > > > PRI_MIN*
> > > > PI_*, PRIBIO, PVFS, etc
> > > > IFCAP_*
> > > These are all implementation-specific and come from non-standard headers,
> > > unless I am mistaken; if so, please correct me.
> > >
> > > > RLIM_NLIMITS
> > > > IFF_*
> > > Same.
> > >
> > > > *_MAXID
> > > This is too broad.
> >
> > I'm talking about symbols like IPV6CTL_MAXID, which record the size of
> > sysctl lists.  Obviously, these symbols can't be stable, and probably
> > aren't useful outside of the base system.
> Programs are not forced to use these symbols.  FFI bindings should not
> provide them, so why do we need to specifically hide such defines?
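To make the *_MAXID concern concrete: such a constant records the length of a list, so a value copied into a binding from an older release disagrees with a newer system that has appended entries. A sketch with a hypothetical EXAMPLECTL_* list (not a real FreeBSD sysctl list) and a hypothetical stale binding value:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical identifiers as a newer release defines them. */
enum {
	EXAMPLECTL_FORWARDING = 1,
	EXAMPLECTL_TTL = 2,
	EXAMPLECTL_STATS = 3,	/* appended after the older release shipped */
	EXAMPLECTL_MAXID = 4	/* one past the last id, so it grows with the list */
};

static_assert(EXAMPLECTL_MAXID == EXAMPLECTL_STATS + 1,
    "MAXID must stay one past the last identifier");

/* The value a binding hard-coded from the older release (hypothetical). */
#define BINDING_EXAMPLECTL_MAXID 3

/* Range check written against the current header. */
bool
header_accepts_id(int id)
{
	return (id > 0 && id < EXAMPLECTL_MAXID);
}

/* The same check written against the stale constant: it rejects an id
 * that the running system actually supports. */
bool
binding_accepts_id(int id)
{
	return (id > 0 && id < BINDING_EXAMPLECTL_MAXID);
}
```

Because the value moves whenever the list grows, the only safe choice for a binding is the one kib suggests: not to provide it at all.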

> >
> > >
> > > >
> > > > Clearly delineating private symbols like this would ease the
> > > > maintenance burden on languages that rely on FFI, like Ruby and Rust.
> > > > FFI basically assumes that symbols once defined will never change.
> > >
> > > Why is e.g. sys/proc.h ever consumed by FFI wrappers?
> >
> > I should add a little detail.  Rust uses FFI to access C functions,
> > and #define'd constants are redefined in the Rust bindings.  For most
> > Rust programs, the build process doesn't check the contents of
> > /usr/include in any way.  Instead, all of that stuff is hard-coded in
> > the Rust bindings.  That makes cross-compiling a breeze!
> Well, at the cost of maintaining the Rust libc crate.
> [Sorry, cannot refrain: https://kib.kiev.ua/kib/rust_c_ffi.png ]

> > But it does
> > cause problems when the C library changes.  Adding a new symbol, like
> > copy_file_range, isn't so bad.  If your Rust program doesn't use it,
> > then the Rust binding will become an unused symbol and get eliminated
> > by the linker.  If your Rust program does use it OTOH, then it will be
> > resolved by the dynamic linker at runtime - if you're running on
> > FreeBSD 13 or newer.  Otherwise, your program will fail to run.
> The program would either fail at start, if it does not reference the
> symbol version in some other way (through another symbol), or at runtime
> when trying to do dynamic binding to that symbol otherwise.
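A binding that wants to use an optional function such as copy_file_range without hitting either failure mode can resolve it explicitly at runtime and fall back when it is missing. A minimal POSIX C sketch, assuming a dynamically linked program; the helper names are hypothetical, and older glibc versions may additionally require linking with -ldl:

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <sys/types.h>

/* Shape of copy_file_range(2); used only to type the looked-up pointer. */
typedef ssize_t (*copy_file_range_fn)(int, off_t *, int, off_t *, size_t,
    unsigned int);

/* Looks up a function in the already-loaded global symbol set (the program
 * plus its libraries, including libc).  Returns NULL if no loaded object
 * provides it, e.g. copy_file_range on a release older than FreeBSD 13. */
void *
lookup_optional_symbol(const char *name)
{
	void *self, *sym;

	assert(name != NULL);
	self = dlopen(NULL, RTLD_LAZY);
	if (self == NULL)
		return (NULL);
	sym = dlsym(self, name);
	/* libc stays loaded, so sym remains valid after dlclose(). */
	dlclose(self);
	return (sym);
}

/* Returns the function pointer, or NULL so the caller can fall back. */
copy_file_range_fn
get_copy_file_range(void)
{
	return ((copy_file_range_fn)lookup_optional_symbol("copy_file_range"));
}
```

The caller checks get_copy_file_range() for NULL and, for example, falls back to a plain read/write loop, instead of the program failing at startup or on first use.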

> > A
> > bigger problem is with symbols that change.  For example, the 64-bit
> > inode stuff.  Rust programs still use a FreeBSD 11 ABI (we're working
> > on that).
> We did not change symbols for ino64.  Old symbols were retained, and the new
> symbols were added under the new version.

> > But other symbols change more frequently.  Things like
> > PRI_MAX_REALTIME can change between any two releases.  That creates a
> > big maintenance burden to keep track of them in the FFI bindings.  And
> > they also aren't very useful in cross-compiled programs targeting a
> > FreeBSD 11 ABI.  Instead, they really need to have bindings
> > automatically generated at build time.  That's possible, but it's not
> > the default.
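Generating bindings at build time, as described above, essentially means compiling a small C program against the target's installed headers and emitting the values it sees in the binding language. A minimal sketch; the choice of <sys/mman.h> constants and the Rust output format are illustrative assumptions, not how the Rust libc crate is actually built:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>

/* A header constant to export, with the value the C compiler sees in the
 * headers installed on the build target. */
struct exported_const {
	const char *name;
	long value;
};

/* Illustrative selection; a real generator would take an explicit list. */
static const struct exported_const exported[] = {
	{ "PROT_READ", PROT_READ },
	{ "PROT_WRITE", PROT_WRITE },
	{ "PROT_EXEC", PROT_EXEC },
};

/* Writes one Rust `pub const` line per constant into buf.  Returns the
 * number of characters written, or -1 if buf is too small. */
int
emit_rust_consts(char *buf, size_t len)
{
	size_t used = 0;

	assert(buf != NULL);
	for (size_t i = 0; i < sizeof(exported) / sizeof(exported[0]); i++) {
		int n = snprintf(buf + used, len - used,
		    "pub const %s: ::std::os::raw::c_int = %ld;\n",
		    exported[i].name, exported[i].value);
		if (n < 0 || (size_t)n >= len - used)
			return (-1);
		used += (size_t)n;
	}
	return ((int)used);
}
```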
> >
> > So what the Rust community really needs is a way to know which symbols
> > will be stable across releases, and which might vary.
> Symbols, in the sense of things exported from libc/libthr/libm, are stable.
> We have made this promise and followed it strictly since FreeBSD 6.x.
>
> Some defines from headers are not stable, but they do not form the exported
> system ABI anyway.  You need to know what you are doing when changing libc.
> Similarly, when you update the Rust libc crate, you have to know what you are
> providing; it cannot be done automatically.

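FreeBSD developers get this wrong from time to time. We have to carefully curate
new symbols in the libraries, and deal with #defines that are part of the core ABI that
we've kept stable. We don't have any formal tests here to ensure things work, apart
from people trying things and having them break due to some oversight. For example,
the CAM ioctls recently changed so that old passthrough CCB ioctls stopped working.
It took months for people to notice, and it was broken in a release. The fix was a
trivial one-liner once someone noticed and made the effort, but since there was no
automation to test it, it went unnoticed.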
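The kind of automated check that is missing here can start very small: compile the values a binding hard-codes next to the installed headers and fail the build on any mismatch. A minimal C sketch; the "binding" values are hypothetical hand-copied numbers, and the three constants from <sys/mman.h> and <fcntl.h> are only examples:

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>

/* A constant as hard-coded in a binding, next to the header's value. */
struct pinned_const {
	const char *name;
	long binding_value;	/* hand-copied into the binding (hypothetical) */
	long header_value;	/* what the installed header defines */
};

static const struct pinned_const pinned[] = {
	{ "PROT_READ", 0x1, PROT_READ },
	{ "PROT_WRITE", 0x2, PROT_WRITE },
	{ "O_RDONLY", 0x0, O_RDONLY },
};

/* Prints each mismatch to stderr and returns how many were found, so a CI
 * job can fail loudly instead of shipping a silently broken binding. */
size_t
count_binding_drift(void)
{
	size_t bad = 0;

	for (size_t i = 0; i < sizeof(pinned) / sizeof(pinned[0]); i++) {
		if (pinned[i].binding_value != pinned[i].header_value) {
			fprintf(stderr, "%s: binding has %ld, headers have %ld\n",
			    pinned[i].name, pinned[i].binding_value,
			    pinned[i].header_value);
			bad++;
		}
	}
	return (bad);
}
```

Run against each supported release, a check like this turns drift in pinned values into a build failure rather than something users discover months later.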

> Expecting that we (FreeBSD developers) would mark up each definition in
> the header files is unreasonable.  Even if this enormous work were
> done once, it would rot immediately.  The outcome of the work is not used by
> anything in either the base system or in 99.999% of the ports.  As a result,
> anybody doing any work on the base libraries will make mistakes.
>
> > Are you
> > suggesting that anything from a non-POSIX header file should be
> > considered variable?
> No, I suggest that anything not in the POSIX namespace should be scrutinized
> for ABI stability, instead of stating 'it is available, so let's make
> bindings for it'.
>
> I have sympathy for the Rust decision to provide an isolated libc crate.
> It certainly makes sense for the Rust ecosystem.
>
> But then having this crate depend on autogeneration from /usr/include
> negates the intent of isolation.  I think, if you want to have
> automatic binding generation used, you must provide the white list of
> symbols and definitions that go into the crate.

I agree with kib: I don't see how it could work to ask developers to know, a priori,
which of our unstable interfaces will change and which won't. If you import
things that aren't in the POSIX namespace, you need to work with those
namespace providers to import them properly. Whitelisting is the only way
that I see it working.
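The whitelisting approach can be enforced mechanically as well: whatever produces or checks the bindings refuses any name that is not explicitly listed, regardless of what the headers happen to define. A minimal sketch; the list contents and is_allowlisted() are hypothetical, and binding generators such as bindgen provide their own allowlist settings:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical allowlist: names the binding maintainers reviewed and agreed
 * with the namespace owners to treat as stable. */
static const char *const stable_allowlist[] = {
	"copy_file_range",
	"mincore",
	"PROT_READ",
	"PROT_WRITE",
};

/* True only for explicitly listed names; anything else the headers define
 * (PRI_MAX_REALTIME, IPV6CTL_MAXID, ...) is rejected by default. */
bool
is_allowlisted(const char *name)
{
	assert(name != NULL);
	for (size_t i = 0;
	    i < sizeof(stable_allowlist) / sizeof(stable_allowlist[0]); i++) {
		if (strcmp(stable_allowlist[i], name) == 0)
			return (true);
	}
	return (false);
}
```

Defaulting to rejection means a newly added or reshuffled header define never leaks into the bindings until someone has reviewed it.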

At most, we have the resources to allow you to maintain this list, and to allow you
to make changes on a best-effort basis that history suggests will be
about 90% successful. Absent some automation that enforces it, it will break.
And even with automation, you'll need someone to review breakages and/or
additions to ensure the right things get updated. Developers might be able
to lend a hand, but that can't be relied upon to work without oversight.

Warner
