Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 09 Nov 2020 22:12:54 +0000
From:      Alexander V. Chernikov <melifaro@ipfw.ru>
To:        John Baldwin <jhb@freebsd.org>, Konstantin Belousov <kostikbel@gmail.com>
Cc:        freebsd-arch <freebsd-arch@freebsd.org>
Subject:   Re: Versioning support for kernel<>userland sysctl interface
Message-ID:  <835171604959876@mail.yandex.ru>
In-Reply-To: <efddf8f3-ae00-72f3-a66d-ed603c19a4e1@FreeBSD.org>
References:  <356181604233241@mail.yandex.ru> <20201101183919.GK2654@kib.kiev.ua> <efddf8f3-ae00-72f3-a66d-ed603c19a4e1@FreeBSD.org>

next in thread | previous in thread | raw e-mail | index | archive | help
02.11.2020, 18:30, "John Baldwin" <jhb@freebsd.org>:
> On 11/1/20 10:39 AM, Konstantin Belousov wrote:
>>  On Sun, Nov 01, 2020 at 12:47:17PM +0000, Alexander V. Chernikov wrote:
>>>  Hey folks,
>>>
>>>  I would like to propose a change [1] that introduces versioning support for the data structures exposed to userland by sysctl interface.
>>>
>>>  We have dozens of interfaces exposing various statistics and control data by filling in and exporting structures.
>>>  net.inet6.icmp6.stats or net.inet6.icmp6.nd6_prlist can be a good examples of such interaction.
>>>
>>>  Most of these structure do not have version information embedded, which requires us to break compatibility when changing them.
>>>
>>>  The idea behind the change is really simple: append current structure version number to the sysctl OID to get the desired version of the structure.
>>>
>>>  For example, fetching "net.inet6.icmp6.stats" becomes "net.inet6.icmp6.stats.1" (or, code-wise, something like "net.inet6.icmp6.stats." __XSTRING(ICMP6STAT_VER)).
>>>
>>>  The interface satistifes the following properties:
>>>  1) preserving backward compatibility
>>>  2) allowing for low-cost kernel ABI maintenance
>>>  2) allowing for forward compatibility - application can fetch list of all supported versions of a structure.
>>>
>>>  Example:
>>>  11:25 [1] m@devel0 sysctl -o net.inet6.icmp6.stats
>>>  net.inet6.icmp6.stats.0: Format:S Length:4328 Dump:0x00000000000000000000000000000000...
>>>  net.inet6.icmp6.stats.1: Format:S Length:4624 Dump:0x00000000000000000000000000000000...
>>>
>>>  12:42 [1] m@devel0 ~/test net.inet6.icmp6.stats
>>>  sysctlnametomib("net.inet6.icmp6.stats")=0 -> 4.28.58.1.
>>>  sysctl("net.inet6.icmp6.stats")=-1 sz=512
>>>
>>>  12:43 [1] m@devel0 ~/test net.inet6.icmp6.stats.1
>>>  sysctlnametomib("net.inet6.icmp6.stats.1")=0 -> 4.28.58.1.1.
>>>  sysctl("net.inet6.icmp6.stats.1")=-1 sz=512
>>>
>>>  Some downside of this change would be the potential need to duplicate structures definitions to be 100% sure we don't break API. For example, rebuilding & running 3rd-party software may result in error fetching the necessary structure. Unmodified application build with the latest structure version will request an oldest version of a structure.
>>>
>>>  I see multiple approaches to address it:
>>>  1) duplicate structure with a new name (appending postfix like _v) - works the best for small structure
>>>  2) do nothing specific - will mostly work for append-only statistics structures
>>>  3) rely on kernel warning on calling unversioned sysctls to identify & fix the problematic customers
>>>
>>>  Please take a look at [1] for a more detailed technical description of a change.
>>>
>>>  Any feedback is highly appreciated.
>>>
>>>  [1] https://reviews.freebsd.org/D27035
>>
>>  There was some desire to provide backward ABI-compat shims for sysctls during
>>  ino64 work, https://reviews.freebsd.org/D10439.
>>
>>  Most prominent idea from that time, AFAIR, was to have another MIB tree,
>>  that would be have all the same MIBs but rooted with osrel. In other words,
>>  if you accessed e.g. MIB 1.2.3.4, libc internally translates that to MIB
>>  1024.<osrel>.1.2.3.4, and kernel applies whatever shims it knows about that
>>  osrel version. If there is no compat, call goes directly to 1.2.3.4 handler.
>>  The osrel value can be taken from the binary ABI note, as an example.
>>
>>  There was some discussion, but after more work done on this, it appeared
>>  that not much sysctls need ABI shims at all, and interesting cases could
>>  be adequately handled simply by checking passed buffer length.
>>
>>  The osrel approach has a drawback that it ignores possibly different ABI
>>  of the loaded shared library which might make the call. On the other hand,
>>  it avoids introducing additional burden of requiring consumers to learn
>>  new MIBs and manually handle versions.
>
> Some other thoughts were along the lines of having a kind of "sysctl tree"
> version that would get bumped when there was an ABI breakage of a node
> and then have associated versioned symbols of sysctl() and related symbols
> in libc to handle the shared library problem. However, you'd want to avoid
> an explosion of symbol versions.
>
> One of the goals was to keep API compat as much as possible. I think we
> might have also considered having 'sysctl_ver()' that takes an explicit
> __FreeBSD_version value and having 'sysctl()' become a macro that passes
> in '_FreeBSD_version' to sysctl_ver() so that you encode the desired ABI
> in each invocation. You would have to keep existing symbols in libc that
> would pass in a version of 0. One of the concerns with this approach is
> it removes the public 'sysctl' symbol which might break configure, etc.
> scripts.
Hi Konstantin, John,

Sorry for the belated reply.

Thank you for sharing the outcome of previous discussions on the topic.
It’s extremely useful and generally - re-using osrel is an interesting approach I haven’t considered.

Let me try to take a look at it a bit further and summarise my understanding:

So, generally there are 2 different parts: choosing structure versioning scheme and passing this information to the kernel.

For the former, osrel has an awesome property of increasing counter already attached to an application. It potentially allows to avoid any API/ABI changes in userland programs.

However, if we choose to go that way, we will be coupling every structure change to an osrel change. Will it scale when we have more structures moved to this mechanism? Do we want 200 bumps per release?
How do we bump it for 3rd-party modules?

The latter, version passing mechanism, is comprised of sysctlbyname() and sysctl() calls.
Prepending string / OIDs will add performance implications for each and every sysctl call, despite the fact that multiple versions are needed only for the tiny amount of sysctls in kernel.

I’m wondering if there are any alternative way to natively pass osrel to the kernel. For example, is p_osrel something that can potentially be made reliable (e.g. return non-zero value) for all “new” binaries?


I thought about it a bit more and I still prefer the approach describe in the initial messages. It is

* ABI compatible
* requires minimal API changes (sysctlbyname("net.inet6.icmp6.stats") becomes sysctlbyname(SYSCTL_VERSIONED("net.inet6.icmp6.stats", ICMP6STAT_VER))
* does not have any perfomance impact / symbols magic
* solves any issues with linking - multiple libraries within a binary can request/use different versions of a struct
* no problem with 3rd party kernel modules

I’d love to get a bit more feedback/critique on that approach as well :-)
> --
> John Baldwin
> _______________________________________________
> freebsd-arch@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-arch
> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?835171604959876>