Date: Fri, 9 Dec 2022 22:13:11 -0700 From: Warner Losh <imp@bsdimp.com> To: "Alexander V. Chernikov" <melifaro@ipfw.ru> Cc: Baptiste Daroussin <bapt@freebsd.org>, hackers@freebsd.org Subject: Re: devctl_notify system is inconsistent Message-ID: <CANCZdfq_0r6yOzrHbZv9PBt3b2yESK_i2y5ma7Bd7m7Uny3Gpg@mail.gmail.com> In-Reply-To: <F853E72D-61DD-4098-B3EC-2F54A322E8CE@ipfw.ru> References: <20221201083559.xx3v5jn7sf44rfmv@aniel.nours.eu> <CANCZdfpdkxGaan6fvTYxm4YO17-t=V4yuzPWYJnGd7Ao73NQgw@mail.gmail.com> <20221201145905.ua2jlh25usqrqjy4@aniel.nours.eu> <CANCZdfoxKs9FJMyRh=gVSo9JnX-Lb7-ybXmq==NvzjdKVpckBw@mail.gmail.com> <20221201174043.5yqtfbs6p63bdgqp@aniel.nours.eu> <F853E72D-61DD-4098-B3EC-2F54A322E8CE@ipfw.ru>
next in thread | previous in thread | raw e-mail | index | archive | help
--000000000000419eab05ef7254b6 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Fri, Dec 9, 2022 at 8:26 AM Alexander V. Chernikov <melifaro@ipfw.ru> wrote: > > > > On 1 Dec 2022, at 17:40, Baptiste Daroussin <bapt@FreeBSD.org> wrote: > > > > On Thu, Dec 01, 2022 at 08:38:37AM -0700, Warner Losh wrote: > >> On Thu, Dec 1, 2022 at 7:59 AM Baptiste Daroussin <bapt@freebsd.org> > wrote: > >> > >>> On Thu, Dec 01, 2022 at 07:45:32AM -0700, Warner Losh wrote: > >>>> On Thu, Dec 1, 2022 at 1:36 AM Baptiste Daroussin <bapt@freebsd.org> > >>> wrote: > >>>> > >>>>> Hello, > >>>>> > >>>>> After the addition of netlink(4) by melifaro@, I started working on > a > >>> new > >>>>> genetlink(4) module, to send kernel notification to the userland vi= a > >>>>> netlink. > >>>>> > >>>>> The goal is to be able to have multiple consumers without the need = of > >>> devd > >>>>> to be > >>>>> running. > >>>>> > >>>>> The goal is also to be able subscribe to the events the consumer is > >>>>> willing to > >>>>> receive. > >>>>> > >>>>> https://reviews.freebsd.org/D37574 > >>>>> > >>>>> I also added a hook to devctl_notify to make sure all its event got > >>> sent > >>>>> via > >>>>> nlsysevent. (https://reviews.freebsd.org/D37573) > >>>>> > >>>> > >>>> You're connecting it at the wrong level: it will miss all device > events. > >>>> devctl_notify > >>>> is used by everything except newbus's device attach, detach and > nomatch > >>>> events, so none of those events will make it through. > >>> > >>> Where should I hook to get the device events? > >>> > >> > >> Either you need to drop down a level to where the formated events are > >> queued, > >> or you'll need to add something in devaddq() to deal with the events. > These > >> are > >> a slightly different format than the devctl_notify() events because th= e > >> latter was > >> added later and I didn't want to cope with transitioning the old > formatted > >> messages > >> to the new at that time (silly me). > >> > >> > >>>> > >>>> > >>>>> It works great and so far I am happy with it. on thing I figured ou= t > >>> it is: > >>>>> the "system" argment of devctl_notify is inconsistent: > >>>>> Upper case vs lower case > >>>>> "kern" vs "kernel" > >>>>> > >>>> > >>>> They are consistent. With one exception that I deprecated in 13.x to > be > >>>> removed in 14.x. I documented all of them at the time in devd.conf. = I > >>> think > >>>> I'll go ahead and complete the deprecation. > >>>> > >>>> > >>>>> I intent to fix the following way: > >>>>> Create a new function similar to devctl_notify but with the first > >>> argument > >>>>> being > >>>>> an enum. > >>>>> > >>>> > >>>> I'm a pretty hard no on the enum. I rejected doing it that way when = I > >>> wrote > >>>> devd > >>>> years ago. These have always been strings, and need to continue to > >>> always be > >>>> strings, or we're forever modifying devd(8) when we add a new one an= d > >>>> introducing > >>>> yet another opportunity for mismatch. I don't think this is a good > idea > >>> at > >>>> all. > >>>> > >>>> There's been users of devd in the past that are loadable modules tha= t > >>> pass > >>>> their > >>>> own system name in that devd.conf files would then process. Having a= n > >>> enum > >>>> with > >>>> enforcement would just get in the way of this. > >>>> > >>>> I have toyed with the idea of having a centralized list in bus.h > that's a > >>>> bunch of > >>>> #defines, ala old-school X11 resources (which this is very very > loosely > >>>> based on), > >>>> but it didn't seem worth the effort. > >>> > >>> That is fine with me and actually I do need the string name to > register as > >>> group > >>> name, that I didn't want to trash the strings away. > >>> > >>> But I need a way to list them all. > >>> > >> > >> We have no current mechanism to do that. We could add something that > would > >> register / deregister them with a sysinit + call to something in > >> kern_devctl.c which > >> could do the trick (and also deal with the ordering issues), though > having > >> netlink(4) > >> as a loadable modules might be an interesting case to get right. > >> > >> If we did that, we could return a 'token' that you'd use to call a new > >> version of > >> devctl_notify(), perhaps with some glue for legacy users (or perhaps > not: > >> we change > >> kernel interfaces all the time, and could just rename it and convert > >> everything in the > >> tree). > >> > >>> > >>>> > >>>>> Make the current devctl_notify convert its first argument into that > >>> enum > >>>>> and > >>>>> yell if an unkwown "system" is passed. (and probably declare > >>> devctl_notify > >>>>> deprecated) > >>>>> > >>>> > >>>> I don't think this buys us anything. devctl_notify cannot possibly > know > >>>> about all > >>>> the possible subsystems, so this is likely doomed to high maintenanc= e > >>> that > >>>> would > >>>> be largely ineffective. > >>>> > >>>> Then fix the inconsistencies: all upper case as it seems the most > wildly > >>> use > >>>>> case > >>>>> s/kern/kernel/g > >>>>> > >>>>> WDYT? > >>>>> > >>>> > >>>> I wouldn't worry about the upper vs lower case. All the documented > ones > >>> are > >>>> upper case, except kernel. There's no harm it being one exception > since > >>>> changing > >>>> it will break user's devd.conf setups. I got enough feedback when I > >>> changed > >>>> "kern" to "kernel" last year for power events. And as far as I know, > I've > >>>> documented > >>>> all the events that are generated today. > >>>> > >>>> Warner > >>> > >>> > >>> Note that if you think nlsysevent is a bad idea, I will just forget > about > >>> it. > >>> > >> > >> I love the idea. I got over any resistance I had when melifaro@ moved > >> things into kern_devctl.c and we talked through the issues at that tim= e. > >> I've been hoping that someone would replace the hacky thing I did with > >> a 'routing socket'-like interface (I never could figure out hose to do > it so > >> many years ago w/o gross hacks). netlink(4) has finally provided a way > >> to do it that doesn't get the routing code all jumbled up for this. > >> > >> I just have some specific issues with your proposal. In hindsight, I > should > >> have led with that in my first message :). > >> > >> Warner > > > > Here is my new proposal: > > > > What about: > > > > 1. I add a static list of systems in sys/devctl.h something like > > > > enum { > > DEVCTL_SYSTEM_ACPI =3D 0, > > DEVCTL_SYSTEM_AEON =3D 1, > > ... > > DEVCTL_SYSTEM_ZFS =3D 19 > > }; > > > > static const char *devctl_systems[] =3D { > > "ACPI", > > "AEON", > > ... > > "ZFS", > > }; > > > > this way we have a list of official freebsd's systems. > > We don't change the devctl_notify interface > > > > and in the kernel we change the devctl_notify("ZFS" into > > devctl_notify(devctl_systems[DEVCTL_SYSTEM_ZFS],... > > > > This is not too intrusive, and breaks none of the existing code > > > > 2. I also hook devadq using the same interface as I already have done, > but > > all the attach,detach,nomatch become an event (only in nlsysevent) in t= he > > "DEVICE" system, > > the "SUBSYSTEM" is the current what of devaddq > > > > The type is changed into: > > + -> ATTACH > > - -> DETACH > > anythingelse -> NOMATCH > > and the reste of the current line becomes the data > > > > so from nlsysvent point of view this is exactly the same kind of events > as the > > one hooked in devctl_notify. > > > > 3. in nlsysevent we have one category one can subscribe per known > systems. > > All other "systems" falls into a new category named "extra" "vendor" or > "other" > > > > from the consumer point of view the name of the system is anyway > contained in > > the message itself, so the category it is subscribed to can differ. > > > > This way, I should be able to get ALL the events in a consistent way. > > I should be compatible with people who uses devctl_notify in their > custom kernel > > modules. > > > > Sounds easy enough without the requirement of complexifying > kern_devctl.c with a > > registration of extra systems. > > > > What do you think? > My 2 cents: > > I=E2=80=99d look at the groups from the customer (e.g. userland API POV).= IMHO, > fine-grained groups serves 2 purposes. > The first is limiting the amount of notifications the app has to deal wit= h. Yea, that breaks devd. It assumes it can 'catch up' on events that happened before it got around to running and registering. > The second is moving filtering from the > application to the kernel, so the app doesn=E2=80=99t need to check the e= vent=E2=80=99s > subsystem. Note that the second part loses its value if the app subscribe= s > for more than one group. > It is also worth noting that _some_ notifications, like ifnet event, may > be high-volume, so it can be desirable to have a way to filter them out. > For example, it can be implemented as a fixed number of broad group > categories (=E2=80=9Cnet=E2=80=9D, =E2=80=9Cdevice=E2=80=9D, =E2=80=9Cfs= =E2=80=9D, =E2=80=9Cpower=E2=80=9D, =E2=80=9Cvendor=E2=80=9D) and each subs= ystem > picks one it belongs to. > It can be potentially extended with an ability to register custom groups > if so desired. > I wouldn=E2=80=99t say the existing KPI consisting of a single devctl_not= ify() is > set in stone. I=E2=80=99d actually vote to have a new function/set of fun= ctions, as > the notifications are no longer limited to the devices, as devctl suggest= s. > I'm not sure I see the value in kernel filtering. The data rates are low. The number of apps is also low, so the savings is tiny today. > For example (just to illustrate the point): > group_id =3D unotify_get_group(=E2=80=9Cpower=E2=80=9D) # gets existing o= r registers new > group in the netlink subsystem > unotify_broadcast(group_id, =E2=80=9CACPI=E2=80=9D, =E2=80=A6) > unotify_free_group(group_id) > I see no value in this with the current base of consumers / producers. It sounds cool, but at the present time the data rates are so low that the extra complexity of filtering in the kernel likely would dwarf doing it in userland. Warner > > > > Best regards, > > Bapt > > > > --000000000000419eab05ef7254b6 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable <div dir=3D"ltr"><div dir=3D"ltr"><br></div><br><div class=3D"gmail_quote">= <div dir=3D"ltr" class=3D"gmail_attr">On Fri, Dec 9, 2022 at 8:26 AM Alexan= der V. Chernikov <<a href=3D"mailto:melifaro@ipfw.ru">melifaro@ipfw.ru</= a>> wrote:<br></div><blockquote class=3D"gmail_quote" style=3D"margin:0p= x 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><b= r> <br> > On 1 Dec 2022, at 17:40, Baptiste Daroussin <bapt@FreeBSD.org> w= rote:<br> > <br> > On Thu, Dec 01, 2022 at 08:38:37AM -0700, Warner Losh wrote:<br> >> On Thu, Dec 1, 2022 at 7:59 AM Baptiste Daroussin <<a href=3D"m= ailto:bapt@freebsd.org" target=3D"_blank">bapt@freebsd.org</a>> wrote:<b= r> >> <br> >>> On Thu, Dec 01, 2022 at 07:45:32AM -0700, Warner Losh wrote:<b= r> >>>> On Thu, Dec 1, 2022 at 1:36 AM Baptiste Daroussin <<a h= ref=3D"mailto:bapt@freebsd.org" target=3D"_blank">bapt@freebsd.org</a>><= br> >>> wrote:<br> >>>> <br> >>>>> Hello,<br> >>>>> <br> >>>>> After the addition of netlink(4) by melifaro@, I start= ed working on a<br> >>> new<br> >>>>> genetlink(4) module, to send kernel notification to th= e userland via<br> >>>>> netlink.<br> >>>>> <br> >>>>> The goal is to be able to have multiple consumers with= out the need of<br> >>> devd<br> >>>>> to be<br> >>>>> running.<br> >>>>> <br> >>>>> The goal is also to be able subscribe to the events th= e consumer is<br> >>>>> willing to<br> >>>>> receive.<br> >>>>> <br> >>>>> <a href=3D"https://reviews.freebsd.org/D37574" rel=3D"= noreferrer" target=3D"_blank">https://reviews.freebsd.org/D37574</a><br> >>>>> <br> >>>>> I also added a hook to devctl_notify to make sure all = its event got<br> >>> sent<br> >>>>> via<br> >>>>> nlsysevent. (<a href=3D"https://reviews.freebsd.org/D3= 7573" rel=3D"noreferrer" target=3D"_blank">https://reviews.freebsd.org/D375= 73</a>)<br> >>>>> <br> >>>> <br> >>>> You're connecting it at the wrong level: it will miss = all device events.<br> >>>> devctl_notify<br> >>>> is used by everything except newbus's device attach, d= etach and nomatch<br> >>>> events, so none of those events will make it through.<br> >>> <br> >>> Where should I hook to get the device events?<br> >>> <br> >> <br> >> Either you need to drop down a level to where the formated events = are<br> >> queued,<br> >> or you'll need to add something in devaddq() to deal with the = events. These<br> >> are<br> >> a slightly different format than the devctl_notify() events becaus= e the<br> >> latter was<br> >> added later and I didn't want to cope with transitioning the o= ld formatted<br> >> messages<br> >> to the new at that time (silly me).<br> >> <br> >> <br> >>>> <br> >>>> <br> >>>>> It works great and so far I am happy with it. on thing= I figured out<br> >>> it is:<br> >>>>> the "system" argment of devctl_notify is inc= onsistent:<br> >>>>> Upper case vs lower case<br> >>>>> "kern" vs "kernel"<br> >>>>> <br> >>>> <br> >>>> They are consistent. With one exception that I deprecated = in 13.x to be<br> >>>> removed in 14.x. I documented all of them at the time in d= evd.conf. I<br> >>> think<br> >>>> I'll go ahead and complete the deprecation.<br> >>>> <br> >>>> <br> >>>>> I intent to fix the following way:<br> >>>>> Create a new function similar to devctl_notify but wit= h the first<br> >>> argument<br> >>>>> being<br> >>>>> an enum.<br> >>>>> <br> >>>> <br> >>>> I'm a pretty hard no on the enum. I rejected doing it = that way when I<br> >>> wrote<br> >>>> devd<br> >>>> years ago. These have always been strings, and need to con= tinue to<br> >>> always be<br> >>>> strings, or we're forever modifying devd(8) when we ad= d a new one and<br> >>>> introducing<br> >>>> yet another opportunity for mismatch. I don't think th= is is a good idea<br> >>> at<br> >>>> all.<br> >>>> <br> >>>> There's been users of devd in the past that are loadab= le modules that<br> >>> pass<br> >>>> their<br> >>>> own system name in that devd.conf files would then process= . Having an<br> >>> enum<br> >>>> with<br> >>>> enforcement would just get in the way of this.<br> >>>> <br> >>>> I have toyed with the idea of having a centralized list in= bus.h that's a<br> >>>> bunch of<br> >>>> #defines, ala old-school X11 resources (which this is very= very loosely<br> >>>> based on),<br> >>>> but it didn't seem worth the effort.<br> >>> <br> >>> That is fine with me and actually I do need the string name to= register as<br> >>> group<br> >>> name, that I didn't want to trash the strings away.<br> >>> <br> >>> But I need a way to list them all.<br> >>> <br> >> <br> >> We have no current mechanism to do that. We could add something th= at would<br> >> register / deregister them with a sysinit + call to something in<b= r> >> kern_devctl.c which<br> >> could do the trick (and also deal with the ordering issues), thoug= h having<br> >> netlink(4)<br> >> as a loadable modules might be an interesting case to get right.<b= r> >> <br> >> If we did that, we could return a 'token' that you'd u= se to call a new<br> >> version of<br> >> devctl_notify(), perhaps with some glue for legacy users (or perha= ps not:<br> >> we change<br> >> kernel interfaces all the time, and could just rename it and conve= rt<br> >> everything in the<br> >> tree).<br> >> <br> >>> <br> >>>> <br> >>>>> Make the current devctl_notify convert its first argum= ent into that<br> >>> enum<br> >>>>> and<br> >>>>> yell if an unkwown "system" is passed. (and = probably declare<br> >>> devctl_notify<br> >>>>> deprecated)<br> >>>>> <br> >>>> <br> >>>> I don't think this buys us anything. devctl_notify can= not possibly know<br> >>>> about all<br> >>>> the possible subsystems, so this is likely doomed to high = maintenance<br> >>> that<br> >>>> would<br> >>>> be largely ineffective.<br> >>>> <br> >>>> Then fix the inconsistencies: all upper case as it seems t= he most wildly<br> >>> use<br> >>>>> case<br> >>>>> s/kern/kernel/g<br> >>>>> <br> >>>>> WDYT?<br> >>>>> <br> >>>> <br> >>>> I wouldn't worry about the upper vs lower case. All th= e documented ones<br> >>> are<br> >>>> upper case, except kernel. There's no harm it being on= e exception since<br> >>>> changing<br> >>>> it will break user's devd.conf setups. I got enough fe= edback when I<br> >>> changed<br> >>>> "kern" to "kernel" last year for power= events. And as far as I know, I've<br> >>>> documented<br> >>>> all the events that are generated today.<br> >>>> <br> >>>> Warner<br> >>> <br> >>> <br> >>> Note that if you think nlsysevent is a bad idea, I will just f= orget about<br> >>> it.<br> >>> <br> >> <br> >> I love the idea. I got over any resistance I had when melifaro@ mo= ved<br> >> things into kern_devctl.c and we talked through the issues at that= time.<br> >> I've been hoping that someone would replace the hacky thing I = did with<br> >> a 'routing socket'-like interface (I never could figure ou= t hose to do it so<br> >> many years ago w/o gross hacks). netlink(4) has finally provided a= way<br> >> to do it that doesn't get the routing code all jumbled up for = this.<br> >> <br> >> I just have some specific issues with your proposal. In hindsight,= I should<br> >> have led with that in my first message :).<br> >> <br> >> Warner<br> > <br> > Here is my new proposal:<br> > <br> > What about:<br> > <br> > 1. I add a static list of systems in sys/devctl.h something like<br> > <br> > enum {<br> > DEVCTL_SYSTEM_ACPI =3D 0,<br> > DEVCTL_SYSTEM_AEON =3D 1,<br> > ...<br> > DEVCTL_SYSTEM_ZFS =3D 19<br> > };<br> > <br> > static const char *devctl_systems[] =3D {<br> >=C2=A0 "ACPI",<br> >=C2=A0 "AEON",<br> >=C2=A0 ...<br> >=C2=A0 "ZFS",<br> > };<br> > <br> > this way we have a list of official freebsd's systems.<br> > We don't change the devctl_notify interface<br> > <br> > and in the kernel we change the devctl_notify("ZFS" into<br> > devctl_notify(devctl_systems[DEVCTL_SYSTEM_ZFS],...<br> > <br> > This is not too intrusive, and breaks none of the existing code<br> > <br> > 2. I also hook devadq using the same interface as I already have done,= but<br> > all the attach,detach,nomatch become an event (only in nlsysevent) in = the<br> > "DEVICE" system,<br> > the "SUBSYSTEM" is the current what of devaddq<br> > <br> > The type is changed into:<br> > + -> ATTACH<br> > - -> DETACH<br> > anythingelse -> NOMATCH<br> > and the reste of the current line becomes the data<br> > <br> > so from nlsysvent point of view this is exactly the same kind of event= s as the<br> > one hooked in devctl_notify.<br> > <br> > 3. in nlsysevent we have one category one can subscribe per known syst= ems.<br> > All other "systems" falls into a new category named "ex= tra" "vendor" or "other"<br> > <br> > from the consumer point of view the name of the system is anyway conta= ined in<br> > the message itself, so the category it is subscribed to can differ.<br= > > <br> > This way, I should be able to get ALL the events in a consistent way.<= br> > I should be compatible with people who uses devctl_notify in their cus= tom kernel<br> > modules.<br> > <br> > Sounds easy enough without the requirement of complexifying kern_devct= l.c with a<br> > registration of extra systems.<br> > <br> > What do you think?<br> My 2 cents:<br> <br> I=E2=80=99d look at the groups from the customer (e.g. userland API POV). I= MHO, fine-grained groups serves 2 purposes.<br> The first is limiting the amount of notifications the app has to deal with.= </blockquote><div><br></div><div>Yea, that breaks devd. It assumes it can &= #39;catch up' on events that happened before it got around to running a= nd registering.</div><div>=C2=A0</div><blockquote class=3D"gmail_quote" sty= le=3D"margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);paddi= ng-left:1ex">The second is moving filtering from the<br> application to the kernel, so the app doesn=E2=80=99t need to check the eve= nt=E2=80=99s subsystem. Note that the second part loses its value if the ap= p subscribes for more than one group.<br> It is also worth noting that _some_ notifications, like ifnet event, may be= high-volume, so it can be desirable to have a way to filter them out.<br> For example, it can be implemented as a fixed number of broad group categor= ies (=E2=80=9Cnet=E2=80=9D, =E2=80=9Cdevice=E2=80=9D, =E2=80=9Cfs=E2=80=9D,= =E2=80=9Cpower=E2=80=9D, =E2=80=9Cvendor=E2=80=9D) and each subsystem pick= s one it belongs to.<br> It can be potentially extended with an ability to register custom groups if= so desired.<br> I wouldn=E2=80=99t say the existing KPI consisting of a single devctl_notif= y() is set in stone. I=E2=80=99d actually vote to have a new function/set o= f functions, as the notifications are no longer limited to the devices, as = devctl suggests.<br></blockquote><div><br></div><div>I'm not sure I see= the value in kernel filtering. The data rates are low. The number of apps = is also low, so the savings is tiny today.</div><div>=C2=A0</div><blockquot= e class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;border-left:1px s= olid rgb(204,204,204);padding-left:1ex">For example (just to illustrate the= point):<br> group_id =3D unotify_get_group(=E2=80=9Cpower=E2=80=9D) # gets existing or = registers new group in the netlink subsystem<br> unotify_broadcast(group_id, =E2=80=9CACPI=E2=80=9D, =E2=80=A6)<br> unotify_free_group(group_id)<br></blockquote><div><br></div><div>I see no v= alue in this with the current base of consumers / producers. It sounds cool= , but at the present time the data rates are so low that the extra complexi= ty of filtering in the kernel likely would dwarf doing it in userland.</div= ><div><br></div><div>Warner</div><div>=C2=A0</div><blockquote class=3D"gmai= l_quote" style=3D"margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,20= 4,204);padding-left:1ex"> > <br> > Best regards,<br> > Bapt<br> > <br> <br> </blockquote></div></div> --000000000000419eab05ef7254b6--
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CANCZdfq_0r6yOzrHbZv9PBt3b2yESK_i2y5ma7Bd7m7Uny3Gpg>