Date:      Thu, 14 Sep 2023 15:46:13 +0200
From:      Mateusz Guzik <mjguzik@gmail.com>
To:        David Chisnall <theraven@freebsd.org>
Cc:        Bakul Shah <bakul@iitbombay.org>, Graham Perrin <grahamperrin@gmail.com>,  FreeBSD CURRENT <freebsd-current@freebsd.org>
Subject:   Re: Continually count the number of open files
Message-ID:  <CAGudoHHGJjJY9YqHn2z__tLaPhPLQJu=t5oEdP50EcHZc7yKAQ@mail.gmail.com>
In-Reply-To: <1D86A8FB-ACC6-427E-ABB0-2E1A5989170E@FreeBSD.org>
References:  <291ad2de-ba0e-4bdf-786a-19614eacec49@gmail.com> <592123F4-E610-446E-82B4-ACC519C0BA3E@iitbombay.org> <1D86A8FB-ACC6-427E-ABB0-2E1A5989170E@FreeBSD.org>

On 9/13/23, David Chisnall <theraven@freebsd.org> wrote:
> On 12 Sep 2023, at 17:19, Bakul Shah <bakul@iitbombay.org> wrote:
>>
>> FreeBSD should add inotify.
>
> inotify is also probably not the right thing.  If someone is interested in
> adding this, Apple’s fsevents API is a better inspiration.  It is carefully
> designed to ensure that the things monitoring for events can’t ever block
> filesystem operations from making progress.

I'm not sure what you mean here specifically and I don't see anything
careful about what they did.

From the userspace POV, the API is allowed to drop events, which makes life
easy on this front and is probably the right call.

The implementation is utterly horrid. For one, the non-blocking aspect
starts with the obvious equivalent of uma_zalloc(..., M_NOWAIT) and
bailing out if it fails. However, if you read past that to the actual
registration, it can perform an allocation which can block indefinitely
while holding on to some vnodes:
        // if we haven't gotten the path yet, get it.
        if (pathbuff == NULL) {
                pathbuff = get_pathbuff();
                pathbuff_len = MAXPATHLEN;

where get_pathbuff() is:
        return zalloc(ZV_NAMEI);

So the notification routine can block indefinitely in a low-memory
condition. I tried to figure out whether this is ever called without other
vnodes write-locked (as opposed to "just" refed), but their code is
such a mess that my level of curiosity was dwarfed by the difficulty of
getting a definitive answer.
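For contrast, here is a minimal userspace sketch of a hook that never waits for memory, assuming path buffers are preallocated in a small fixed pool before any event fires. All names (pathbuf_tryget(), notify_event(), the pool sizes) are hypothetical and are not Apple's or FreeBSD's code; the point is only that running out of buffers turns into a counted, dropped event rather than a sleep in an allocator such as zalloc().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PATHBUF_LEN   1024 /* hypothetical stand-in for MAXPATHLEN */
#define PATHBUF_COUNT 2    /* tiny pool so exhaustion is easy to observe */

/* Hypothetical preallocated pool, set up before any event can fire. */
static char pathbuf_pool[PATHBUF_COUNT][PATHBUF_LEN];
static bool pathbuf_busy[PATHBUF_COUNT];
static unsigned long events_dropped;

/* Never sleeps: returns a free buffer, or NULL if the pool is exhausted. */
static char *pathbuf_tryget(void)
{
	for (size_t i = 0; i < PATHBUF_COUNT; i++) {
		if (!pathbuf_busy[i]) {
			pathbuf_busy[i] = true;
			return pathbuf_pool[i];
		}
	}
	return NULL;
}

/* Return a buffer to the pool once the event has been delivered. */
static void pathbuf_put(char *buf)
{
	for (size_t i = 0; i < PATHBUF_COUNT; i++) {
		if (pathbuf_pool[i] == buf) {
			pathbuf_busy[i] = false;
			return;
		}
	}
	assert(!"pathbuf_put: buffer not from pool");
}

/* Hypothetical notification hook: copies the path into a pooled buffer and
 * returns it, or counts a dropped event and returns NULL if no buffer is
 * free. It never waits, so a caller holding vnodes cannot get stuck here. */
static char *notify_event(const char *path)
{
	char *buf = pathbuf_tryget();
	if (buf == NULL) {
		events_dropped++;
		return NULL;
	}
	size_t len = strlen(path);
	if (len >= PATHBUF_LEN)
		len = PATHBUF_LEN - 1; /* truncate rather than fail */
	memcpy(buf, path, len);
	buf[len] = '\0';
	return buf;
}
```

With two buffers in flight, a third notify_event() call returns NULL and bumps events_dropped; once a buffer is handed back with pathbuf_put(), the next event gets it. A kernel version would additionally need locking around the pool.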

Other than that, it is terribly inefficient and artificially limited to
8 processes which can use it at all.

That is to say, it is unfit for anything but laptop-scale usage.

Perhaps you meant it does not block if the watchers decide to not
process any events, but that's almost inherently true if one allows
for lossy notifications.
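To illustrate why lossy delivery gives the non-blocking property almost for free, here is a minimal single-threaded sketch of a per-watcher event ring. The names and the tiny size are hypothetical, and a kernel version would need locking or atomics. When the ring is full, the producer drops and counts the event instead of waiting; the reader learns how many events were lost, so it knows it must rescan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event record; field names are illustrative only. */
struct fs_event {
	uint64_t ino;  /* inode number the event refers to */
	int      type; /* e.g. create/modify/remove */
};

#define EVQ_SIZE 4 /* deliberately tiny */
static_assert((EVQ_SIZE & (EVQ_SIZE - 1)) == 0,
    "EVQ_SIZE must be a power of two so index wraparound stays consistent");

struct event_queue {
	struct fs_event ring[EVQ_SIZE];
	size_t   head;    /* count of events read so far */
	size_t   tail;    /* count of events written so far */
	uint64_t dropped; /* events lost since the last read */
};

/* Producer side: never waits. A full queue means the event is dropped
 * and counted; the filesystem operation proceeds regardless. */
static bool evq_post(struct event_queue *q, uint64_t ino, int type)
{
	if (q->tail - q->head == EVQ_SIZE) {
		q->dropped++;
		return false;
	}
	q->ring[q->tail % EVQ_SIZE] = (struct fs_event){ .ino = ino, .type = type };
	q->tail++;
	return true;
}

/* Consumer side: *lost reports how many events were dropped since the
 * previous read (the watcher should rescan if nonzero). Returns false
 * when there is nothing to read. */
static bool evq_read(struct event_queue *q, struct fs_event *out, uint64_t *lost)
{
	*lost = q->dropped;
	q->dropped = 0;
	if (q->head == q->tail)
		return false;
	*out = q->ring[q->head % EVQ_SIZE];
	q->head++;
	return true;
}
```

Posting six events into this four-slot ring keeps the first four; the first read then reports two lost events. A watcher that never reads simply stops receiving events, and nothing else in the system waits on it.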

> I think there’s a nice design
> possible with a bloom filter in the kernel of events that ensures that
> monitors may get spurious events but don’t miss out on anything.
>
[snip]
>  I think the right kernel API would walk the directory and add the vnodes to
> a bloom filter and trigger a notification on a match in the filter.  You’d
> then have occasional spurious notifications but you’d have something that
> could be monitored via kqueue and could be made to not block anything else
> in the kernel.
>

I don't see how this can work.

A directory can have more inodes than there can be vnodes at any
point. So if you add vnodes to a list as you go, earlier ones may have to
be evicted so that you can continue adding other entries.

But perhaps you mean you could store the inode number as opposed to
holding on to the vnode? Even then, the number of entries to scan to make
it happen is so big that it is going to be impractical on anything but
laptop-scale.
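For concreteness, a minimal sketch of the bloom-filter variant keyed by inode numbers instead of held vnodes, assuming 64-bit inode numbers and a fixed 4096-bit filter (all names and sizes are hypothetical). It demonstrates the property claimed in the quoted proposal: a lookup may report a spurious hit, but an added entry is never missed. It does nothing about the cost objection above: filling the filter still requires visiting every entry in the directory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLOOM_BITS   4096u /* filter size in bits */
#define BLOOM_HASHES 3u    /* probe positions per key */
static_assert((BLOOM_BITS & (BLOOM_BITS - 1)) == 0,
    "BLOOM_BITS must be a power of two for the mask below");

struct bloom {
	uint64_t words[BLOOM_BITS / 64];
};

static void bloom_clear(struct bloom *b)
{
	memset(b, 0, sizeof(*b));
}

/* splitmix64 finalizer: cheap, well-mixed 64-bit hash of the inode number. */
static uint64_t mix64(uint64_t x)
{
	x += 0x9e3779b97f4a7c15ULL;
	x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
	x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
	return x ^ (x >> 31);
}

/* Double hashing: probe i lands at h1 + i*h2 (mod BLOOM_BITS). */
static uint32_t bloom_probe(uint64_t ino, uint32_t i)
{
	uint64_t h = mix64(ino);
	uint32_t h1 = (uint32_t)h;
	uint32_t h2 = (uint32_t)(h >> 32) | 1u; /* odd step */
	return (h1 + i * h2) & (BLOOM_BITS - 1);
}

static void bloom_add(struct bloom *b, uint64_t ino)
{
	for (uint32_t i = 0; i < BLOOM_HASHES; i++) {
		uint32_t bit = bloom_probe(ino, i);
		b->words[bit / 64] |= 1ULL << (bit % 64);
	}
}

/* false: definitely not watched; true: possibly watched (may be spurious). */
static bool bloom_maybe_contains(const struct bloom *b, uint64_t ino)
{
	for (uint32_t i = 0; i < BLOOM_HASHES; i++) {
		uint32_t bit = bloom_probe(ino, i);
		if (!(b->words[bit / 64] & (1ULL << (bit % 64))))
			return false;
	}
	return true;
}
```

The check itself is cheap and lock-free to read; the expensive part is bloom_add() being called once per directory entry, and the filter cannot shrink when entries are removed without being rebuilt.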

What can be fast is checking if the parent dir wants notifications,
but this ignores changes to hardlinks. Except *currently* the VFS
layer does not reliably track who the parent is (and in fact it can
fail to spot one).
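A deliberately simplified model of that fast path, to show where hardlinks and unknown parents slip through. The struct and function names are hypothetical and bear no relation to FreeBSD's actual struct vnode; it assumes each file records at most one parent, which is roughly what the VFS can offer today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model; not FreeBSD's struct vnode. */
struct node {
	bool         watched; /* set on a directory that asked for notifications */
	struct node *parent;  /* the single parent the VFS happened to record, or NULL */
};

/* O(1) fast path: consults only the recorded parent. A file hardlinked into
 * several directories is reported only if that one parent is watched, and a
 * file with no known parent is never reported. */
static bool should_notify(const struct node *n)
{
	return n->parent != NULL && n->parent->watched;
}
```

If a file is linked into directories A and B but only A is recorded as its parent, a watch on B never fires for changes made through either name; fixing that means either tracking all links or accepting the miss.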

The VFS layer contains a lot of cruft and design decisions which, at
least today, are questionable at best, but fixable. A big chunk of this
concerns name caching, which currently is entirely optional. Should
someone want to propose an API for file change notifications, they
need to specify something which, if implemented, does not result in an
unfixable drag on the layer, even if the initial implementation would be
suboptimal. Handling arbitrary hardlinks looks like a drag to me, but
I'm happy to review an implementation which avoids being a problem.

That is to say, a laptop-scale API can probably be implemented as is,
but a solution which can provide reliable events (not to be confused
with reliably notifying about all events) would require numerous
changes.

-- 
Mateusz Guzik <mjguzik gmail.com>


