Date:      Mon, 04 Sep 2023 08:19:47 +0200
From:      Alexander Leidinger <Alexander@Leidinger.net>
To:        Mateusz Guzik <mjguzik@gmail.com>
Cc:        Konstantin Belousov <kostikbel@gmail.com>, current@freebsd.org
Subject:   Re: Speed improvements in ZFS
Message-ID:  <1d0d37f27e4898f1604c6ddc6ad3e831@Leidinger.net>
In-Reply-To: <076f09cc0b99643072d8b80a6ec5b03b@Leidinger.net>
References:  <CAGudoHEP8TrSzz0TL-PsOx0WNc7z3042wJk-jhhVwhTyJ0VEQQ@mail.gmail.com> <88e837aeb5a65c1f001de2077fb7bcbd@Leidinger.net> <4d60bd12b482e020fd4b186a9ec1a250@Leidinger.net> <CAGudoHE7RPcHpQEqKbzRM8cJcYKue17=iPVv8iOfZq03h22tTA@mail.gmail.com> <73f7c9d3db8f117deb077fb17b1e352a@Leidinger.net> <CAGudoHGPw0Dmnv6ont8JGyLsT7qv%2BQqAFZO3tKOpNo3eN%2BJgLQ@mail.gmail.com> <58493b568dbe9fb52cc55de86e01f5e2@Leidinger.net> <CAGudoHEyZh1DU=j_6mOfB3tSKhC-pNokPgONDbf4oF3D3A5=jg@mail.gmail.com> <ZOKC3-6uyPUO8qNY@kib.kiev.ua> <58ac6211235c52d744666e8ae2ec7568@Leidinger.net> <ZOMmHF0RiVyroUk8@kib.kiev.ua> <444770b977b02b98985928bea450e4ce@Leidinger.net> <CAGudoHF20EVPcrdRixfhktp-==8=CuYLY6wpPkXLRRizQLCsKA@mail.gmail.com> <076f09cc0b99643072d8b80a6ec5b03b@Leidinger.net>

On 2023-08-28 22:33, Alexander Leidinger wrote:
> On 2023-08-22 18:59, Mateusz Guzik wrote:
>> On 8/22/23, Alexander Leidinger <Alexander@leidinger.net> wrote:
>>> On 2023-08-21 10:53, Konstantin Belousov wrote:
>>>> On Mon, Aug 21, 2023 at 08:19:28AM +0200, Alexander Leidinger wrote:
>>>>> On 2023-08-20 23:17, Konstantin Belousov wrote:
>>>>> > On Sun, Aug 20, 2023 at 11:07:08PM +0200, Mateusz Guzik wrote:
>>>>> > > On 8/20/23, Alexander Leidinger <Alexander@leidinger.net> wrote:
>>>>> > > > On 2023-08-20 22:02, Mateusz Guzik wrote:
>>>>> > > >> On 8/20/23, Alexander Leidinger <Alexander@leidinger.net> wrote:
>>>>> > > >>> On 2023-08-20 19:10, Mateusz Guzik wrote:
>>>>> > > >>>> On 8/18/23, Alexander Leidinger <Alexander@leidinger.net> wrote:
>>>>> > > >>>
>>>>> > > >>>>> I have a 51MB text file, compressed to about 1MB. Are you
>>>>> > > >>>>> interested in getting it?
>>>>> > > >>>>>
>>>>> > > >>>>
>>>>> > > >>>> Your problem is not the vnode limit, but nullfs.
>>>>> > > >>>>
>>>>> > > >>>> https://people.freebsd.org/~mjg/netchild-periodic-find.svg
>>>>> > > >>>
>>>>> > > >>> 122 nullfs mounts on this system. And every jail I set up has
>>>>> > > >>> several null mounts: one base system mounted into every jail, and
>>>>> > > >>> then shared ports (packages/distfiles/ccache) across all of them.
>>>>> > > >>>
>>>>> > > >>>> First, some of the contention is the notorious VI_LOCK, which is
>>>>> > > >>>> needed in order to do anything.
>>>>> > > >>>>
>>>>> > > >>>> But more importantly, the mind-boggling off-CPU time comes from
>>>>> > > >>>> exclusive locking which should not be there to begin with -- as
>>>>> > > >>>> in, that xlock in stat should be a slock.
>>>>> > > >>>>
>>>>> > > >>>> Maybe I'm going to look into it later.
>>>>> > > >>>
>>>>> > > >>> That would be fantastic.
>>>>> > > >>>
>>>>> > > >>
>>>>> > > >> I did a quick test; things are shared-locked as expected.
>>>>> > > >>
>>>>> > > >> However, I found the following:
>>>>> > > >>         if ((xmp->nullm_flags & NULLM_CACHE) != 0) {
>>>>> > > >>                 mp->mnt_kern_flag |= lowerrootvp->v_mount->mnt_kern_flag &
>>>>> > > >>                     (MNTK_SHARED_WRITES | MNTK_LOOKUP_SHARED |
>>>>> > > >>                     MNTK_EXTENDED_SHARED);
>>>>> > > >>         }
>>>>> > > >>
>>>>> > > >> Are you using the "nocache" option? It has a side effect of
>>>>> > > >> xlocking.
>>>>> > > >
>>>>> > > > I use noatime, noexec, nosuid, nfsv4acls. I do NOT use nocache.
>>>>> > > >
>>>>> > >
>>>>> > > If you don't have "nocache" on null mounts, then I don't see how
>>>>> > > this could happen.
>>>>> >
>>>>> > There is also MNTK_NULL_NOCACHE on the lower fs, which is currently
>>>>> > set for fuse and nfs at least.
>>>>> 
>>>>> 11 of those 122 nullfs mounts are ZFS datasets which are also NFS
>>>>> exported. 6 of those nullfs mounts are also exported via Samba. The
>>>>> NFS exports shouldn't be needed anymore; I will remove them.
>>>> By nfs I meant nfs client, not nfs exports.
>>> 
>>> No NFS client mounts anywhere on this system. So where is this
>>> exclusive lock coming from then...
>>> This is a ZFS system with 2 pools: one for the root, one for anything I
>>> need space for. Both pools reside on the same disks. The root pool is a
>>> 3-way mirror, the "space-pool" is a 5-disk raidz2. All jails are on the
>>> space-pool. The jails are all basejail-style jails.
>>> 
>> 
>> While I don't see why xlocking happens, you should be able to dtrace
>> or printf your way into finding out.
> 
> dtrace looks to me like a faster way to get to the root cause than
> printf... My first naive try is to detect exclusive locks. I'm not 100%
> sure I got it right, but at least dtrace doesn't complain about it:
> ---snip---
> #pragma D option dynvarsize=32m
> 
> fbt:nullfs:null_lock:entry
> /(args[0]->a_flags & 0x080000) != 0/
> {
>         stack();
> }
> ---snip---
> 
> In which direction should I look with dtrace if this works in tonight's 
> run of periodic? I don't have enough knowledge about VFS to come up 
> with some immediate ideas.
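
A side note on mask tests in predicates like the one in the script
above, offered as a sketch rather than a verified diagnosis: D uses the
same operator precedence as C, so != binds tighter than &. A test
written as "flags & 0x080000 != 0" is therefore evaluated as
"flags & (0x080000 != 0)", i.e. "flags & 1", and needs parentheses
around the & part to test the intended bit. The small user-space C
fragment below shows the difference. XLOCK_BIT reuses the value from
the script; OTHER_BITS and the function names are made up for
illustration and are not kernel definitions.

```c
#include <stdbool.h>

/* Mask value taken from the D script in this thread. */
#define XLOCK_BIT  0x080000u
/* Hypothetical flag word: XLOCK_BIT is clear, but bit 0 is set. */
#define OTHER_BITS 0x200001u

/*
 * How "flags & XLOCK_BIT != 0" is actually parsed: the comparison is
 * evaluated first and yields 1, so only bit 0 of flags is tested.
 * The parentheses are written out here to make that parse explicit.
 */
static bool
mask_test_as_parsed(unsigned int flags)
{
	return ((flags & (XLOCK_BIT != 0)) != 0);
}

/* The intended test: mask first, then compare the result with zero. */
static bool
mask_test_intended(unsigned int flags)
{
	return ((flags & XLOCK_BIT) != 0);
}
```

With flags equal to XLOCK_BIT, mask_test_intended() is true while
mask_test_as_parsed() is false; with OTHER_BITS it is the other way
round. In a D predicate the parenthesized form would read
/(args[0]->a_flags & 0x080000) != 0/.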

After your sysctl fix for maxvnodes, I increased the number of vnodes to
10 times the value in the initial report. This sped up the operation: the
find runs in all those jails finished today after ~5h (at ~8am) instead of
in the afternoon as before. Could this suggest that some null_reclaim()
is running in parallel, taking the exclusive locks and slowing down the
entire operation?

Bye,
Alexander.

-- 
http://www.Leidinger.net Alexander@Leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org    netchild@FreeBSD.org  : PGP 0x8F31830F9F2772BF


