Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 31 Oct 2023 16:42:10 +0100
From:      Mateusz Guzik <mjguzik@gmail.com>
To:        John F Carr <jfc@mit.edu>
Cc:        Freebsd fs <freebsd-fs@freebsd.org>
Subject:   Re: vnode_init took over 6 minutes
Message-ID:  <CAGudoHG301oJy76s_ONwxv9oAo1Y_vNi0BALPh03Kpw=JaByfA@mail.gmail.com>
In-Reply-To: <8F17FEC3-C3CB-410D-ACEC-ADD12B3A747A@mit.edu>
References:  <67FBC292-E83D-4171-BE79-D28D5578F031@mit.edu> <CAGudoHFv0y9-4NMYCnhQh%2BvbQj0dReofGpCx_zejqz1RKDGNcA@mail.gmail.com> <8F17FEC3-C3CB-410D-ACEC-ADD12B3A747A@mit.edu>

next in thread | previous in thread | raw e-mail | index | archive | help
On 10/31/23, John F Carr <jfc@mit.edu> wrote:
>
>
>> On Oct 31, 2023, at 10:16, Mateusz Guzik <mjguzik@gmail.com> wrote:
>>
>> On 10/31/23, John F Carr <jfc@mit.edu> wrote:
>>> During a poudriere build control-T showed an awk process was stuck
>>> for over 6 minutes trying to acquire a lock in vnode_init.
>>>
>>> load: 8.58  cmd: awk 51238 [running] 392.88r 0.00u 392.88s 100% 2604k
>>> __mtx_lock_sleep+0xf8 __mtx_lock_flags+0xa4 vnode_init+0xc3
>>> keg_alloc_slab+0x277 zone_import+0x143 cache_alloc+0x3ed
>>> cache_alloc_retry+0x23 getnewvnode_reserve+0x20 zfs_zget+0x1f
>>> zfs_dirent_lookup+0x16d zfs_dirlook+0x7f zfs_lookup+0x3e0
>>> zfs_freebsd_cachedlookup+0x74 vfs_cache_lookup+0xa7
>>> cache_fplookup_noentry+0x241 cache_fplookup+0x575 namei+0x1ea
>>> vn_open_cred+0x48d
>>>
>>> The stack trace was the same for several minutes.  System CPU time was
>>> increasing.
>>>
>>> Address vnode_init+0xc3 corresponds to the mtx_lock call here.
>>>
>>> vp->v_holdcnt = VHOLD_NO_SMR;
>>> vp->v_type = VNON;
>>> mtx_lock(&vnode_list_mtx);
>>> TAILQ_INSERT_BEFORE(vnode_list_free_marker, vp, v_vnodelist);
>>> mtx_unlock(&vnode_list_mtx);
>>> return (0);
>>>
>>> Address __mtx_lock_sleep+0xf8 is the instruction after a call to
>>> lock_delay.
>>>
>>> ps says the command line was
>>>
>>> /usr/bin/awk -f /usr/bin/awk
>>> old-default/2023-10-31_08h21m03s/.poudriere.builders
>>> old-default/2023-10-31_08h21m03s/.poudriere.buildname ...
>>>
>>> with the full list of input files exceeding the ~2KB command line length
>>> limit of ps.
>>> "/usr/bin/awk" is probably not the real second argument.  It would cause
>>> an
>>> immediate syntax error.
>>>
>>> The hang resolved within a few more minutes and poudriere is continuing
>>> happily.
>>>
>>> I have never seen such behavior before.  Code in vfs_subr.c tries not to
>>> hold the
>>> vnode_list_mtx lock for too long.
>>>
>>> Any thoughts?
>>>
>>>
>>> FreeBSD 13.2-STABLE up through commit b180f0040f95, 24 core 48 thread
>>> Zen
>>> 2,
>>> zfs pool on SSD connected via 12 Gbps SAS with cache on NVME, 160 GB
>>> RAM.
>>>
>>
>> what does "sysctl vfs.vnode" say
>>
>>
>> --
>> Mateusz Guzik <mjguzik gmail.com>
>
> With the builds now complete,
>
> vfs.vnode.vnlru.uma_reclaim_calls: 0
> vfs.vnode.vnlru.kicks: 0
> vfs.vnode.vnlru.max_free_per_call: 10000
> vfs.vnode.vnlru.failed_runs: 0
> vfs.vnode.vnlru.direct_recycles_free: 152871194
> vfs.vnode.vnlru.recycles_free: 77190881
> vfs.vnode.vnlru.recycles: 0
> vfs.vnode.stats.alloc_sleeps: 0
> vfs.vnode.stats.free: 1269702
> vfs.vnode.stats.skipped_requeues: 9375158
> vfs.vnode.stats.created: 235594976
> vfs.vnode.stats.count: 1407640
> vfs.vnode.param.wantfree: 681059
> vfs.vnode.param.limit: 2724237
>
> Uptime is 11 days, mostly with low I/O rates.
>
>

please add the uma stats: "sysctl vm.uma.VNODE"

-- 
Mateusz Guzik <mjguzik gmail.com>



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAGudoHG301oJy76s_ONwxv9oAo1Y_vNi0BALPh03Kpw=JaByfA>