Date:      Thu, 24 Aug 2023 00:10:11 +0200
From:      Mateusz Guzik <mjguzik@gmail.com>
To:        Mark Millard <marklmi@yahoo.com>
Cc:        Current FreeBSD <freebsd-current@freebsd.org>
Subject:   Re: poudriere bulk with ZFS and USE_TMPFS=no on main [14-ALPHA2 based]: extensive vlruwk for cpdup's on new builders after pkg builds in first builder
Message-ID:  <CAGudoHGMvFvSFwT5jotpiDg2+79O2jv1Fx4eAEH0yijs5pAJow@mail.gmail.com>
In-Reply-To: <5D23E6BE-A25C-4190-BB2C-A2D3511ABD90@yahoo.com>
References:  <4FFAE432-21FE-4462-9162-9CC30A5D470A.ref@yahoo.com> <4FFAE432-21FE-4462-9162-9CC30A5D470A@yahoo.com> <b8f819f0-9f6d-af87-64f9-35e37fbf4b2c@FreeBSD.org> <5D23E6BE-A25C-4190-BB2C-A2D3511ABD90@yahoo.com>

On 8/23/23, Mark Millard <marklmi@yahoo.com> wrote:
> [Forked off the ZFS deadlock 14 discussion, per feedback.]
>
> On Aug 23, 2023, at 11:40, Alexander Motin <mav@FreeBSD.org> wrote:
>
>> On 22.08.2023 14:24, Mark Millard wrote:
>>> Alexander Motin <mav@FreeBSD.org> wrote on
>>> Date: Tue, 22 Aug 2023 16:18:12 UTC :
>>>> I am waiting for final test results from George Wilson and then will
>>>> request quick merge of both to zfs-2.2-release branch. Unfortunately
>>>> there are still not many reviewers for the PR, since the code is not
>>>> trivial, but at least with the test reports Brian Behlendorf and Mark
>>>> Maybee seem to be OK to merge the two PRs into 2.2. If somebody else
>>>> has tested and/or reviewed the PR, you may comment on it.
>>> I had written to the list that when I tried to test the system
>>> doing poudriere builds (initially with your patches) using
>>> USE_TMPFS=no so that zfs had to deal with all the file I/O, I
>>> instead got only one builder that ended up active, the others
>>> never reaching "Builder started":
>>
>>> Top was showing lots of "vlruwk" for the cpdup's. For example:
>>> . . .
>>>  362     0 root         40    0  27076Ki   13776Ki CPU19   19   4:23   0.00% cpdup -i0 -o ref 32
>>>  349     0 root         53    0  27076Ki   13776Ki vlruwk  22   4:20   0.01% cpdup -i0 -o ref 31
>>>  328     0 root         68    0  27076Ki   13804Ki vlruwk   8   4:30   0.01% cpdup -i0 -o ref 30
>>>  304     0 root         37    0  27076Ki   13792Ki vlruwk   6   4:18   0.01% cpdup -i0 -o ref 29
>>>  282     0 root         42    0  33220Ki   13956Ki vlruwk   8   4:33   0.01% cpdup -i0 -o ref 28
>>>  242     0 root         56    0  27076Ki   13796Ki vlruwk   4   4:28   0.00% cpdup -i0 -o ref 27
>>> . . .
>>> But those processes did show CPU?? on occasion, as well as
>>> *vnode less often. None of the cpdup's was stuck in
>>> Removing your patches did not change the behavior.
>>
>> Mark, to me "vlruwk" looks like a limit on the number of vnodes.  I have
>> not been deep in that area recently, so somebody with more experience
>> there could try to diagnose it.  At the very least it does not look related
>> to the ZIL issue discussed in this thread, at least with the information
>> provided, so I am not surprised that the mentioned patches do not affect
>> it.
>
> I did the above intending to test the deadlock in my context but
> ended up not getting that far when I tried to make zfs handle all
> the file I/O (USE_TMPFS=no and no other use of tmpfs or the like).
>
> The zfs context is a simple single partition on the boot media. I
> use ZFS for bectl BE use, not for other typical reasons. The media
> here is PCIe Optane 1.4T media. The machine is a ThreadRipper
> 1950X, so first generation. 128 GiBytes of RAM. 491520 MiBytes of
> swap, also on that Optane.
>
> # uname -apKU
> FreeBSD amd64-ZFS 14.0-ALPHA2 FreeBSD 14.0-ALPHA2 amd64 1400096 #112
> main-n264912-b1d3e2b77155-dirty: Sun Aug 20 10:01:48 PDT 2023
> root@amd64-ZFS:/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/sys/GENERIC-NODBG
> amd64 amd64 1400096 1400096
>
> The GENERIC-DBG variant of the kernel did not report any issues in
> earlier testing.
>
> The later-referenced /usr/obj/DESTDIRs/main-amd64-poud-bulk_a was
> installed from the same build.
>
> # zfs list
> NAME                                          USED  AVAIL  REFER  MOUNTPOINT
> zoptb                                        79.9G   765G    96K  /zoptb
> zoptb/BUILDs                                 20.5G   765G  8.29M  /usr/obj/BUILDs
> zoptb/BUILDs/alt-main-amd64-dbg-clang-alt    1.86M   765G  1.86M  /usr/obj/BUILDs/alt-main-amd64-dbg-clang-alt
> zoptb/BUILDs/alt-main-amd64-nodbg-clang-alt  30.2M   765G  30.2M  /usr/obj/BUILDs/alt-main-amd64-nodbg-clang-alt
> zoptb/BUILDs/main-amd64-dbg-clang            9.96G   765G  9.96G  /usr/obj/BUILDs/main-amd64-dbg-clang
> zoptb/BUILDs/main-amd64-dbg-gccxtc           38.5M   765G  38.5M  /usr/obj/BUILDs/main-amd64-dbg-gccxtc
> zoptb/BUILDs/main-amd64-nodbg-clang          10.3G   765G  10.3G  /usr/obj/BUILDs/main-amd64-nodbg-clang
> zoptb/BUILDs/main-amd64-nodbg-clang-alt      37.2M   765G  37.2M  /usr/obj/BUILDs/main-amd64-nodbg-clang-alt
> zoptb/BUILDs/main-amd64-nodbg-gccxtc         94.6M   765G  94.6M  /usr/obj/BUILDs/main-amd64-nodbg-gccxtc
> zoptb/DESTDIRs                               4.33G   765G   104K  /usr/obj/DESTDIRs
> zoptb/DESTDIRs/main-amd64-poud               2.16G   765G  2.16G  /usr/obj/DESTDIRs/main-amd64-poud
> zoptb/DESTDIRs/main-amd64-poud-bulk_a        2.16G   765G  2.16G  /usr/obj/DESTDIRs/main-amd64-poud-bulk_a
> zoptb/ROOT                                   13.1G   765G    96K  none
> zoptb/ROOT/build_area_for-main-amd64         5.03G   765G  3.24G  none
> zoptb/ROOT/main-amd64                        8.04G   765G  3.23G  none
> zoptb/poudriere                              6.58G   765G   112K  /usr/local/poudriere
> zoptb/poudriere/data                         6.58G   765G   128K  /usr/local/poudriere/data
> zoptb/poudriere/data/.m                       112K   765G   112K  /usr/local/poudriere/data/.m
> zoptb/poudriere/data/cache                   17.4M   765G  17.4M  /usr/local/poudriere/data/cache
> zoptb/poudriere/data/images                    96K   765G    96K  /usr/local/poudriere/data/images
> zoptb/poudriere/data/logs                    2.72G   765G  2.72G  /usr/local/poudriere/data/logs
> zoptb/poudriere/data/packages                3.84G   765G  3.84G  /usr/local/poudriere/data/packages
> zoptb/poudriere/data/wrkdirs                  112K   765G   112K  /usr/local/poudriere/data/wrkdirs
> zoptb/poudriere/jails                          96K   765G    96K  /usr/local/poudriere/jails
> zoptb/poudriere/ports                          96K   765G    96K  /usr/local/poudriere/ports
> zoptb/tmp                                    68.5M   765G  68.5M  /tmp
> zoptb/usr                                    35.1G   765G    96K  /usr
> zoptb/usr/13_0R-src                          2.64G   765G  2.64G  /usr/13_0R-src
> zoptb/usr/alt-main-src                         96K   765G    96K  /usr/alt-main-src
> zoptb/usr/home                                181M   765G   181M  /usr/home
> zoptb/usr/local                              5.08G   765G  5.08G  /usr/local
> zoptb/usr/main-src                            833M   765G   833M  /usr/main-src
> zoptb/usr/ports                              26.4G   765G  26.4G  /usr/ports
> zoptb/usr/src                                  96K   765G    96K  /usr/src
> zoptb/var                                    52.6M   765G    96K  /var
> zoptb/var/audit                               356K   765G   356K  /var/audit
> zoptb/var/crash                               128K   765G   128K  /var/crash
> zoptb/var/db                                 49.7M   765G    96K  /var/db
> zoptb/var/db/pkg                             49.4M   765G  49.4M  /var/db/pkg
> zoptb/var/db/ports                            164K   765G   164K  /var/db/ports
> zoptb/var/log                                1.61M   765G  1.61M  /var/log
> zoptb/var/mail                                632K   765G   632K  /var/mail
> zoptb/var/tmp                                 128K   765G   128K  /var/tmp
>
> # poudriere jail -jmain-amd64-bulk_a -i
> Jail name:         main-amd64-bulk_a
> Jail version:      14.0-ALPHA2
> Jail arch:         amd64
> Jail method:       null
> Jail mount:        /usr/obj/DESTDIRs/main-amd64-poud-bulk_a
> Jail fs:
> Jail updated:      2021-12-04 14:55:22
> Jail pkgbase:      disabled
>
>
>
> So, I set up another test, with some related information shown
> before, during, and after. The sysctl output below is from a
> different ssh session than the one running the bulk -a.
>
> # sysctl -a | grep vnode
> kern.maxvnodes: 2213808
> kern.ipc.umtx_vnode_persistent: 0
> kern.minvnodes: 553452
> vm.vnode_pbufs: 2048
> vm.stats.vm.v_vnodepgsout: 0
> vm.stats.vm.v_vnodepgsin: 272429
> vm.stats.vm.v_vnodeout: 0
> vm.stats.vm.v_vnodein: 12461
> vfs.vnode_alloc_sleeps: 0
> vfs.wantfreevnodes: 553452
> vfs.freevnodes: 962766
> vfs.vnodes_created: 2538980
> vfs.numvnodes: 1056233
> vfs.cache.debug.vnodes_cel_3_failures: 0
> vfs.cache.stats.heldvnodes: 91878
> debug.vnode_domainset: <NULL>
> debug.sizeof.vnode: 448
> debug.fail_point.status_fill_kinfo_vnode__random_path: off
> debug.fail_point.fill_kinfo_vnode__random_path: off
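>
> To watch these counters move during the run, the same names can also
> be polled in a loop from that session, e.g.:
>
> # while :; do sysctl vfs.numvnodes vfs.freevnodes vfs.vnode_alloc_sleeps; sleep 10; done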
>
> # poudriere bulk -jmain-amd64-bulk_a -a
> . . .
> [00:01:34] Building 34042 packages using up to 32 builders
> [00:01:34] Hit CTRL+t at any time to see build progress and stats
> [00:01:34] [01] [00:00:00] Builder starting
> [00:01:57] [01] [00:00:23] Builder started
> [00:01:57] [01] [00:00:00] Building ports-mgmt/pkg | pkg-1.20.4
> [00:03:09] [01] [00:01:12] Finished ports-mgmt/pkg | pkg-1.20.4: Success
> [00:03:22] [01] [00:00:00] Building print/indexinfo | indexinfo-0.3.1
> [00:03:22] [02] [00:00:00] Builder starting
> [00:03:22] [03] [00:00:00] Builder starting
> . . .
> [00:03:22] [31] [00:00:00] Builder starting
> [00:03:22] [32] [00:00:00] Builder starting
> [00:03:31] [01] [00:00:09] Finished print/indexinfo | indexinfo-0.3.1: Success
> [00:03:31] [01] [00:00:00] Building devel/gettext-runtime | gettext-runtime-0.22
> . . .
>
> Note that only [01] makes progress: no new "Builder started"
> notices occur. top shows 31 of the pattern:
> cpdup -i0 -o ref ??
>
> Then, while the 31 cpdup's were showing vlruwk most of the time:
>
> # sysctl -a | grep vnode
> kern.maxvnodes: 2213808
> kern.ipc.umtx_vnode_persistent: 0
> kern.minvnodes: 553452
> vm.vnode_pbufs: 2048
> vm.stats.vm.v_vnodepgsout: 22844
> vm.stats.vm.v_vnodepgsin: 582398
> vm.stats.vm.v_vnodeout: 890
> vm.stats.vm.v_vnodein: 34296
> vfs.vnode_alloc_sleeps: 2994
> vfs.wantfreevnodes: 553452
> vfs.freevnodes: 2209662
> vfs.vnodes_created: 12206299
> vfs.numvnodes: 2214071
> vfs.cache.debug.vnodes_cel_3_failures: 0
> vfs.cache.stats.heldvnodes: 459
> debug.vnode_domainset: <NULL>
> debug.sizeof.vnode: 448
> debug.fail_point.status_fill_kinfo_vnode__random_path: off
> debug.fail_point.fill_kinfo_vnode__random_path: off
>
> After waiting a while, still mostly the same cpdup vlruwk status:
>
> # sysctl -a | grep vnode
> kern.maxvnodes: 2213808
> kern.ipc.umtx_vnode_persistent: 0
> kern.minvnodes: 553452
> vm.vnode_pbufs: 2048
> vm.stats.vm.v_vnodepgsout: 22844
> vm.stats.vm.v_vnodepgsin: 583527
> vm.stats.vm.v_vnodeout: 890
> vm.stats.vm.v_vnodein: 34396
> vfs.vnode_alloc_sleeps: 8053
> vfs.wantfreevnodes: 553452
> vfs.freevnodes: 2210166
> vfs.vnodes_created: 12212061
> vfs.numvnodes: 2215106
> vfs.cache.debug.vnodes_cel_3_failures: 0
> vfs.cache.stats.heldvnodes: 497
> debug.vnode_domainset: <NULL>
> debug.sizeof.vnode: 448
> debug.fail_point.status_fill_kinfo_vnode__random_path: off
> debug.fail_point.fill_kinfo_vnode__random_path: off
>
> ^C[00:14:55] Error: Signal SIGINT caught, cleaning up and exiting
>
> # sysctl -a | grep vnode
> kern.maxvnodes: 2213808
> kern.ipc.umtx_vnode_persistent: 0
> kern.minvnodes: 553452
> vm.vnode_pbufs: 2048
> vm.stats.vm.v_vnodepgsout: 22844
> vm.stats.vm.v_vnodepgsin: 584474
> vm.stats.vm.v_vnodeout: 890
> vm.stats.vm.v_vnodein: 34591
> vfs.vnode_alloc_sleeps: 17584
> vfs.wantfreevnodes: 553452
> vfs.freevnodes: 2210796
> vfs.vnodes_created: 12222343
> vfs.numvnodes: 2216564
> vfs.cache.debug.vnodes_cel_3_failures: 0
> vfs.cache.stats.heldvnodes: 539
> debug.vnode_domainset: <NULL>
> debug.sizeof.vnode: 448
> debug.fail_point.status_fill_kinfo_vnode__random_path: off
> debug.fail_point.fill_kinfo_vnode__random_path: off
>
> [main-amd64-bulk_a-default] [2023-08-23_13h58m08s] [sigint:] Queued: 34435  Built: 2     Failed: 0     Skipped: 35    Ignored: 358   Fetched: 0     Tobuild: 34040  Time: 00:14:36
> [00:16:13] Logs: /usr/local/poudriere/data/logs/bulk/main-amd64-bulk_a-default/2023-08-23_13h58m08s
> [00:16:49] Cleaning up
> load: 5.28  cmd: sh 77057 [vlruwk] 141.63r 0.00u 30.98s 28% 6932k
> #0 0xffffffff80b76ebb at mi_switch+0xbb
> #1 0xffffffff80bc960f at sleepq_timedwait+0x2f
> #2 0xffffffff80b76610 at _sleep+0x1d0
> #3 0xffffffff80c5b2dc at vn_alloc_hard+0x2ac
> #4 0xffffffff80c50a12 at getnewvnode_reserve+0x92
> #5 0xffffffff829afb12 at zfs_zget+0x22
> #6 0xffffffff8299ca8d at zfs_dirent_lookup+0x16d
> #7 0xffffffff8299cb5f at zfs_dirlook+0x7f
> #8 0xffffffff829ac410 at zfs_lookup+0x350
> #9 0xffffffff829a782a at zfs_freebsd_cachedlookup+0x6a
> #10 0xffffffff80c368ad at vfs_cache_lookup+0xad
> #11 0xffffffff80c3b6d8 at cache_fplookup_final_modifying+0x188
> #12 0xffffffff80c38766 at cache_fplookup+0x356
> #13 0xffffffff80c43fb2 at namei+0x112
> #14 0xffffffff80c62e5b at kern_funlinkat+0x13b
> #15 0xffffffff80c62d18 at sys_unlink+0x28
> #16 0xffffffff83b8e583 at filemon_wrapper_unlink+0x13
> #17 0xffffffff81049a79 at amd64_syscall+0x109
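> (That "load:" line and kernel stack are a ^T/SIGINFO report for the
> cleaning-up sh; the stack lines are printed when kern.tty_info_kstacks
> is enabled, and "procstat -kk 77057" from another session shows the
> same kernel stack on demand.)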
>
> [00:26:28] Unmounting file systems
> Exiting with status 1
>
> # sysctl -a | grep vnode
> kern.maxvnodes: 2213808
> kern.ipc.umtx_vnode_persistent: 0
> kern.minvnodes: 553452
> vm.vnode_pbufs: 2048
> vm.stats.vm.v_vnodepgsout: 22844
> vm.stats.vm.v_vnodepgsin: 585384
> vm.stats.vm.v_vnodeout: 890
> vm.stats.vm.v_vnodein: 34798
> vfs.vnode_alloc_sleeps: 27578
> vfs.wantfreevnodes: 553452
> vfs.freevnodes: 61362
> vfs.vnodes_created: 20135479
> vfs.numvnodes: 59860
> vfs.cache.debug.vnodes_cel_3_failures: 0
> vfs.cache.stats.heldvnodes: 208
> debug.vnode_domainset: <NULL>
> debug.sizeof.vnode: 448
> debug.fail_point.status_fill_kinfo_vnode__random_path: off
> debug.fail_point.fill_kinfo_vnode__random_path: off
>
>
> For reference (from after):
>
> # kldstat
> Id Refs Address                Size Name
>  1   95 0xffffffff80200000  274b308 kernel
>  2    1 0xffffffff8294c000   5d5238 zfs.ko
>  3    1 0xffffffff82f22000     7718 cryptodev.ko
>  4    1 0xffffffff83b10000     3390 acpi_wmi.ko
>  5    1 0xffffffff83b14000     3220 intpm.ko
>  6    1 0xffffffff83b18000     2178 smbus.ko
>  7    1 0xffffffff83b1b000     2240 cpuctl.ko
>  8    1 0xffffffff83b1e000     3360 uhid.ko
>  9    1 0xffffffff83b22000     4364 ums.ko
> 10    1 0xffffffff83b27000     33c0 usbhid.ko
> 11    1 0xffffffff83b2b000     3380 hidbus.ko
> 12    1 0xffffffff83b2f000     4d20 ng_ubt.ko
> 13    6 0xffffffff83b34000     abb8 netgraph.ko
> 14    2 0xffffffff83b3f000     a250 ng_hci.ko
> 15    4 0xffffffff83b4a000     2670 ng_bluetooth.ko
> 16    1 0xffffffff83b4d000     83a0 uftdi.ko
> 17    1 0xffffffff83b56000     4e58 ucom.ko
> 18    1 0xffffffff83b5b000     3360 wmt.ko
> 19    1 0xffffffff83b5f000     e268 ng_l2cap.ko
> 20    1 0xffffffff83b6e000    1bf68 ng_btsocket.ko
> 21    1 0xffffffff83b8a000     38f8 ng_socket.ko
> 22    1 0xffffffff83b8e000     3250 filemon.ko
> 23    1 0xffffffff83b92000     4758 nullfs.ko
> 24    1 0xffffffff83b97000     73c0 linprocfs.ko
> 25    3 0xffffffff83b9f000     be70 linux_common.ko
> 26    1 0xffffffff83bab000     3558 fdescfs.ko
> 27    1 0xffffffff83baf000    31b20 linux.ko
> 28    1 0xffffffff83be1000    2ed40 linux64.ko
>
>
> Note that before the "Cleaning up" notice, vfs.freevnodes showed as
> being around (for example) 2210796, but after "Exiting with status"
> it was 61362. vfs.vnodes_created staged similarly: in the ballpark
> of 12222343 before, then 20135479 after. Likewise, vfs.numvnodes
> went 2216564 -> 59860. During the stall, vfs.numvnodes (2214071 and
> up) sat right at kern.maxvnodes (2213808) while vfs.vnode_alloc_sleeps
> kept climbing, which is consistent with the vn_alloc_hard frame in
> the stack above.
>
>
>
> Anything else I should gather and report as basic information?
>

This is a known problem, but it is unclear if you should be running
into it in this setup.
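
If the goal is just to let the other builders make progress while
testing, bumping the vnode limit should at least move the wall; a
stopgap, not a fix (the value below is merely double your current
kern.maxvnodes of 2213808):

# sysctl kern.maxvnodes=4427616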

Can you try again, but this time *revert*
138a5dafba312ff39ce0eefdbe34de95519e600d, like so:
git revert 138a5dafba312ff39ce0eefdbe34de95519e600d

You may want to switch to a different branch first, for example:
git checkout -b vfstesting
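
A minimal end-to-end sequence, assuming the /usr/main-src tree and the
GENERIC-NODBG KERNCONF seen in your uname output (adjust paths and
flags to your own build setup):

# cd /usr/main-src
# git checkout -b vfstesting
# git revert 138a5dafba312ff39ce0eefdbe34de95519e600d
# make -j32 buildkernel KERNCONF=GENERIC-NODBG
# make installkernel KERNCONF=GENERIC-NODBG
# shutdown -r now

then rerun the same bulk and check whether the cpdup's still sit in
vlruwk.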

-- 
Mateusz Guzik <mjguzik gmail.com>


