From nobody Thu Aug 24 01:14:00 2023
List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current
Subject: Re: poudriere bulk with ZFS and USE_TMPFS=no on main [14-ALPHA2 based]: extensive vlruwk for cpdup's on new builders after pkg builds in first builder
From: Mark Millard
Date: Wed, 23 Aug 2023 18:14:00 -0700
To: Mateusz Guzik
Cc: Current FreeBSD
References: <4FFAE432-21FE-4462-9162-9CC30A5D470A.ref@yahoo.com> <4FFAE432-21FE-4462-9162-9CC30A5D470A@yahoo.com> <5D23E6BE-A25C-4190-BB2C-A2D3511ABD90@yahoo.com>

On Aug 23, 2023, at 15:10, Mateusz Guzik wrote:

> On 8/23/23, Mark Millard wrote:
>> [Forked off the ZFS deadlock 14 discussion, per feedback.]
>>
>> On Aug 23, 2023, at 11:40, Alexander Motin wrote:
>>
>>> On 22.08.2023 14:24, Mark Millard wrote:
>>>> Alexander Motin wrote on Tue, 22 Aug 2023 16:18:12 UTC:
>>>>> I am waiting for final test results from George Wilson and then will
>>>>> request quick merge of both to zfs-2.2-release branch. Unfortunately
>>>>> there are still not many reviewers for the PR, since the code is not
>>>>> trivial, but at least with the test reports Brian Behlendorf and Mark
>>>>> Maybee seem to be OK to merge the two PRs into 2.2. If somebody else
>>>>> has tested and/or reviewed the PR, you may comment on it.
>>>> I had written to the list that when I tried to test the system
>>>> doing poudriere builds (initially with your patches) using
>>>> USE_TMPFS=no so that zfs had to deal with all the file I/O, I
>>>> instead got only one builder that ended up active, the others
>>>> never reaching "Builder started":
>>>>
>>>> Top was showing lots of "vlruwk" for the cpdup's. For example:
>>>> . . .
>>>> 362 0 root 40 0 27076Ki 13776Ki CPU19 19 4:23 0.00% cpdup -i0 -o ref 32
>>>> 349 0 root 53 0 27076Ki 13776Ki vlruwk 22 4:20 0.01% cpdup -i0 -o ref 31
>>>> 328 0 root 68 0 27076Ki 13804Ki vlruwk 8 4:30 0.01% cpdup -i0 -o ref 30
>>>> 304 0 root 37 0 27076Ki 13792Ki vlruwk 6 4:18 0.01% cpdup -i0 -o ref 29
>>>> 282 0 root 42 0 33220Ki 13956Ki vlruwk 8 4:33 0.01% cpdup -i0 -o ref 28
>>>> 242 0 root 56 0 27076Ki 13796Ki vlruwk 4 4:28 0.00% cpdup -i0 -o ref 27
>>>> . . .
>>>> But those processes did show CPU?? on occasion, as well as
>>>> *vnode less often. None of the cpdup's was stuck in
>>>> Removing your patches did not change the behavior.
>>>
>>> Mark, to me "vlruwk" looks like a limit on number of vnodes. I was
>>> not deep in that area at least recently, so somebody with more
>>> experience there could try to diagnose it. At the very least it does
>>> not look related to the ZIL issue discussed in this thread, at least
>>> with the information provided, so I am not surprised that the
>>> mentioned patches do not affect it.
>>
>> I did the above intending to test the deadlock in my context but
>> ended up not getting that far when I tried to make zfs handle all
>> the file I/O (USE_TMPFS=no and no other use of tmpfs or the like).
>>
>> The zfs context is a simple single partition on the boot media. I
>> use ZFS for bectl BE use, not for other typical reasons. The media
>> here is PCIe Optane 1.4T media. The machine is a ThreadRipper
>> 1950X, so first generation. 128 GiBytes of RAM. 491520 MiBytes of
>> swap, also on that Optane.
>>
>> # uname -apKU
>> FreeBSD amd64-ZFS 14.0-ALPHA2 FreeBSD 14.0-ALPHA2 amd64 1400096 #112 main-n264912-b1d3e2b77155-dirty: Sun Aug 20 10:01:48 PDT 2023 root@amd64-ZFS:/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/sys/GENERIC-NODBG amd64 amd64 1400096 1400096
>>
>> The GENERIC-DBG variant of the kernel did not report any issues in
>> earlier testing.
>>
>> The later referenced /usr/obj/DESTDIRs/main-amd64-poud-bulk_a was
>> installed from the same build.
>>
>> # zfs list
>> NAME USED AVAIL REFER MOUNTPOINT
>> zoptb 79.9G 765G 96K /zoptb
>> zoptb/BUILDs 20.5G 765G 8.29M /usr/obj/BUILDs
>> zoptb/BUILDs/alt-main-amd64-dbg-clang-alt 1.86M 765G 1.86M /usr/obj/BUILDs/alt-main-amd64-dbg-clang-alt
>> zoptb/BUILDs/alt-main-amd64-nodbg-clang-alt 30.2M 765G 30.2M /usr/obj/BUILDs/alt-main-amd64-nodbg-clang-alt
>> zoptb/BUILDs/main-amd64-dbg-clang 9.96G 765G 9.96G /usr/obj/BUILDs/main-amd64-dbg-clang
>> zoptb/BUILDs/main-amd64-dbg-gccxtc 38.5M 765G 38.5M /usr/obj/BUILDs/main-amd64-dbg-gccxtc
>> zoptb/BUILDs/main-amd64-nodbg-clang 10.3G 765G 10.3G /usr/obj/BUILDs/main-amd64-nodbg-clang
>> zoptb/BUILDs/main-amd64-nodbg-clang-alt 37.2M 765G 37.2M /usr/obj/BUILDs/main-amd64-nodbg-clang-alt
>> zoptb/BUILDs/main-amd64-nodbg-gccxtc 94.6M 765G 94.6M /usr/obj/BUILDs/main-amd64-nodbg-gccxtc
>> zoptb/DESTDIRs 4.33G 765G 104K /usr/obj/DESTDIRs
>> zoptb/DESTDIRs/main-amd64-poud 2.16G 765G 2.16G /usr/obj/DESTDIRs/main-amd64-poud
>> zoptb/DESTDIRs/main-amd64-poud-bulk_a 2.16G 765G 2.16G /usr/obj/DESTDIRs/main-amd64-poud-bulk_a
>> zoptb/ROOT 13.1G 765G 96K none
>> zoptb/ROOT/build_area_for-main-amd64 5.03G 765G 3.24G none
>> zoptb/ROOT/main-amd64 8.04G 765G 3.23G none
>> zoptb/poudriere 6.58G 765G 112K /usr/local/poudriere
>> zoptb/poudriere/data 6.58G 765G 128K /usr/local/poudriere/data
>> zoptb/poudriere/data/.m 112K 765G 112K /usr/local/poudriere/data/.m
>> zoptb/poudriere/data/cache 17.4M 765G 17.4M /usr/local/poudriere/data/cache
>> zoptb/poudriere/data/images 96K 765G 96K /usr/local/poudriere/data/images
>> zoptb/poudriere/data/logs 2.72G 765G 2.72G /usr/local/poudriere/data/logs
>> zoptb/poudriere/data/packages 3.84G 765G 3.84G /usr/local/poudriere/data/packages
>> zoptb/poudriere/data/wrkdirs 112K 765G 112K /usr/local/poudriere/data/wrkdirs
>> zoptb/poudriere/jails 96K 765G 96K /usr/local/poudriere/jails
>> zoptb/poudriere/ports 96K 765G 96K /usr/local/poudriere/ports
>> zoptb/tmp 68.5M 765G 68.5M /tmp
>> zoptb/usr 35.1G 765G 96K /usr
>> zoptb/usr/13_0R-src 2.64G 765G 2.64G /usr/13_0R-src
>> zoptb/usr/alt-main-src 96K 765G 96K /usr/alt-main-src
>> zoptb/usr/home 181M 765G 181M /usr/home
>> zoptb/usr/local 5.08G 765G 5.08G /usr/local
>> zoptb/usr/main-src 833M 765G 833M /usr/main-src
>> zoptb/usr/ports 26.4G 765G 26.4G /usr/ports
>> zoptb/usr/src 96K 765G 96K /usr/src
>> zoptb/var 52.6M 765G 96K /var
>> zoptb/var/audit 356K 765G 356K /var/audit
>> zoptb/var/crash 128K 765G 128K /var/crash
>> zoptb/var/db 49.7M 765G 96K /var/db
>> zoptb/var/db/pkg 49.4M 765G 49.4M /var/db/pkg
>> zoptb/var/db/ports 164K 765G 164K /var/db/ports
>> zoptb/var/log 1.61M 765G 1.61M /var/log
>> zoptb/var/mail 632K 765G 632K /var/mail
>> zoptb/var/tmp 128K 765G 128K /var/tmp
>>
>> # poudriere jail -jmain-amd64-bulk_a -i
>> Jail name: main-amd64-bulk_a
>> Jail version: 14.0-ALPHA2
>> Jail arch: amd64
>> Jail method: null
>> Jail mount: /usr/obj/DESTDIRs/main-amd64-poud-bulk_a
>> Jail fs:
>> Jail updated: 2021-12-04 14:55:22
>> Jail pkgbase: disabled
>>
>> So, I set up another test, with some related information shown
>> before, during, and after. The sysctl output is from a different
>> ssh session than the one running the bulk -a.
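As a side note, a small sh loop along the following lines can capture this
kind of before/during/after information repeatedly from that second session.
It is only a sketch: the 30 second interval and the output file name are
illustrative and are not exactly what was used for the snapshots quoted below.

# Illustrative sampling loop, run from the second ssh session: record the
# vnode-related counters every 30 seconds while the bulk build runs.
while :
do
        date
        sysctl kern.maxvnodes vfs.numvnodes vfs.freevnodes \
            vfs.wantfreevnodes vfs.vnodes_created vfs.vnode_alloc_sleeps
        sleep 30
done >> /tmp/vnode-samples.txt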
>>
>> # sysctl -a | grep vnode
>> kern.maxvnodes: 2213808
>> kern.ipc.umtx_vnode_persistent: 0
>> kern.minvnodes: 553452
>> vm.vnode_pbufs: 2048
>> vm.stats.vm.v_vnodepgsout: 0
>> vm.stats.vm.v_vnodepgsin: 272429
>> vm.stats.vm.v_vnodeout: 0
>> vm.stats.vm.v_vnodein: 12461
>> vfs.vnode_alloc_sleeps: 0
>> vfs.wantfreevnodes: 553452
>> vfs.freevnodes: 962766
>> vfs.vnodes_created: 2538980
>> vfs.numvnodes: 1056233
>> vfs.cache.debug.vnodes_cel_3_failures: 0
>> vfs.cache.stats.heldvnodes: 91878
>> debug.vnode_domainset:
>> debug.sizeof.vnode: 448
>> debug.fail_point.status_fill_kinfo_vnode__random_path: off
>> debug.fail_point.fill_kinfo_vnode__random_path: off
>>
>> # poudriere bulk -jmain-amd64-bulk_a -a
>> . . .
>> [00:01:34] Building 34042 packages using up to 32 builders
>> [00:01:34] Hit CTRL+t at any time to see build progress and stats
>> [00:01:34] [01] [00:00:00] Builder starting
>> [00:01:57] [01] [00:00:23] Builder started
>> [00:01:57] [01] [00:00:00] Building ports-mgmt/pkg | pkg-1.20.4
>> [00:03:09] [01] [00:01:12] Finished ports-mgmt/pkg | pkg-1.20.4: Success
>> [00:03:22] [01] [00:00:00] Building print/indexinfo | indexinfo-0.3.1
>> [00:03:22] [02] [00:00:00] Builder starting
>> [00:03:22] [03] [00:00:00] Builder starting
>> . . .
>> [00:03:22] [31] [00:00:00] Builder starting
>> [00:03:22] [32] [00:00:00] Builder starting
>> [00:03:31] [01] [00:00:09] Finished print/indexinfo | indexinfo-0.3.1: Success
>> [00:03:31] [01] [00:00:00] Building devel/gettext-runtime | gettext-runtime-0.22
>> . . .
>>
>> Note that only [01] makes progress: no new "Builder started"
>> notices occur. top shows 31 instances of the pattern:
>> cpdup -i0 -o ref ??
>>
>> Then, during the 31 cpdup's showing vlruwk most of the time:
>>
>> # sysctl -a | grep vnode
>> kern.maxvnodes: 2213808
>> kern.ipc.umtx_vnode_persistent: 0
>> kern.minvnodes: 553452
>> vm.vnode_pbufs: 2048
>> vm.stats.vm.v_vnodepgsout: 22844
>> vm.stats.vm.v_vnodepgsin: 582398
>> vm.stats.vm.v_vnodeout: 890
>> vm.stats.vm.v_vnodein: 34296
>> vfs.vnode_alloc_sleeps: 2994
>> vfs.wantfreevnodes: 553452
>> vfs.freevnodes: 2209662
>> vfs.vnodes_created: 12206299
>> vfs.numvnodes: 2214071
>> vfs.cache.debug.vnodes_cel_3_failures: 0
>> vfs.cache.stats.heldvnodes: 459
>> debug.vnode_domainset:
>> debug.sizeof.vnode: 448
>> debug.fail_point.status_fill_kinfo_vnode__random_path: off
>> debug.fail_point.fill_kinfo_vnode__random_path: off
>>
>> After waiting a while, still mostly the cpdup vlruwk status:
>>
>> # sysctl -a | grep vnode
>> kern.maxvnodes: 2213808
>> kern.ipc.umtx_vnode_persistent: 0
>> kern.minvnodes: 553452
>> vm.vnode_pbufs: 2048
>> vm.stats.vm.v_vnodepgsout: 22844
>> vm.stats.vm.v_vnodepgsin: 583527
>> vm.stats.vm.v_vnodeout: 890
>> vm.stats.vm.v_vnodein: 34396
>> vfs.vnode_alloc_sleeps: 8053
>> vfs.wantfreevnodes: 553452
>> vfs.freevnodes: 2210166
>> vfs.vnodes_created: 12212061
>> vfs.numvnodes: 2215106
>> vfs.cache.debug.vnodes_cel_3_failures: 0
>> vfs.cache.stats.heldvnodes: 497
>> debug.vnode_domainset:
>> debug.sizeof.vnode: 448
>> debug.fail_point.status_fill_kinfo_vnode__random_path: off
>> debug.fail_point.fill_kinfo_vnode__random_path: off
>>
>> ^C[00:14:55] Error: Signal SIGINT caught, cleaning up and exiting
>>
>> # sysctl -a | grep vnode
>> kern.maxvnodes: 2213808
>> kern.ipc.umtx_vnode_persistent: 0
>> kern.minvnodes: 553452
>> vm.vnode_pbufs: 2048
>> vm.stats.vm.v_vnodepgsout: 22844
>> vm.stats.vm.v_vnodepgsin: 584474
>> vm.stats.vm.v_vnodeout: 890
>> vm.stats.vm.v_vnodein: 34591
>> vfs.vnode_alloc_sleeps: 17584
>> vfs.wantfreevnodes: 553452
>> vfs.freevnodes: 2210796
>> vfs.vnodes_created: 12222343
>> vfs.numvnodes: 2216564
>> vfs.cache.debug.vnodes_cel_3_failures: 0
>> vfs.cache.stats.heldvnodes: 539
>> debug.vnode_domainset:
>> debug.sizeof.vnode: 448
>> debug.fail_point.status_fill_kinfo_vnode__random_path: off
>> debug.fail_point.fill_kinfo_vnode__random_path: off
>>
>> [main-amd64-bulk_a-default] [2023-08-23_13h58m08s] [sigint:] Queued: 34435
>> Built: 2 Failed: 0 Skipped: 35 Ignored: 358 Fetched: 0
>> Tobuild: 34040 Time: 00:14:36
>> [00:16:13] Logs:
>> /usr/local/poudriere/data/logs/bulk/main-amd64-bulk_a-default/2023-08-23_13h58m08s
>> [00:16:49] Cleaning up
>> load: 5.28 cmd: sh 77057 [vlruwk] 141.63r 0.00u 30.98s 28% 6932k
>> #0 0xffffffff80b76ebb at mi_switch+0xbb
>> #1 0xffffffff80bc960f at sleepq_timedwait+0x2f
>> #2 0xffffffff80b76610 at _sleep+0x1d0
>> #3 0xffffffff80c5b2dc at vn_alloc_hard+0x2ac
>> #4 0xffffffff80c50a12 at getnewvnode_reserve+0x92
>> #5 0xffffffff829afb12 at zfs_zget+0x22
>> #6 0xffffffff8299ca8d at zfs_dirent_lookup+0x16d
>> #7 0xffffffff8299cb5f at zfs_dirlook+0x7f
>> #8 0xffffffff829ac410 at zfs_lookup+0x350
>> #9 0xffffffff829a782a at zfs_freebsd_cachedlookup+0x6a
>> #10 0xffffffff80c368ad at vfs_cache_lookup+0xad
>> #11 0xffffffff80c3b6d8 at cache_fplookup_final_modifying+0x188
>> #12 0xffffffff80c38766 at cache_fplookup+0x356
>> #13 0xffffffff80c43fb2 at namei+0x112
>> #14 0xffffffff80c62e5b at kern_funlinkat+0x13b
>> #15 0xffffffff80c62d18 at sys_unlink+0x28
>> #16 0xffffffff83b8e583 at filemon_wrapper_unlink+0x13
>> #17 0xffffffff81049a79 at amd64_syscall+0x109
>>
>> [00:26:28] Unmounting file systems
>> Exiting with status 1
>>
>> # sysctl -a | grep vnode
>> kern.maxvnodes: 2213808
>> kern.ipc.umtx_vnode_persistent: 0
>> kern.minvnodes: 553452
>> vm.vnode_pbufs: 2048
>> vm.stats.vm.v_vnodepgsout: 22844
>> vm.stats.vm.v_vnodepgsin: 585384
>> vm.stats.vm.v_vnodeout: 890
>> vm.stats.vm.v_vnodein: 34798
>> vfs.vnode_alloc_sleeps: 27578
>> vfs.wantfreevnodes: 553452
>> vfs.freevnodes: 61362
>> vfs.vnodes_created: 20135479
>> vfs.numvnodes: 59860
>> vfs.cache.debug.vnodes_cel_3_failures: 0
>> vfs.cache.stats.heldvnodes: 208
>> debug.vnode_domainset:
>> debug.sizeof.vnode: 448
>> debug.fail_point.status_fill_kinfo_vnode__random_path: off
>> debug.fail_point.fill_kinfo_vnode__random_path: off
>>
>> For reference (from after):
>>
>> # kldstat
>> Id Refs Address Size Name
>> 1 95 0xffffffff80200000 274b308 kernel
>> 2 1 0xffffffff8294c000 5d5238 zfs.ko
>> 3 1 0xffffffff82f22000 7718 cryptodev.ko
>> 4 1 0xffffffff83b10000 3390 acpi_wmi.ko
>> 5 1 0xffffffff83b14000 3220 intpm.ko
>> 6 1 0xffffffff83b18000 2178 smbus.ko
>> 7 1 0xffffffff83b1b000 2240 cpuctl.ko
>> 8 1 0xffffffff83b1e000 3360 uhid.ko
>> 9 1 0xffffffff83b22000 4364 ums.ko
>> 10 1 0xffffffff83b27000 33c0 usbhid.ko
>> 11 1 0xffffffff83b2b000 3380 hidbus.ko
>> 12 1 0xffffffff83b2f000 4d20 ng_ubt.ko
>> 13 6 0xffffffff83b34000 abb8 netgraph.ko
>> 14 2 0xffffffff83b3f000 a250 ng_hci.ko
>> 15 4 0xffffffff83b4a000 2670 ng_bluetooth.ko
>> 16 1 0xffffffff83b4d000 83a0 uftdi.ko
>> 17 1 0xffffffff83b56000 4e58 ucom.ko
>> 18 1 0xffffffff83b5b000 3360 wmt.ko
>> 19 1 0xffffffff83b5f000 e268 ng_l2cap.ko
>> 20 1 0xffffffff83b6e000 1bf68 ng_btsocket.ko
>> 21 1 0xffffffff83b8a000 38f8 ng_socket.ko
>> 22 1 0xffffffff83b8e000 3250 filemon.ko
>> 23 1 0xffffffff83b92000 4758 nullfs.ko
>> 24 1 0xffffffff83b97000 73c0 linprocfs.ko
>> 25 3 0xffffffff83b9f000 be70 linux_common.ko
>> 26 1 0xffffffff83bab000 3558 fdescfs.ko
>> 27 1 0xffffffff83baf000 31b20 linux.ko
>> 28 1 0xffffffff83be1000 2ed40 linux64.ko
>>
>> Note that before the "Cleaning up" notice, vfs.freevnodes shows as
>> being around (for example) 2210796, but after "Exiting with status"
>> it is 61362. vfs.vnodes_created has a similar staging: in the ball
>> park of up to 12222343, then changing to 20135479. Similarly,
>> vfs.numvnodes goes 2216564 -> 59860.
>>
>> Anything else I should gather and report as basic information?
>>
>
> This is a known problem, but it is unclear if you should be running
> into it in this setup.

The change fixed the issue: so I do run into the issue for this setup.
See below.

> Can you try again but this time *revert*
> 138a5dafba312ff39ce0eefdbe34de95519e600d, like so:
> git revert 138a5dafba312ff39ce0eefdbe34de95519e600d
>
> may want to switch to a different branch first, for example: git
> checkout -b vfstesting

# git -C /usr/main-src/ diff sys/kern/vfs_subr.c
diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c
index 0f3f00abfd4a..5dff556ac258 100644
--- a/sys/kern/vfs_subr.c
+++ b/sys/kern/vfs_subr.c
@@ -3528,25 +3528,17 @@ vdbatch_process(struct vdbatch *vd)
         MPASS(curthread->td_pinned > 0);
         MPASS(vd->index == VDBATCH_SIZE);

+        mtx_lock(&vnode_list_mtx);
         critical_enter();
-        if (mtx_trylock(&vnode_list_mtx)) {
-                for (i = 0; i < VDBATCH_SIZE; i++) {
-                        vp = vd->tab[i];
-                        vd->tab[i] = NULL;
-                        TAILQ_REMOVE(&vnode_list, vp, v_vnodelist);
-                        TAILQ_INSERT_TAIL(&vnode_list, vp, v_vnodelist);
-                        MPASS(vp->v_dbatchcpu != NOCPU);
-                        vp->v_dbatchcpu = NOCPU;
-                }
-                mtx_unlock(&vnode_list_mtx);
-        } else {
-                for (i = 0; i < VDBATCH_SIZE; i++) {
-                        vp = vd->tab[i];
-                        vd->tab[i] = NULL;
-                        MPASS(vp->v_dbatchcpu != NOCPU);
-                        vp->v_dbatchcpu = NOCPU;
-                }
+        for (i = 0; i < VDBATCH_SIZE; i++) {
+                vp = vd->tab[i];
+                TAILQ_REMOVE(&vnode_list, vp, v_vnodelist);
+                TAILQ_INSERT_TAIL(&vnode_list, vp, v_vnodelist);
+                MPASS(vp->v_dbatchcpu != NOCPU);
+                vp->v_dbatchcpu = NOCPU;
         }
+        mtx_unlock(&vnode_list_mtx);
+        bzero(vd->tab, sizeof(vd->tab));
         vd->index = 0;
         critical_exit();
 }

Still with:

# grep USE_TMPFS= /usr/local/etc/poudriere.conf
# EXAMPLE: USE_TMPFS="wrkdir data"
#USE_TMPFS=all
#USE_TMPFS="data"
USE_TMPFS=no

That allowed the other builders to eventually reach "Builder started"
and later activity, with

[00:05:50] [27] [00:02:29] Builder started

being the first non-[01] builder to do so. No vlruwk's were observed
in what I saw in top:

# poudriere bulk -jmain-amd64-bulk_a -a
. . .
[00:01:34] Building 34042 packages using up to 32 builders
[00:01:34] Hit CTRL+t at any time to see build progress and stats
[00:01:34] [01] [00:00:00] Builder starting
[00:01:57] [01] [00:00:23] Builder started
[00:01:57] [01] [00:00:00] Building ports-mgmt/pkg | pkg-1.20.4
[00:03:08] [01] [00:01:11] Finished ports-mgmt/pkg | pkg-1.20.4: Success
[00:03:21] [01] [00:00:00] Building print/indexinfo | indexinfo-0.3.1
[00:03:21] [02] [00:00:00] Builder starting
[00:03:21] [03] [00:00:00] Builder starting
[00:03:21] [04] [00:00:00] Builder starting
[00:03:21] [05] [00:00:00] Builder starting
[00:03:21] [06] [00:00:00] Builder starting
[00:03:21] [07] [00:00:00] Builder starting
[00:03:21] [08] [00:00:00] Builder starting
[00:03:21] [09] [00:00:00] Builder starting
[00:03:21] [10] [00:00:00] Builder starting
[00:03:21] [11] [00:00:00] Builder starting
[00:03:21] [12] [00:00:00] Builder starting
[00:03:21] [13] [00:00:00] Builder starting
[00:03:21] [14] [00:00:00] Builder starting
[00:03:21] [15] [00:00:00] Builder starting
[00:03:21] [16] [00:00:00] Builder starting
[00:03:21] [17] [00:00:00] Builder starting
[00:03:21] [18] [00:00:00] Builder starting
[00:03:21] [19] [00:00:00] Builder starting
[00:03:21] [20] [00:00:00] Builder starting
[00:03:21] [21] [00:00:00] Builder starting
[00:03:21] [22] [00:00:00] Builder starting
[00:03:21] [23] [00:00:00] Builder starting
[00:03:21] [24] [00:00:00] Builder starting
[00:03:21] [25] [00:00:00] Builder starting
[00:03:21] [26] [00:00:00] Builder starting
[00:03:21] [27] [00:00:00] Builder starting
[00:03:21] [28] [00:00:00] Builder starting
[00:03:21] [29] [00:00:00] Builder starting
[00:03:21] [30] [00:00:00] Builder starting
[00:03:21] [31] [00:00:00] Builder starting
[00:03:21] [32] [00:00:00] Builder starting
[00:03:29] [01] [00:00:08] Finished print/indexinfo | indexinfo-0.3.1: Success
[00:03:30] [01] [00:00:00] Building devel/gettext-runtime | gettext-runtime-0.22
[00:04:49] [01] [00:01:19] Finished devel/gettext-runtime | gettext-runtime-0.22: Success
[00:04:52] [01] [00:00:00] Building devel/gmake | gmake-4.3_2
[00:05:20] [01] [00:00:28] Finished devel/gmake | gmake-4.3_2: Success
[00:05:23] [01] [00:00:00] Building devel/libtextstyle | libtextstyle-0.22
[00:05:50] [27] [00:02:29] Builder started
[00:05:50] [27] [00:00:00] Building devel/npth | npth-1.6
[00:05:50] [08] [00:02:29] Builder started
[00:05:50] [08] [00:00:00] Building devel/boost-jam | boost-jam-1.82.0
[00:05:51] [16] [00:02:30] Builder started
[00:05:51] [16] [00:00:00] Building devel/libdaemon | libdaemon-0.14_1
[00:05:51] [19] [00:02:30] Builder started
[00:05:51] [19] [00:00:00] Building graphics/libpotrace | libpotrace-1.16
[00:05:51] [14] [00:02:30] Builder started
[00:05:51] [14] [00:00:00] Building sysutils/gnome_subr | gnome_subr-1.0
[00:05:51] [17] [00:02:30] Builder started
[00:05:51] [17] [00:00:00] Building misc/hicolor-icon-theme | hicolor-icon-theme-0.17
[00:05:51] [26] [00:02:30] Builder started
[00:05:51] [26] [00:00:00] Building devel/libinotify | libinotify-20211018
[00:05:51] [12] [00:02:30] Builder started
[00:05:51] [12] [00:00:00] Building devel/evdev-proto | evdev-proto-5.8
[00:05:51] [29] [00:02:30] Builder started
[00:05:51] [29] [00:00:00] Building devel/opencl | opencl-3.0.14
[00:05:51] [05] [00:02:30] Builder started
[00:05:51] [05] [00:00:00] Building devel/autoconf-switch | autoconf-switch-20220527
[00:05:51] [11] [00:02:30] Builder started
[00:05:51] [11] [00:00:00] Building textproc/html2text | html2text-1.3.2a_1,1
[00:05:51] [25] [00:02:30] Builder started
[00:05:51] [04] [00:02:30] Builder started
[00:05:51] [25] [00:00:00] Building audio/speexdsp | speexdsp-1.2.1
[00:05:51] [04] [00:00:00] Building converters/libiconv | libiconv-1.17
[00:05:51] [22] [00:02:30] Builder started
[00:05:51] [22] [00:00:00] Building dns/public_suffix_list | public_suffix_list-20230630
[00:05:51] [20] [00:02:30] Builder started
[00:05:51] [02] [00:02:30] Builder started
[00:05:51] [24] [00:02:30] Builder started
[00:05:51] [31] [00:02:30] Builder started
[00:05:51] [20] [00:00:00] Building print/gsfonts | gsfonts-8.11_8
. . .

I had waited a very long time without any such progress for [02]..[32]
with the prior code. Now testing for the zfs deadlock issue should be
possible with this setup.

===
Mark Millard
marklmi at yahoo.com