From: bugzilla-noreply@freebsd.org
To: fs@FreeBSD.org
Subject: [Bug 275594] High CPU usage by arc_prune; analysis and fix
Date: Thu, 14 Dec 2023 06:58:31 +0000
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=275594

--- Comment #12 from Seigo Tanimura ---

(In reply to Seigo Tanimura from comment #10)

I have added the fix to enable the extra vnode recycling and tested it with
the same setup.

Source on GitHub:
- Repo: https://github.com/altimeter-130ft/freebsd-freebsd-src
- Branches:
  - Fix: release/14.0.0/release-14_0_0-p2-topic-openzfs-arc_prune-interval-fix
  - Counters atop Fix: release/14.0.0/release-14_0_0-p2-topic-openzfs-arc_prune-interval-counters

Test setup: the same as "Ongoing test" in bug #275594, comment #6, applied as
in the sysctl sketch further below.
- vfs.vnode.vnlru.max_free_per_call: 4000000 (== vfs.vnode.vnlru.max_free_per_call)
- vfs.zfs.arc.prune_interval: 1000 (my fix for the arc_prune interval enabled)
- vfs.vnode.vnlru.extra_recycle: 1 (extra vnode recycle fix enabled)

Build time: 06:50:05 (312 pkgs / hr)

Counters after completing the build, with some remarks:

# The iteration attempts in vnlru_free_impl().
# This includes the retries from the head of vnode_list.
vfs.vnode.free.free_attempt: 33934506866

# The number of vnodes recycled successfully, including by vtryrecycle().
vfs.vnode.free.free_success: 42945537

# The number of successful recycles in phase 2 upon the VREG (regular file)
vnodes.
# - cleanbuf_vmpage_only: the vnodes held by clean bufs and resident VM pages
only.
# - cleanbuf_only: the vnodes held by clean bufs only.
vfs.vnode.free.free_phase2_retry_reg_cleanbuf_vmpage_only: 845659
vfs.vnode.free.free_phase2_retry_reg_cleanbuf_only: 3

# The number of iteration skips due to a held vnode. ("phase 2" hereafter)
# NB: the successful recycles in phase 2 are not included.
vfs.vnode.free.free_phase2_retry: 8923850577

# The number of phase 2 skips upon the VREG vnodes.
vfs.vnode.free.free_phase2_retry_reg: 8085735334

# The number of phase 2 skips upon the VREG vnodes in use.
# Almost all phase 2 skips upon VREG fell into this.
vfs.vnode.free.free_phase2_retry_reg_inuse: 8085733060

# The number of successful recycles in phase 2 upon the VDIR (directory)
vnodes.
# - free_phase2_retry_dir_nc_src_only: the vnodes held by namecache entries
only.
vfs.vnode.free.free_phase2_retry_dir_nc_src_only: 2234194

# The number of phase 2 skips upon the VDIR vnodes.
vfs.vnode.free.free_phase2_retry_dir: 834902819

# The number of phase 2 skips upon the VDIR vnodes in use.
# Almost all phase 2 skips upon VDIR fell into this.
vfs.vnode.free.free_phase2_retry_dir_inuse: 834902780

Other findings:
- The behaviour of the arc_prune thread CPU usage was mostly the same.
  - The peak dropped by only a few percent, so this is not likely to be the
    essential fix.
- The namecache hit ratio degraded by about 10 - 20%.
  - Maybe the recycled vnodes are looked up again, especially the directories.

-----

The issue essentially still exists even with the extra vnode recycling.
Maybe the root cause is in ZFS rather than the OS.
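For reference, this is roughly how the test setup above is applied and the
counters sampled; it assumes the patched kernel from the branches above,
since vfs.zfs.arc.prune_interval, vfs.vnode.vnlru.extra_recycle and the
vfs.vnode.free.* counters do not exist in stock 14.0-RELEASE:

# Apply the test tunables at runtime (patched kernel assumed).
sysctl vfs.vnode.vnlru.max_free_per_call=4000000
sysctl vfs.zfs.arc.prune_interval=1000
sysctl vfs.vnode.vnlru.extra_recycle=1

# Sample the recycle counters and the ARC dnode stats during/after the build.
sysctl vfs.vnode.free
sysctl kstat.zfs.misc.arcstats.dnode_size kstat.zfs.misc.arcstats.arc_dnode_limit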
There are some suspicious findings on the in-memory dnode behaviour during
the tests so far:

- vfs.zfs.arc_max does not enforce the max size of
  kstat.zfs.misc.arcstats.dnode_size.
  - vfs.zfs.arc_max: 4GB
  - vfs.zfs.arc.dnode_limit_percent: 10 (default)
  - sizeof(struct dnode_t): 808 bytes
    - Found by "vmstat -z | grep dnode_t".
  - kstat.zfs.misc.arcstats.arc_dnode_limit: 400MB (default:
    vfs.zfs.arc.dnode_limit_percent percent of vfs.zfs.arc_max)
    - ~495K dnodes.
  - kstat.zfs.misc.arcstats.dnode_size, max: ~1.8GB
    - ~2.2M dnodes.
    - Almost equal to the max observed number of vnodes.
- The dnode_t zone of uma(9) does not have a limit.

From the above, the number of in-memory dnodes looks like the bottleneck.
Maybe the essential solution is to configure vfs.zfs.arc.dnode_limit
explicitly so that ZFS can hold all of the dnodes required by the
application in memory.
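As a rough cross-check of the figures above, and a sketch of that possible
workaround (untested in this run; the exact value is only an estimate):

# Default dnode limit: 4 GB * 10% = 400 MB; 400 MB / 808 B =~ 495K dnodes.
# Observed peak:       ~1.8 GB / 808 B =~ 2.2M dnodes (~ the peak vnode count).
# Sizing the limit for the whole working set, with some headroom:
#   2.5M dnodes * 808 B =~ 2 GB
sysctl vfs.zfs.arc.dnode_limit=2147483648
# or persistently, in /etc/sysctl.conf:
#   vfs.zfs.arc.dnode_limit=2147483648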