From nobody Thu Apr 21 16:38:35 2022
Date: Thu, 21 Apr 2022 09:38:35 -0700 (PDT)
From: Doug Ambrisko
To: Alexander Leidinger
Cc: Mateusz Guzik, freebsd-current@freebsd.org
Subject: Re: nullfs and ZFS issues
References: <20220420113944.Horde.5qBL80-ikDLIWDIFVJ4VgzX@webmail.leidinger.net>
 <20220421083310.Horde.r7YT8777_AvGU_6GO1cC90G@webmail.leidinger.net>
 <20220421154402.Horde.I6m2Om_fxqMtDMUqpiZAxtP@webmail.leidinger.net>
In-Reply-To: <20220421154402.Horde.I6m2Om_fxqMtDMUqpiZAxtP@webmail.leidinger.net>
List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current
On Thu, Apr 21, 2022 at 03:44:02PM +0200, Alexander Leidinger wrote:
| Quoting Mateusz Guzik (from Thu, 21 Apr 2022 14:50:42 +0200):
|
| > On 4/21/22, Alexander Leidinger wrote:
| >> I tried nocache on a system with a lot of jails which use nullfs,
| >> which showed very slow behavior in the daily periodic runs (12h runs
| >> in the night after boot, 24h or more in subsequent nights). Now the
| >> first nightly run after boot was finished after 4h.
| >>
| >> What is the benefit of not disabling the cache in nullfs? I would
| >> expect zfs (or ufs) to cache the (meta)data anyway.
| >>
| >
| > does the poor performance show up with
| > https://people.freebsd.org/~mjg/vnlru_free_pick.diff ?
|
| I would like to have all the 22 jails run the periodic scripts a
| second night in a row before trying this.
|
| > if the long runs are still there, can you get some profiling from it?
| > sysctl -a before and after would be a start.
| >
| > My guess is that you are at the vnode limit and bumping into the
| > 1 second sleep.
|
| That would explain the behavior I see since I added the last jail,
| which seems to have crossed a threshold that triggers the slow
| behavior.
|
| Current status (with the 112 nullfs mounts with nocache):
| kern.maxvnodes:              10485760
| kern.numvnodes:               3791064
| kern.freevnodes:              3613694
| kern.cache.stats.heldvnodes:   151707
| kern.vnodes_created:        260288639
|
| The maxvnodes value is already increased by 10 times compared to the
| default value on this system.

I've attached mount.patch; with it applied, mount -v shows the vnode
usage per filesystem.

Note that the problem I was running into was that, after some operations,
arc_prune and arc_evict would consume 100% of 2 cores and make ZFS really
slow.  If you are not running into that issue, then nocache etc. shouldn't
be needed.

On my laptop I set the ARC to 1G since I don't use swap, and in the past
the ARC would consume too much memory and things would die.  When nullfs
holds a bunch of vnodes, ZFS can't release them.  FYI, on my laptop with
nocache and limited vnodes I haven't run into this problem.  I haven't
tried the patch that lets ZFS free its own and nullfs's vnodes on my
laptop; I have only tried it in a bhyve test.  I use bhyve and an md
drive to avoid wearing out my SSD, and it's faster to test that way.
I have found that git, tar, make world, etc. could trigger the issue
before, but I haven't had any issues with nocache and capped vnodes.

Thanks,

Doug A.
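A minimal sketch of the knobs discussed above, assuming a FreeBSD 13.x
host with ZFS-backed jails; the paths, the md size and the 1G ARC cap are
only examples, and on newer OpenZFS the ARC tunable is also available as
vfs.zfs.arc.max:

  # inspect the vnode counters quoted above
  sysctl kern.maxvnodes kern.numvnodes kern.freevnodes \
         kern.cache.stats.heldvnodes

  # raise (or lower) the global vnode limit
  sysctl kern.maxvnodes=10485760

  # nullfs mount for a jail without caching vnodes in the upper layer
  mount -t nullfs -o nocache /usr/local/jails/base /jails/j1/base

  # cap the ARC at 1G; to make it persistent, set vfs.zfs.arc_max
  # in /boot/loader.conf instead
  sysctl vfs.zfs.arc_max=1073741824

  # throwaway md-backed disk for a bhyve test box, to spare the SSD
  mdconfig -a -t swap -s 20g   # prints e.g. md0; use /dev/md0 as the guest disk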
Attachment: mount.patch

diff --git a/sbin/mount/mount.c b/sbin/mount/mount.c
index 79d9d6cb0ca..00eefb3a5e0 100644
--- a/sbin/mount/mount.c
+++ b/sbin/mount/mount.c
@@ -692,6 +692,13 @@ prmount(struct statfs *sfp)
 			xo_emit("{D:, }{Lw:fsid}{:fsid}", fsidbuf);
 			free(fsidbuf);
 		}
+		if (sfp->f_nvnodelistsize != 0 || sfp->f_lazyvnodelistsize != 0) {
+			xo_open_container("vnodes");
+			xo_emit("{D:, }{Lwc:vnodes}{Lw:count}{w:count/%ju}{Lw:lazy}{:lazy/%ju}",
+			    (uintmax_t)sfp->f_nvnodelistsize,
+			    (uintmax_t)sfp->f_lazyvnodelistsize);
+			xo_close_container("vnodes");
+		}
 	}
 	xo_emit("{D:)}\n");
 }
diff --git a/sys/kern/vfs_mount.c b/sys/kern/vfs_mount.c
index a495ad86ac4..3648ef8d080 100644
--- a/sys/kern/vfs_mount.c
+++ b/sys/kern/vfs_mount.c
@@ -2625,6 +2626,8 @@ __vfs_statfs(struct mount *mp, struct statfs *sbp)
 	sbp->f_version = STATFS_VERSION;
 	sbp->f_namemax = NAME_MAX;
 	sbp->f_flags = mp->mnt_flag & MNT_VISFLAGMASK;
+	sbp->f_nvnodelistsize = mp->mnt_nvnodelistsize;
+	sbp->f_lazyvnodelistsize = mp->mnt_lazyvnodelistsize;
 	return (mp->mnt_op->vfs_statfs(mp, sbp));
 }
 
diff --git a/sys/sys/mount.h b/sys/sys/mount.h
index 3383bfe8f43..95dd3c76ae5 100644
--- a/sys/sys/mount.h
+++ b/sys/sys/mount.h
@@ -91,7 +91,9 @@ struct statfs {
 	uint64_t f_asyncwrites;		/* count of async writes since mount */
 	uint64_t f_syncreads;		/* count of sync reads since mount */
 	uint64_t f_asyncreads;		/* count of async reads since mount */
-	uint64_t f_spare[10];		/* unused spare */
+	uint32_t f_nvnodelistsize;	/* (i) # of vnodes */
+	uint32_t f_lazyvnodelistsize;	/* (l) # of lazy vnodes */
+	uint64_t f_spare[9];		/* unused spare */
 	uint32_t f_namemax;		/* maximum filename length */
 	uid_t	  f_owner;		/* user that mounted the filesystem */
 	fsid_t	  f_fsid;		/* filesystem id */
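For anyone who wants to try the attached patch, a rough sketch of applying
it and reading the new counters, assuming a source tree in /usr/src; the
patch path and -j value are placeholders, and both world and kernel need a
rebuild because the patch touches the statfs layout, the kernel and
mount(8):

  cd /usr/src
  git apply /path/to/mount.patch     # or: patch -p1 < /path/to/mount.patch
  make -j8 buildworld buildkernel    # kernel and mount(8) are both touched
  make installkernel                 # reboot, then installworld as usual
  shutdown -r now

  # afterwards, the counts appear in the parenthesised info that mount -v
  # prints for each filesystem, e.g. on the nullfs mounts in question
  mount -v | grep -i vnodes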