From: Mateusz Guzik <mjguzik@gmail.com>
Date: Tue, 19 Apr 2022 11:15:42 +0200
Subject: Re: nullfs and ZFS issues
To: Doug Ambrisko
Cc: freebsd-current@freebsd.org
List-Archive: https://lists.freebsd.org/archives/freebsd-current
On 4/19/22, Mateusz Guzik wrote:
> On 4/19/22, Doug Ambrisko wrote:
>> I've switched my laptop to use nullfs and ZFS. Previously, I used
>> localhost NFS mounts instead of nullfs when nullfs would complain
>> that it couldn't mount. Since that check has been removed, I've
>> switched to nullfs only. However, every so often my laptop would
>> get slow and the ARC evict and prune threads would consume two
>> cores at 100% until I rebooted. I had a 1G max ARC and have
>> increased it to 2G now. Looking into this has uncovered some issues:
>> - nullfs would prevent vnlru_free_vfsops from doing anything
>>   when called from ZFS arc_prune_task
>> - nullfs would hang onto a bunch of vnodes unless mounted with
>>   nocache
>> - nullfs with nocache would break untar. This has been fixed now.
>>
>> With nullfs, nocache and setting max vnodes to a low number, I can
>> keep the ARC around the max without evict and prune consuming
>> 100% of 2 cores. This doesn't seem like the best solution, but it
>> is better than when the ARC starts spinning.
>>
>> Looking into this issue with bhyve and an md drive for testing, I
>> create a brand new zpool mounted as /test and then nullfs mount
>> /test to /mnt. I loop through untarring the Linux kernel into the
>> nullfs mount, rm -rf it and repeat. I set the ARC to the smallest
>> value I can. Untarring the Linux kernel was enough to get the ARC
>> evict and prune to spin since they couldn't evict/prune anything.
>>
>> Looking at vnlru_free_vfsops called from ZFS arc_prune_task, I see:
>>
>> static int
>> vnlru_free_impl(int count, struct vfsops *mnt_op, struct vnode *mvp)
>> {
>>         ...
>>         for (;;) {
>>                 ...
>>                 vp = TAILQ_NEXT(vp, v_vnodelist);
>>                 ...
>>
>>                 /*
>>                  * Don't recycle if our vnode is from different type
>>                  * of mount point.  Note that mp is type-safe, the
>>                  * check does not reach unmapped address even if
>>                  * vnode is reclaimed.
>>                  */
>>                 if (mnt_op != NULL && (mp = vp->v_mount) != NULL &&
>>                     mp->mnt_op != mnt_op) {
>>                         continue;
>>                 }
>>                 ...
>>
>> The vp ends up being from the nullfs mount and then hits the
>> continue, even though the passed-in mvp is on ZFS. If I hack it to
>> comment out the continue, then I see the ARC, nullfs vnodes and ZFS
>> vnodes grow, and when the ARC calls arc_prune_task (which calls
>> vnlru_free_vfsops) the vnode counts go down for both nullfs and ZFS.
>> The ARC cache usage also goes down. Then they increase again until
>> the ARC gets full, and then they go down again. So with this hack I
>> don't need nocache passed to nullfs and I don't need to limit the
>> max vnodes. Doing multiple untars in parallel over and over doesn't
>> seem to cause any issues for this test. I'm not saying commenting
>> out the continue is the fix; it is just a simple POC test.
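
For context, the prune request on the ZFS side passes zfs_vfsops
explicitly, which is what makes the filter above skip every nullfs
vnode. From memory the caller looks roughly like this (a sketch, not
verbatim source; locking and version details omitted):

static void
arc_prune_task(void *arg)
{
        int64_t nr_scan = (intptr_t)arg;

        /*
         * Paraphrased from memory: shrink the ARC target, then ask
         * the VFS layer to recycle vnodes -- but only vnodes whose
         * mount is ZFS, courtesy of the mnt_op filter shown above.
         */
        arc_reduce_target_size(ptob(nr_scan));
        vnlru_free_vfsops(nr_scan, &zfs_vfsops, arc_vnlru_marker);
}

Nothing in that call chain knows that a nullfs vnode may be the only
thing pinning a ZFS vnode, which is the crux of the problem.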
>>
>
> I don't see an easy way to say "this is a nullfs vnode holding onto a
> zfs vnode". Perhaps the routine can be extended with issuing a nullfs
> callback, if the module is loaded.
>
> In the meantime I think a good enough(tm) fix would be to check that
> nothing was freed and fall back to the good old regular cleanup
> without filtering by vfsops. This would be very similar to what you
> are doing with your hack.
>

Now that I wrote this, perhaps an acceptable hack would be to extend
struct mount with a pointer to the "lower layer" mount (if any) and
patch the vfsops check to also look there. Rough sketches of both
ideas are at the bottom of this mail.

>
>> It appears that when ZFS is asking for cached vnodes to be freed,
>> nullfs also needs to free some up as well so that they are freed
>> at the VFS level. It seems that vnlru_free_impl should allow some
>> of the related nullfs vnodes to be freed so the ZFS ones can be
>> freed and reduce the size of the ARC.
>>
>> BTW, I also hacked the kernel and mount to show the vnodes used
>> per mount, i.e. mount -v:
>>   test on /test (zfs, NFS exported, local, nfsv4acls, fsid
>>   2b23b2a1de21ed66, vnodes: count 13846 lazy 0)
>>   /test on /mnt (nullfs, NFS exported, local, nfsv4acls, fsid
>>   11ff002929000000, vnodes: count 13846 lazy 0)
>>
>> Now I can easily see how the vnodes are used without going into ddb.
>> On my laptop I have various vnet jails and nullfs mount my homedir
>> into them, so pretty much everything goes through nullfs to ZFS.
>> I'm limping along with the nullfs nocache and a small number of
>> vnodes, but it would be nice to not need that.
>>
>> Thanks,
>>
>> Doug A.
>>
>
>
> --
> Mateusz Guzik

--
Mateusz Guzik
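
PS: the sketches mentioned above. Both are untested, the names are
made up and error handling is elided. First the fallback variant: if
the filtered pass could not free anything, rerun it without the
vfsops filter, so that e.g. nullfs vnodes stacked over ZFS become
eligible:

/* Hypothetical wrapper, not in the tree. */
static int
vnlru_free_with_fallback(int count, struct vfsops *mnt_op,
    struct vnode *mvp)
{
        int freed;

        /* First pass: only vnodes belonging to the requested fs. */
        freed = vnlru_free_impl(count, mnt_op, mvp);
        if (freed == 0 && mnt_op != NULL) {
                /*
                 * Nothing matched; fall back to recycling from any
                 * mount, which also catches stacked filesystems such
                 * as nullfs holding the target's vnodes.
                 */
                freed = vnlru_free_impl(count, NULL, mvp);
        }
        return (freed);
}

Second, the lower-layer pointer: a hypothetical mnt_lower field in
struct mount (set by nullfs at mount time, NULL everywhere else)
would let the existing check in vnlru_free_impl accept a vnode whose
mount stacks on top of the requested filesystem:

                /* mnt_lower is hypothetical, see above. */
                if (mnt_op != NULL && (mp = vp->v_mount) != NULL &&
                    mp->mnt_op != mnt_op &&
                    (mp->mnt_lower == NULL ||
                    mp->mnt_lower->mnt_op != mnt_op)) {
                        continue;
                }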