From owner-freebsd-stable@FreeBSD.ORG Tue Jan 1 15:58:14 2013
Date: Tue, 1 Jan 2013 17:58:06 +0200
From: Konstantin Belousov
To: Dominic Fandrey
Cc: FreeBSD, Chris Rees
Subject: Re: Post 9.1 stable file system problems
Message-ID: <20130101155806.GU82219@kib.kiev.ua>
References: <50E225DF.3090004@bsdforen.de> <50E23283.8010407@bsdforen.de> <50E23647.6000309@bsdforen.de> <20130101065145.GT82219@kib.kiev.ua> <50E2E720.3040803@bsdforen.de>
In-Reply-To: <50E2E720.3040803@bsdforen.de>
User-Agent: Mutt/1.5.21 (2010-09-15)
List-Id: Production branch of FreeBSD source code
On Tue, Jan 01, 2013 at 02:39:44PM +0100, Dominic Fandrey wrote:
> On 01/01/2013 07:51, Konstantin Belousov wrote:
> > On Tue, Jan 01, 2013 at 02:05:11AM +0100, Dominic Fandrey wrote:
> >> On 01/01/2013 01:49, Dominic Fandrey wrote:
> >>> On 01/01/2013 01:29, Chris Rees wrote:
> >>>> On 1 Jan 2013 00:01, "Dominic Fandrey" wrote:
> >>>>>
> >>>>> I have a Tinderbox that I just updated to the current RELENG_9.
> >>>>> Following the update build times for packages have increased by a
> >>>>> factor between 5 and 20. I.e. I have packages that used to build in
> >>>>> 5 minutes and now take an hour.
> >>>>>
> >>>>> I'm suspecting the file system ever since I saw that the majority of CPU
> >>>>> load was caused by ls when I looked at top (more than 2 minutes of CPU
> >>>>> time were counted that moment). The majority of the time most of the CPU
> >>>>> load is caused by bsdtar, pkg_add, qmake-qt4, etc. Without exception
> >>>>> tools that access a lot of files.
> >>>>>
> >>>>> The file system on which packages are built is nullfs mounted from
> >>>>> an async mounted UFS. I turned async off, to no avail.
> >>>>>
> >>>>> /usr/src/UPDATING says that there were nullfs optimisations. So I
> >>>>> think this is where the problem originates. I might hack the tinderbox to
> >>>>> use 'ln -s' or set it up for NFS to verify this.
> >>>>
> >>>> Is your kernel newer than the Jail? The converse causes problems.
> >>>
> >>> I ran makeJail for all jails after updating.
Did you rebuild your modules together with the new kernel?

> >>>
> >>> I also seem to have similar problems when building in the host-system.
> >>> The unzip for openjdk-7 has just passed the 11 minutes CPU time mark.
> >>> On my notebook it takes less than 10 seconds.
> >>
> >> Just set WRKOBJDIRPREFIX to a tmpfs on the Tinderbox host system
> >> and the extract takes less than a second. Originally WRKOBJDIRPREFIX
> >> also pointed to a nullfs mount.
> >>
> >> Afterwards I pointed WRKOBJDIRPREFIX to a UFS file system (without
> >> nullfs involvement). The entire make extract took 20s.
> >>
> >> So still faster by at least factor 30 than running it on a nullfs mount
> >> (I eventually SIGINTed so I don't know how long it would've run).
> >
> > Start providing some useful debugging information ?
>
> That one might be interesting. It's all system time:
>
> # time -lh make extract
> ===> License GPLv2 accepted by the user
> ===> Found saved configuration for openjdk-7.9.05_1
> ===> Extracting for openjdk-7.9.05_2
> => SHA256 Checksum OK for openjdk-7u6-fcs-src-b24-09_aug_2012.zip.
> => SHA256 Checksum OK for apache-ant-1.8.4-bin.zip.
> ===> openjdk-7.9.05_2 depends on file: /usr/local/bin/unzip - found
> ^Ctime: command terminated abnormally
>         4m29.30s real           3.03s user              4m22.55s sys
>       5008  maximum resident set size
>        135  average shared memory size
>       2932  average unshared data size
>        127  average unshared stack size
>       7772  page reclaims
>          0  page faults
>          0  swaps
>         19  block input operations
>        101  block output operations
>          0  messages sent
>          0  messages received
>         41  signals received
>       1597  voluntary context switches
>      16590  involuntary context switches

Ok, from your mount -v output, are the three nullfs mounts the only nullfs
mounts ever used?

Is it only unzip which demonstrates the silly behaviour? Or does it happen
with any program? E.g., are ls(1) or sha1 on the nullfs mount also slow?

Could you try some low-tech profiling on the slow program? For instance,
you could run ktrace/kdump -R to see which syscalls are slow.

The most puzzling part of your report, for me, is that I also use
nullfs-backed jails both on HEAD and stable/9, at a larger scale, and I do
not have an issue.
I just did

pooma32% time unzip -q /usr/local/arch/freebsd/distfiles/openjdk-7u6-fcs-src-b24-09_aug_2012.zip
unzip -q   3.25s user 23.77s system 78% cpu 34.482 total

over nullfs mount of
/usr/home on /usr/sfw/local8/opt/pooma32/usr/home (nullfs, local).

Please try the following patch, which changes nullfs behaviour to be
non-cached by default. You could turn on the caching with the
'mount -t nullfs -o cache from to' mounting command. I am interested
if use/non-use of -o cache makes a difference for you.

diff --git a/sbin/mount_nullfs/mount_nullfs.c b/sbin/mount_nullfs/mount_nullfs.c
index c88db3d..aaf66e5 100644
--- a/sbin/mount_nullfs/mount_nullfs.c
+++ b/sbin/mount_nullfs/mount_nullfs.c
@@ -57,27 +57,35 @@ static const char rcsid[] =
 
 #include "mntopts.h"
 
-static struct mntopt mopts[] = {
-	MOPT_STDOPTS,
-	MOPT_END
-};
-
 int	subdir(const char *, const char *);
 static void	usage(void) __dead2;
 
 int
 main(int argc, char *argv[])
 {
-	struct iovec iov[6];
-	int ch, mntflags;
+	struct iovec *iov;
+	char *p, *val;
 	char source[MAXPATHLEN];
 	char target[MAXPATHLEN];
+	char errmsg[255];
+	int ch, mntflags, iovlen;
+	char nullfs[] = "nullfs";
 
+	iov = NULL;
+	iovlen = 0;
 	mntflags = 0;
+	errmsg[0] = '\0';
 	while ((ch = getopt(argc, argv, "o:")) != -1)
 		switch(ch) {
 		case 'o':
-			getmntopts(optarg, mopts, &mntflags, 0);
+			val = strdup("");
+			p = strchr(optarg, '=');
+			if (p != NULL) {
+				free(val);
+				*p = '\0';
+				val = p + 1;
+			}
+			build_iovec(&iov, &iovlen, optarg, val, (size_t)-1);
 			break;
 		case '?':
 		default:
@@ -99,21 +107,16 @@ main(int argc, char *argv[])
 		errx(EX_USAGE, "%s (%s) and %s are not distinct paths",
 		    argv[0], target, argv[1]);
 
-	iov[0].iov_base = strdup("fstype");
-	iov[0].iov_len = sizeof("fstype");
-	iov[1].iov_base = strdup("nullfs");
-	iov[1].iov_len = strlen(iov[1].iov_base) + 1;
-	iov[2].iov_base = strdup("fspath");
-	iov[2].iov_len = sizeof("fspath");
-	iov[3].iov_base = source;
-	iov[3].iov_len = strlen(source) + 1;
-	iov[4].iov_base = strdup("target");
-	iov[4].iov_len = sizeof("target");
-	iov[5].iov_base = target;
-	iov[5].iov_len = strlen(target) + 1;
-
-	if (nmount(iov, 6, mntflags))
-		err(1, NULL);
+	build_iovec(&iov, &iovlen, "fstype", nullfs, (size_t)-1);
+	build_iovec(&iov, &iovlen, "fspath", source, (size_t)-1);
+	build_iovec(&iov, &iovlen, "target", target, (size_t)-1);
+	build_iovec(&iov, &iovlen, "errmsg", errmsg, sizeof(errmsg));
+	if (nmount(iov, iovlen, mntflags) < 0) {
+		if (errmsg[0] != 0)
+			err(1, "%s: %s", source, errmsg);
+		else
+			err(1, "%s", source);
+	}
 	exit(0);
 }
 
diff --git a/sys/fs/nullfs/null.h b/sys/fs/nullfs/null.h
index 0878e55..4f37020 100644
--- a/sys/fs/nullfs/null.h
+++ b/sys/fs/nullfs/null.h
@@ -34,9 +34,15 @@
  * $FreeBSD$
  */
 
+#ifndef	FS_NULL_H
+#define	FS_NULL_H
+
+#define	NULLM_CACHE	0x0001
+
 struct null_mount {
 	struct mount	*nullm_vfs;
 	struct vnode	*nullm_rootvp;	/* Reference to root null_node */
+	uint64_t	nullm_flags;
 };
 
 #ifdef _KERNEL
@@ -80,3 +86,5 @@ MALLOC_DECLARE(M_NULLFSNODE);
 #endif /* NULLFS_DEBUG */
 
 #endif /* _KERNEL */
+
+#endif
diff --git a/sys/fs/nullfs/null_subr.c b/sys/fs/nullfs/null_subr.c
index b2c7a75..f82d738 100644
--- a/sys/fs/nullfs/null_subr.c
+++ b/sys/fs/nullfs/null_subr.c
@@ -224,6 +224,9 @@ null_nodeget(mp, lowervp, vpp)
 		 * provide ready to use vnode.
 		 */
 		if (VOP_ISLOCKED(lowervp) != LK_EXCLUSIVE) {
+			KASSERT((MOUNTTONULLMOUNT(mp)->nullm_flags & NULLM_CACHE) == 0,
+			    ("lowervp %p is not excl locked and cache is disabled",
+			    lowervp));
 			vn_lock(lowervp, LK_UPGRADE | LK_RETRY);
 			if ((lowervp->v_iflag & VI_DOOMED) != 0) {
 				vput(lowervp);
diff --git a/sys/fs/nullfs/null_vfsops.c b/sys/fs/nullfs/null_vfsops.c
index 7d84d51..8a5f1b9 100644
--- a/sys/fs/nullfs/null_vfsops.c
+++ b/sys/fs/nullfs/null_vfsops.c
@@ -67,6 +67,13 @@ static vfs_vget_t		nullfs_vget;
 static vfs_extattrctl_t	nullfs_extattrctl;
 static vfs_reclaim_lowervp_t nullfs_reclaim_lowervp;
 
+/* Mount options that we support. */
+static const char *nullfs_opts[] = {
+	"target",
+	"cache",
+	NULL
+};
+
 /*
  * Mount null layer
  */
@@ -86,9 +93,11 @@ nullfs_mount(struct mount *mp)
 
 	if (!prison_allow(td->td_ucred, PR_ALLOW_MOUNT_NULLFS))
 		return (EPERM);
-
 	if (mp->mnt_flag & MNT_ROOTFS)
 		return (EOPNOTSUPP);
+	if (vfs_filteropt(mp->mnt_optnew, nullfs_opts))
+		return (EINVAL);
+
 	/*
 	 * Update is a no-op
 	 */
@@ -149,7 +158,7 @@ nullfs_mount(struct mount *mp)
 	}
 
 	xmp = (struct null_mount *) malloc(sizeof(struct null_mount),
-	    M_NULLFSMNT, M_WAITOK);
+	    M_NULLFSMNT, M_WAITOK | M_ZERO);
 
 	/*
 	 * Save reference to underlying FS
@@ -187,16 +196,25 @@ nullfs_mount(struct mount *mp)
 		mp->mnt_flag |= MNT_LOCAL;
 		MNT_IUNLOCK(mp);
 	}
+
+	vfs_flagopt(mp->mnt_optnew, "cache", &xmp->nullm_flags, NULLM_CACHE);
+
 	MNT_ILOCK(mp);
-	mp->mnt_kern_flag |= lowerrootvp->v_mount->mnt_kern_flag &
-	    (MNTK_SHARED_WRITES | MNTK_LOOKUP_SHARED | MNTK_EXTENDED_SHARED);
+	if ((xmp->nullm_flags & NULLM_CACHE) != 0) {
+		mp->mnt_kern_flag |= lowerrootvp->v_mount->mnt_kern_flag &
+		    (MNTK_SHARED_WRITES | MNTK_LOOKUP_SHARED |
+		    MNTK_EXTENDED_SHARED);
+	}
 	mp->mnt_kern_flag |= MNTK_LOOKUP_EXCL_DOTDOT;
 	MNT_IUNLOCK(mp);
 	mp->mnt_data = xmp;
 	vfs_getnewfsid(mp);
-	MNT_ILOCK(xmp->nullm_vfs);
-	TAILQ_INSERT_TAIL(&xmp->nullm_vfs->mnt_uppers, mp, mnt_upper_link);
-	MNT_IUNLOCK(xmp->nullm_vfs);
+	if ((xmp->nullm_flags & NULLM_CACHE) != 0) {
+		MNT_ILOCK(xmp->nullm_vfs);
+		TAILQ_INSERT_TAIL(&xmp->nullm_vfs->mnt_uppers, mp,
+		    mnt_upper_link);
+		MNT_IUNLOCK(xmp->nullm_vfs);
+	}
 
 	vfs_mountedfrom(mp, target);
 
@@ -234,13 +252,15 @@ nullfs_unmount(mp, mntflags)
 	 */
 	mntdata = mp->mnt_data;
 	ump = mntdata->nullm_vfs;
-	MNT_ILOCK(ump);
-	while ((ump->mnt_kern_flag & MNTK_VGONE_UPPER) != 0) {
-		ump->mnt_kern_flag |= MNTK_VGONE_WAITER;
-		msleep(&ump->mnt_uppers, &ump->mnt_mtx, 0, "vgnupw", 0);
+	if ((mntdata->nullm_flags & NULLM_CACHE) != 0) {
+		MNT_ILOCK(ump);
+		while ((ump->mnt_kern_flag & MNTK_VGONE_UPPER) != 0) {
+			ump->mnt_kern_flag |= MNTK_VGONE_WAITER;
+			msleep(&ump->mnt_uppers, &ump->mnt_mtx, 0, "vgnupw", 0);
+		}
+		TAILQ_REMOVE(&ump->mnt_uppers, mp, mnt_upper_link);
+		MNT_IUNLOCK(ump);
 	}
-	TAILQ_REMOVE(&ump->mnt_uppers, mp, mnt_upper_link);
-	MNT_IUNLOCK(ump);
 	mp->mnt_data = NULL;
 	free(mntdata, M_NULLFSMNT);
 	return (0);
diff --git a/sys/fs/nullfs/null_vnops.c b/sys/fs/nullfs/null_vnops.c
index f530ed2..cc35d81 100644
--- a/sys/fs/nullfs/null_vnops.c
+++ b/sys/fs/nullfs/null_vnops.c
@@ -692,7 +692,22 @@ null_unlock(struct vop_unlock_args *ap)
 static int
 null_inactive(struct vop_inactive_args *ap __unused)
 {
+	struct vnode *vp;
+	struct mount *mp;
+	struct null_mount *xmp;
 
+	vp = ap->a_vp;
+	mp = vp->v_mount;
+	xmp = MOUNTTONULLMOUNT(mp);
+	if ((xmp->nullm_flags & NULLM_CACHE) == 0) {
+		/*
+		 * If this is the last reference and caching of the
+		 * nullfs vnodes is not enabled, then free up the
+		 * vnode so as not to tie up the lower vnodes.
+		 */
+		vp->v_object = NULL;
+		vrecycle(vp);
+	}
 	return (0);
 }
 
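For anyone reading the mount_nullfs part of the patch: the new -o handling amounts to splitting each option argument at the first '=' into a name and a value, with an empty value when there is no '=' (as with -o cache). Below is a minimal standalone sketch of just that split. The helper name split_mount_option is hypothetical and is not part of mount_nullfs; the patch itself passes the two strings straight to build_iovec() instead of returning them. As in the patch, the argument is not split at commas, so this sketch assumes one option per -o.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Split one "-o" argument of the form "name" or "name=value" in place.
 * On return, opt holds only the name; the return value points at the
 * value, or at an empty string when the argument contained no '='.
 * Hypothetical helper for illustration only, not part of mount_nullfs.
 */
static const char *
split_mount_option(char *opt)
{
	char *p;

	assert(opt != NULL);
	p = strchr(opt, '=');
	if (p == NULL)
		return ("");	/* flag-style option such as "cache" */
	*p = '\0';		/* terminate the name at the '=' */
	return (p + 1);		/* the value starts right after it */
}
```

Given a writable buffer holding "cache", the helper leaves the name untouched and returns an empty string. Given a made-up "key=val", it truncates the buffer to "key" and returns a pointer to "val".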