Date: Sat, 18 Nov 2006 13:05:44 +0200
From: Kostik Belousov
To: Yar Tikhiy
Cc: David Malone, hackers@freebsd.org
Subject: Re: File trees: the deeper, the weirder

On Sat, Nov 18, 2006 at 12:54:00PM +0300, Yar Tikhiy wrote:
> On Mon, Oct 30, 2006 at 03:47:37PM +0200, Kostik Belousov wrote:
> > On Mon, Oct 30, 2006 at 04:05:19PM +0300, Yar Tikhiy wrote:
> > > On Sun, Oct 29, 2006 at 11:32:58AM -0500, Matt Emmerton wrote:
> > > > [ Restoring some OP context.]
> > > >
> > > > > On Sun, Oct 29, 2006 at 05:07:16PM +0300, Yar Tikhiy wrote:
> > > > >
> > > > > > As for the said program, it keeps its 1 Hz pace, mostly waiting on
> > > > > > "vlruwk". It's killable, after a delay. The system doesn't show ...
> > > > > >
> > > > > > Weird, eh? Any ideas what's going on?
> > > > >
> > > > > I would guess that you need a new vnode to create the new file, but no
> > > > > vnodes are obvious candidates for freeing because they all have a child
> > > > > directory in use. Is there some sort of vnode clearing that goes on every
> > > > > second if we are short of vnodes?
> > > >
> > > > See sys/vfs_subr.c, subroutine getnewvnode(). We call msleep() if we're
> > > > waiting on vnodes to be created (or recycled). And just look at the 'hz'
> > > > parameter passed to msleep()!
> > > >
> > > > The calling process's mkdir() will end up waiting in getnewvnode() (in
> > > > "vlruwk" state) while the vnlru kernel thread does its thing (which is to
> > > > recycle vnodes.)
> > > >
> > > > Either the vnlru kernel thread has to work faster, or the caller has to
> > > > sleep less, in order to avoid this lock-step behaviour.
> > >
> > > I'm afraid that, though your analysis is right, you arrive at wrong
> > > conclusions. The process waits for the whole second in getnewvnode()
> > > because the vnlru thread cannot free as many vnodes as it wants to.
> > > vnlru_proc() will wake up sleepers on vnlruproc_sig (i.e.,
> > > getnewvnode()) only if (numvnodes <= desiredvnodes * 9 / 10).
> > > Whether this condition is attainable depends on vlrureclaim() (called
> > > from the vnlru thread) freeing vnodes at a sufficient rate. Perhaps
> > > vlrureclaim() just can't keep the pace under these conditions.
> > > debug.vnlru_nowhere increasing is an indication of that. Consequently,
> > > each getnewvnode() call sleeps 1 second, then grabs a vnode beyond
> > > desiredvnodes. It's no surprise that the 1 second delays start to
> > > appear after approx. kern.maxvnodes directories were created.
> >
> > I think that David is right. The references _from_ the directory make it immune
> > to vnode reclamation. Try this patch. It is very unfair for lsof.
> >
> > Index: sys/kern/vfs_subr.c
> > ===================================================================
> > RCS file: /usr/local/arch/ncvs/src/sys/kern/vfs_subr.c,v
> > retrieving revision 1.685
> > diff -u -r1.685 vfs_subr.c
> > --- sys/kern/vfs_subr.c 2 Oct 2006 07:25:58 -0000 1.685
> > +++ sys/kern/vfs_subr.c 30 Oct 2006 13:44:59 -0000
> > @@ -582,7 +582,7 @@
> >     * If it's been deconstructed already, it's still
> >     * referenced, or it exceeds the trigger, skip it.
> >     */
> > -  if (vp->v_usecount || !LIST_EMPTY(&(vp)->v_cache_src) ||
> > +  if (vp->v_usecount || /* !LIST_EMPTY(&(vp)->v_cache_src) || */
> >        (vp->v_iflag & VI_DOOMED) != 0 || (vp->v_object != NULL &&
> >        vp->v_object->resident_page_count > trigger)) {
> >     VI_UNLOCK(vp);
> > @@ -607,7 +607,7 @@
> >     * interlock, the other thread will be unable to drop the
> >     * vnode lock before our VOP_LOCK() call fails.
> >     */
> > -  if (vp->v_usecount || !LIST_EMPTY(&(vp)->v_cache_src) ||
> > +  if (vp->v_usecount || /* !LIST_EMPTY(&(vp)->v_cache_src) || */
> >        (vp->v_object != NULL &&
> >        vp->v_object->resident_page_count > trigger)) {
> >     VOP_UNLOCK(vp, LK_INTERLOCK, td);
>
> By the way, what do you think v_cache_src is for? The only two
> places it is used in the kernel are in the unused function
> cache_leaf_test() and this one, in vlrureclaim(). Is its main
> purpose just to keep directory vnodes that are referenced by nc_dvp
> in some namecache entries?

I think so. Now, it mostly gives immunity to the vnodes that could be
used for getcwd()/lsof path lookups through the namecache. Did my change
help with your load?

cache_leaf_test() seems to be the way to go.
The vlru reclaim could be partitioned into two stages: the first
reclaims only leaf vnodes (that is, vnodes that do not contain child
dirs in the namecache); the second fires only if the first stage failed
to free anything, and simply ignores v_cache_src, as in my change.
See the comment for rev. 1.56 of vfs_cache.c.
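
To make the two-stage idea concrete, here is a minimal userland sketch. It is only a toy model under stated assumptions, not kernel code: struct toy_vnode and every toy_* function are hypothetical names invented for illustration, usecount stands in for v_usecount, and cache_src_count stands in for a non-empty v_cache_src list. Locking, the resident page trigger, and VI_DOOMED are left out, and the model does not purge namecache entries when a directory is reclaimed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a vnode; NOT the kernel's struct vnode. */
struct toy_vnode {
	int  usecount;        /* models v_usecount: active references */
	int  cache_src_count; /* models v_cache_src: namecache entries
	                       * that name this vnode as their parent dir */
	bool reclaimed;       /* set once the toy reclaimer has freed it */
};

/*
 * Decide whether one vnode may be reclaimed.  With ignore_cache_src
 * false, only leaf vnodes qualify (stage one); with it true, the
 * v_cache_src-style check is skipped (stage two).
 */
bool
toy_reclaimable(const struct toy_vnode *vp, bool ignore_cache_src)
{
	if (vp->reclaimed || vp->usecount > 0)
		return false;
	if (!ignore_cache_src && vp->cache_src_count > 0)
		return false;
	return true;
}

/* One scan over the pool; reclaims at most `target` vnodes. */
int
toy_reclaim_pass(struct toy_vnode *pool, size_t n, int target,
    bool ignore_cache_src)
{
	int done = 0;

	assert(pool != NULL || n == 0);
	for (size_t i = 0; i < n && done < target; i++) {
		if (toy_reclaimable(&pool[i], ignore_cache_src)) {
			pool[i].reclaimed = true;
			done++;
		}
	}
	return done;
}

/*
 * Two-stage reclaim: try leaf vnodes first, and fall back to the
 * pass that ignores cache_src_count only if stage one freed nothing.
 */
int
toy_reclaim_two_stage(struct toy_vnode *pool, size_t n, int target)
{
	int done = toy_reclaim_pass(pool, n, target, false);

	if (done == 0)
		done = toy_reclaim_pass(pool, n, target, true);
	return done;
}
```

In a deep chain of directories where only the innermost one is in use, stage one finds no leaf to free and stage two takes over, which is the case from this thread. When unused leaf vnodes do exist, stage two never runs, so directories that getcwd()/lsof lookups depend on keep their immunity.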