Date: Sat, 18 Nov 2006 21:04:24 +0300
From: Yar Tikhiy <yar@comp.chem.msu.su>
To: Kostik Belousov <kostikbel@gmail.com>
Cc: David Malone <dwmalone@maths.tcd.ie>, hackers@freebsd.org
Subject: Re: File trees: the deeper, the weirder
Message-ID: <20061118180424.GC80527@comp.chem.msu.su>
In-Reply-To: <20061118110544.GN1841@deviant.kiev.zoral.com.ua>
References: <20061029140716.GA12058@comp.chem.msu.su>
 <20061029152227.GA11826@walton.maths.tcd.ie>
 <006801c6fb77$e4e30100$1200a8c0@gsicomp.on.ca>
 <20061030130519.GE27062@comp.chem.msu.su>
 <20061030134737.GF1627@deviant.kiev.zoral.com.ua>
 <20061118095400.GE68439@comp.chem.msu.su>
 <20061118110544.GN1841@deviant.kiev.zoral.com.ua>
On Sat, Nov 18, 2006 at 01:05:44PM +0200, Kostik Belousov wrote:
> On Sat, Nov 18, 2006 at 12:54:00PM +0300, Yar Tikhiy wrote:
> > On Mon, Oct 30, 2006 at 03:47:37PM +0200, Kostik Belousov wrote:
> > > On Mon, Oct 30, 2006 at 04:05:19PM +0300, Yar Tikhiy wrote:
> > > > On Sun, Oct 29, 2006 at 11:32:58AM -0500, Matt Emmerton wrote:
> > > > > [ Restoring some OP context. ]
> > > > >
> > > > > > On Sun, Oct 29, 2006 at 05:07:16PM +0300, Yar Tikhiy wrote:
> > > > > > >
> > > > > > > As for the said program, it keeps its 1 Hz pace, mostly waiting on
> > > > > > > "vlruwk".  It's killable, after a delay.  The system doesn't show ...
> > > > > > >
> > > > > > > Weird, eh?  Any ideas what's going on?
> > > > > >
> > > > > > I would guess that you need a new vnode to create the new file, but no
> > > > > > vnodes are obvious candidates for freeing because they all have a child
> > > > > > directory in use.  Is there some sort of vnode clearing that goes on
> > > > > > every second if we are short of vnodes?
> > > > >
> > > > > See sys/vfs_subr.c, subroutine getnewvnode().  We call msleep() if
> > > > > we're waiting on vnodes to be created (or recycled).  And just look
> > > > > at the 'hz' parameter passed to msleep()!
> > > > >
> > > > > The calling process's mkdir() will end up waiting in getnewvnode()
> > > > > (in "vlruwk" state) while the vnlru kernel thread does its thing
> > > > > (which is to recycle vnodes).
> > > > >
> > > > > Either the vnlru kernel thread has to work faster, or the caller has
> > > > > to sleep less, in order to avoid this lock-step behaviour.
> > > >
> > > > I'm afraid that, though your analysis is right, you arrive at wrong
> > > > conclusions.  The process waits for the whole second in getnewvnode()
> > > > because the vnlru thread cannot free as many vnodes as it wants to.
> > > > vnlru_proc() will wake up sleepers on vnlruproc_sig (i.e.,
> > > > getnewvnode()) only if (numvnodes <= desiredvnodes * 9 / 10).
> > > > Whether this condition is attainable depends on vlrureclaim() (called
> > > > from the vnlru thread) freeing vnodes at a sufficient rate.  Perhaps
> > > > vlrureclaim() just can't keep up the pace under these conditions;
> > > > debug.vnlru_nowhere increasing is an indication of that.  Consequently,
> > > > each getnewvnode() call sleeps 1 second, then grabs a vnode beyond
> > > > desiredvnodes.  It's no surprise that the 1-second delays start to
> > > > appear after approx. kern.maxvnodes directories were created.
> > >
> > > I think that David is right.  The references _from_ the directory make
> > > it immune to vnode reclamation.  Try this patch.  It is very unfair to
> > > lsof.
> > >
> > > Index: sys/kern/vfs_subr.c
> > > ===================================================================
> > > RCS file: /usr/local/arch/ncvs/src/sys/kern/vfs_subr.c,v
> > > retrieving revision 1.685
> > > diff -u -r1.685 vfs_subr.c
> > > --- sys/kern/vfs_subr.c	2 Oct 2006 07:25:58 -0000	1.685
> > > +++ sys/kern/vfs_subr.c	30 Oct 2006 13:44:59 -0000
> > > @@ -582,7 +582,7 @@
> > >  		 * If it's been deconstructed already, it's still
> > >  		 * referenced, or it exceeds the trigger, skip it.
> > >  		 */
> > > -		if (vp->v_usecount || !LIST_EMPTY(&(vp)->v_cache_src) ||
> > > +		if (vp->v_usecount || /* !LIST_EMPTY(&(vp)->v_cache_src) || */
> > >  		    (vp->v_iflag & VI_DOOMED) != 0 || (vp->v_object != NULL &&
> > >  		    vp->v_object->resident_page_count > trigger)) {
> > >  			VI_UNLOCK(vp);
> > > @@ -607,7 +607,7 @@
> > >  		 * interlock, the other thread will be unable to drop the
> > >  		 * vnode lock before our VOP_LOCK() call fails.
> > >  		 */
> > > -		if (vp->v_usecount || !LIST_EMPTY(&(vp)->v_cache_src) ||
> > > +		if (vp->v_usecount || /* !LIST_EMPTY(&(vp)->v_cache_src) || */
> > >  		    (vp->v_object != NULL &&
> > >  		    vp->v_object->resident_page_count > trigger)) {
> > >  			VOP_UNLOCK(vp, LK_INTERLOCK, td);
> >
> > By the way, what do you think v_cache_src is for?  The only two
> > places it is used in the kernel are in the unused function
> > cache_leaf_test() and this one, in vlrureclaim().  Is its main
> > purpose just to keep directory vnodes that are referenced by nc_dvp
> > in some namecache entries?
>
> I think so, yes.  Now, it mostly gives immunity to the vnodes that
> could be used for getcwd()/lsof path lookups through the namecache.

Another purpose of v_cache_src that I missed is to allow for removing
all namecache entries whose nc_dvp points to a particular vnode when
that vnode is recycled, so that we don't end up with stale nc_dvp's
in the namecache.  Perhaps this is the main role v_cache_src plays.

> Did my change help with your load?

Your hack works, thanks!  Your analysis of the problem proved correct.
And I'm gaining some understanding of it, too :-)
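To convince myself of the mechanics, I wrote a tiny userland model of
the pacing.  It has nothing to do with the real kernel code -- the
numbers and names are made up -- it only mimics the behaviour described
above: once numvnodes reaches desiredvnodes and nothing is reclaimable,
each allocation sleeps a full second in "vlruwk" and then takes a vnode
beyond the limit anyway.

/*
 * Toy model, not kernel code: every "mkdir" needs one new vnode, every
 * directory keeps a child in the namecache, so nothing ever becomes
 * reclaimable and the allocator pays one second per vnode past the limit.
 */
#include <stdio.h>

int
main(void)
{
        int desiredvnodes = 1000;       /* stand-in for kern.maxvnodes */
        int numvnodes = 0;
        long seconds = 0;               /* simulated wall-clock time */
        int dir;

        for (dir = 1; dir <= 2000; dir++) {
                if (numvnodes >= desiredvnodes)
                        seconds++;      /* msleep(..., "vlruwk", hz) */
                numvnodes++;            /* the vnode for the new directory */
                if (dir % 500 == 0)
                        printf("directory %4d created at t = %4lds\n",
                            dir, seconds);
        }
        return (0);
}

The first desiredvnodes directories appear instantly, and every one
after that costs a full simulated second -- exactly the 1 Hz pace I'm
seeing on the real system.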
> cache_leaf_test() seems to be the way to go: partition vlru reclaim
> into two stages - the first reclaims leaf vnodes (that is, vnodes that
> do not contain child dirs in the namecache), and the second fires only
> if the first stage failed to free anything and simply ignores
> v_cache_src, as in my change.  See the comment for rev. 1.56 of
> vfs_cache.c.

Excuse me, but why "vnodes that do not contain child dirs in the
namecache"?  Perhaps they should be vnodes that do not contain _any_
children in the namecache?  That would be better suited for trying
to preserve information for vn_fullpath().  However, I must admit
that I don't know how lsof works because I've never used it.
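Just to check that I read the two-stage idea the way you meant it, here
is a sketch.  The types and field names are invented so that it compiles
on its own: has_nc_children stands for !LIST_EMPTY(&vp->v_cache_src),
and the real logic would of course go into vlrureclaim() in
sys/kern/vfs_subr.c.

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct toy_vnode {
        struct toy_vnode *next;
        int     usecount;               /* like v_usecount */
        bool    has_nc_children;        /* like !LIST_EMPTY(&vp->v_cache_src) */
};

static bool
can_recycle(const struct toy_vnode *vp, bool ignore_namecache)
{
        if (vp->usecount != 0)
                return (false);         /* still referenced, never touch it */
        if (!ignore_namecache && vp->has_nc_children)
                return (false);         /* stage 1 keeps non-leaf vnodes */
        return (true);
}

/* Returns the number of vnodes that would be recycled. */
static int
toy_vlrureclaim(struct toy_vnode *list)
{
        struct toy_vnode *vp;
        int done = 0;

        /* Stage 1: leaf vnodes only, so getcwd()/lsof path info survives. */
        for (vp = list; vp != NULL; vp = vp->next)
                if (can_recycle(vp, false))
                        done++;         /* the real code would vgone the vnode */
        if (done > 0)
                return (done);

        /* Stage 2: nothing was freed, so ignore v_cache_src, as in your patch. */
        for (vp = list; vp != NULL; vp = vp->next)
                if (can_recycle(vp, true))
                        done++;
        return (done);
}

int
main(void)
{
        /* A busy vnode, a directory with cached children, and a leaf. */
        struct toy_vnode leaf = { NULL, 0, false };
        struct toy_vnode dir = { &leaf, 0, true };
        struct toy_vnode busy = { &dir, 1, false };

        printf("would recycle %d vnode(s)\n", toy_vlrureclaim(&busy));
        return (0);
}

If that matches what you meant, then my question above is only about
whether stage 1 should skip vnodes with any children in the namecache
or just those with child directories.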
-- 
Yar

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20061118180424.GC80527>