From: Bruce Evans
Date: Wed, 13 Jul 2011 18:37:03 +1000 (EST)
To: John Baldwin
Cc: freebsd-fs@FreeBSD.org, Pan Tsu
Subject: Re: ignore duplicates (Was: request for review of exports.5 update)
Message-ID: <20110713174223.C932@besplex.bde.org>

On Tue, 12 Jul 2011, John Baldwin wrote:

> On Tuesday, July 12, 2011 10:52:28 am Pan Tsu wrote:
>> As for whether it matters to descend here is an example
>>
>> # disable caching metadata/data before test
>> $ zfs set primarycache=none foo/usr/src
>> $ zfs set secondarycache=none foo/usr/src
>>
>> $ time find /usr/src/sys ! -path '*.svn*' >/dev/null
>> $ time find /usr/src/sys ! -path '*.svn*' -or -prune >/dev/null

Not exactly what I'm looking for, but it seems that a script that adds
exotic args to some utility is needed.  (I have only 1 nontrivial vcs
script, for un-applying and then re-applying local patches to cvs
checkouts.)

>> On my 3yo box I don't even need ministat(1) to decide
>>
>> 26.78sr 0.21su 1.09ss 4% 1420k 45s+2194u 217pr+0pf+0w 28377+0io 28394+8935cs
>> 3.68sr 0.07su 0.13ss 5% 1420k 46s+2260u 217pr+0pf+0w 3156+0io 3158+876cs

This still has some problems:
- it is still extremely slow.  On my 6-year-old system running ~5.2-CURRENT,
  there are only 10760 files in /usr/src/sys on ffs (including CVS files
  and about 1000 local files, but no object files).  These take 0.04
  seconds to find, once cached.  Breaking the cache to test the uncached
  case is too hard with ffs.
  On a FreeBSD cluster machine running ~9.0-CURRENT, exponential bloat
  (mainly almost quadrupling for .svn files) results in 48556 files in
  /usr/src/sys on ffs, but only 12138 files after removing all .svn files
  and 13698 files after removing all .svn files except the .svn
  directories.  These take 0.90 seconds to find with a plain find(1),
  1.59 seconds with the first of the above commands, and 0.34 seconds
  with the second.
- the first version works, but the one with -prune also finds the .svn
  directories (1560 of them in FreeBSD-9-not-quite-current).

> Ah, nice.  This is a definite improvement.  I've modified my script as such:

Pruning apparently reduces the number of files stat'ed by a factor of
almost 4, since svn almost quadruples the number of files.  But why is
"find -path" so much slower than plain find?

> #!/bin/sh
> #
> # Grep inside a kernel directory skipping compile directories and revision
> # control directories
>
> find `ls` '(' ! '(' -name compile -o -name .svn -o -name CVS ')' -o -prune ')' \
>     ! -name '*cscope*' ! -type d -print0 | xargs -0 grep -H "$@"

"find -name" is much faster than "find -path" on the FreeBSD cluster
machine.  It takes only 0.08 seconds, which is acceptably slower than the
0.04 seconds on my old machine (due to the nfs overhead and 30% more
files).  It has the same problem as "find -path" when pruning: it doesn't
remove the .svn directories.  These can be removed with another "! -name",
of course.

The first version with -path should be the best one.  find(1) should be
smarter and not descend into directories whose pathname already fails
"! -path": with a pattern like '*.svn*', every pathname below such a
directory fails it too.

On my old machine, "find ... ! -name CVS -o -prune" (which prunes a couple
of thousand CVS files but not the 910 CVS directories) takes only 0.02
seconds.  "find ... ! -path '*CVS*' -o -prune" also takes 0.02 seconds;
"find ... ! -path '*CVS*'" is what takes 0.04 seconds, and a plain find
takes 0.03 seconds.  In other words, -name is imperceptibly faster than
-path there.  So there seems to be another problem with -path in -current:
it is more than 4 times slower than -name (0.35 versus 0.08 seconds).

On second thought, this is probably just the nfs close-to-open consistency
pessimization, perhaps combined with nfs opening directories more often
than necessary.  [l]stat(2) results should be cached even in nfs, but
every directory open requires RPCs.

Bruce
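Why the "-or -prune" variant still lists the .svn directories: POSIX find(1) treats an expression that contains no -exec, -ok or -print as "( expression ) -print", and -prune itself evaluates to true, so a pruned directory satisfies the whole expression and gets printed.  A minimal sketch, assuming a POSIX sh and a find(1) that supports -path (FreeBSD's and GNU's both do), runs both forms over a small throwaway tree.  The tree layout and the function names prune_implicit and prune_explicit are hypothetical.  Putting -prune first and giving the other branch an explicit -print is an alternative to the extra "! -name" mentioned above, not the thread's own fix.

```sh
#!/bin/sh
# Sketch: compare two pruning idioms on a small, made-up tree.
set -eu

# Form from the thread: no action primary, so find adds an implicit
# -print around the whole expression, and pruned .svn dirs are printed.
prune_implicit() {
	find "$1" ! -path '*.svn*' -o -prune
}

# -prune on the left branch, explicit -print on the right branch only:
# pruned .svn dirs make the left branch true and are never printed.
prune_explicit() {
	find "$1" -path '*.svn*' -prune -o -print
}

# Hyphen in the template so the temp path itself cannot contain ".svn".
top=$(mktemp -d "${TMPDIR:-/tmp}/prunedemo-XXXXXX")
trap 'rm -rf "$top"' EXIT

mkdir -p "$top/sys/kern/.svn/text-base" "$top/sys/vm/.svn"
touch "$top/sys/kern/kern_exec.c" "$top/sys/kern/.svn/entries" \
	"$top/sys/kern/.svn/text-base/kern_exec.c.svn-base" \
	"$top/sys/vm/vm_map.c" "$top/sys/vm/.svn/entries"

echo "implicit -print:"
prune_implicit "$top/sys"
echo "explicit -print:"
prune_explicit "$top/sys"
```

Both forms skip everything inside the .svn directories; only the second one also leaves the two .svn directories themselves out of the listing.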
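The "factor of almost 4" can be checked by counting the pathnames find(1) visits with and without pruning.  With the cluster numbers quoted above, 48556 / 13698 comes to about 3.5 (13698 includes the .svn directories, which find still has to visit before pruning them).  A sketch, assuming a POSIX sh and awk: the argument defaults to /usr/src/sys as in the thread, and the function names count_plain and count_pruned are hypothetical.  The count is only a rough proxy for the number of files stat'ed, since some find implementations can skip stat(2) when the directory entry type is enough.

```sh
#!/bin/sh
# Sketch: count pathnames visited by a plain find and by a find that
# prunes .svn directories, then print the ratio.
set -eu

count_plain() {
	find "$1" | wc -l
}

# Deliberately the implicit -print form: each pruned .svn directory is
# still visited and printed once, but nothing below it is.
count_pruned() {
	find "$1" ! -name .svn -o -prune | wc -l
}

dir=${1:-/usr/src/sys}
if [ ! -d "$dir" ]; then
	echo "no such directory: $dir" >&2
else
	plain=$(count_plain "$dir")
	pruned=$(count_pruned "$dir")
	# sh arithmetic is integer-only, so let awk do the division.
	awk -v a="$plain" -v b="$pruned" \
	    'BEGIN { printf "%d visited, %d with pruning, ratio %.2f\n", a, b, a / b }'
fi
```

Using -name rather than -path here follows the observation above that -name is the cheaper test on the cluster machine.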