From owner-freebsd-hackers Wed Sep 2 23:39:06 1998
Return-Path:
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id XAA29029
          for freebsd-hackers-outgoing; Wed, 2 Sep 1998 23:39:06 -0700 (PDT)
          (envelope-from owner-freebsd-hackers@FreeBSD.ORG)
Received: from word.smith.net.au (castles138.castles.com [208.214.165.138])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id XAA29022
          for ; Wed, 2 Sep 1998 23:39:01 -0700 (PDT)
          (envelope-from mike@word.smith.net.au)
Received: from word.smith.net.au (localhost [127.0.0.1])
          by word.smith.net.au (8.9.1/8.8.8) with ESMTP id XAA00706;
          Wed, 2 Sep 1998 23:35:08 GMT
          (envelope-from mike@word.smith.net.au)
Message-Id: <199809022335.XAA00706@word.smith.net.au>
X-Mailer: exmh version 2.0.2 2/24/98
To: Terry Lambert
cc: mike@smith.net.au (Mike Smith), cmascott@world.std.com, hackers@FreeBSD.ORG
Subject: Re: Reading/writing /usr/ports is VERY slow
In-reply-to: Your message of "Thu, 03 Sep 1998 02:35:32 GMT."
             <199809030235.TAA07404@usr07.primenet.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Wed, 02 Sep 1998 23:35:07 +0000
From: Mike Smith
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> > This is only useful if the layout has clustered related directories on
> > the disk.  The current code does a good job of keeping them a long way
> > apart.
> >
> > > In fact, directory locality is loosely assured.
> >
> > Almost the opposite is true, especially if you don't make a common
> > practice of deleting lots of directories.
>
> I think we are miscommunicating here.
>
> Directory *contents* are loosely assured locality.
>
> Directory *hierarchies* are loosely assured near-perfect random
> non-locality.

I seem to recall that's what I said last time.

> Per the "find" command argument, this issue is "breadth-first" vs.
> "depth-first".

You need three dimensions; breadth is distance across siblings, depth is
distance from parent through children, and width is individual directory
size.  Directories are associated width-first, not breadth-first or
depth-first.  Both breadth and depth are pessimised.

> Either reimplement the directory code as a btree, or, better, "avoid
> writing code that results in directory recursions".

The first has the effect of clustering directory data.  Funnily enough,
that's what I was attempting to achieve.

> I think the ports problem in this respect is that the directory
> entry cache is too small for the traversal being done.

No, the ports problem is that all directories in the hierarchy will be
visited, and you have to seek a long way from any one entry to almost
any other entry.

> Basically, "some idiot is using the FS directory hierarchy as if it
> were a database index". 8-).

Feel free to suggest an alternative.  Code complete, of course.

> The correct thing to do here would probably be to create a "fast find"
> index, and have find use this (SunOS 4.1 has this; you create the
> index at 3 am from cron -- 3am because that's when you break to go
> to Naugles for egg-and-bean burritos 8-)).
>
> Alternately, a "portfinder" port would help...

You're still only reading what you want to see.  This sort of search is
already trivialised; we keep an index (ports/INDEX), and have both a
command-line ('make search key=') and a graphical (sysutils/pib) search
tool.

With the ports collection, the issue is creation and backups.  The same
problem hits other applications though (anything involving the CVS
repository, for example).

-- 
\\  Sometimes you're ahead,       \\  Mike Smith
\\  sometimes you're behind.      \\  mike@smith.net.au
\\  The race is long, and in the  \\  msmith@freebsd.org
\\  end it's only with yourself.  \\  msmith@cdrom.com

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message
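
Editorial sketch (not part of the original message): the seek argument
above can be illustrated with a toy model. Everything here is an
assumption made for illustration: the disk is a line of numbered slots,
each directory occupies one slot, and a full traversal (as a backup
would do) visits every directory in depth-first order. The names
build_tree, total_seek and compare are hypothetical, and this is not
how the FFS allocator actually places directories; it only contrasts a
layout that follows the hierarchy with one that scatters it.

```python
import random


def build_tree(fanout, depth):
    """Return every directory path (as a tuple) in depth-first order."""
    paths = []

    def walk(path, level):
        paths.append(path)
        if level < depth:
            for i in range(fanout):
                walk(path + (i,), level + 1)

    walk((), 0)
    return paths


def total_seek(paths, position):
    """Sum of slot distances between consecutively visited directories."""
    return sum(abs(position[b] - position[a])
               for a, b in zip(paths, paths[1:]))


def compare(fanout=8, depth=3, seed=0):
    """Return (clustered, scattered) seek totals for one full traversal."""
    paths = build_tree(fanout, depth)

    # Clustered: directories sit on disk in the order they are visited.
    clustered = {p: i for i, p in enumerate(paths)}

    # Scattered: each directory lands in an arbitrary slot, standing in
    # for a hierarchy whose members are spread across the disk.
    slots = list(range(len(paths)))
    random.Random(seed).shuffle(slots)
    scattered = dict(zip(paths, slots))

    return total_seek(paths, clustered), total_seek(paths, scattered)


if __name__ == "__main__":
    c, s = compare()
    print("clustered seek total:", c)
    print("scattered seek total:", s)
```

In this model the clustered layout costs one slot per step, while the
scattered layout costs a large fraction of the disk per step, which is
the difference the message attributes to visiting every directory in a
hierarchy whose members are far apart.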