Date:      Wed, 02 Sep 1998 23:54:30 +0000
From:      Mike Smith <mike@smith.net.au>
To:        Terry Lambert <tlambert@primenet.com>
Cc:        hackers@FreeBSD.ORG
Subject:   Re: Reading/writing /usr/ports is VERY slow 
Message-ID:  <199809022354.XAA00818@word.smith.net.au>
In-Reply-To: Your message of "Thu, 03 Sep 1998 02:47:02 GMT." <199809030247.TAA08033@usr07.primenet.com> 

> > Could you explain how the proposed change would do this?  It's no worse
> > than the current case if there is one cg with substantially fewer
> > directories than any other.  In fact, you could simplify the code
> > somewhat just by lying about the number of directories in the cg
> > referenced by the rotor.
> 
> [ ... ]
> 
> > > The original "free reserve" values were set to 10% for a reason; it
> > > was a compromise between people who wanted to use every byte, and
> > > the actual 15% value required for a "perfect hash".
> > 
> > The change I proposed honours the reserve.  Now I *know* you didn't 
> > look at it. 8)
> 
> You aren't proposing to damage the free reserve, you are proposing to
> damage the distribution of the block allocation hash.

I'd say "damage" is a bit of an overstatement.  If you have 16MB cg's on
say a 1GB partition you have 64 of them.  Split across these 64 groups
you allocate your 5,000 directories, giving you about 80 directories 
per cg.  Now, if we cluster the allocations so that you do 8 at a time, 
your worst case will be one cg has 8 more directories than the fewest.

This is about like carefully picking 8 directories and then deleting 
them; it strikes me as so close to "noise" that it'd be irrelevant.
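To put actual numbers on that, here's the back-of-envelope sum as a 
trivial program; the partition size, cg size, directory count and 
clustering factor are just the example figures above, not measurements 
from any real filesystem:

#include <stdio.h>

int
main(void)
{
	long partition_mb = 1024;	/* ~1GB partition */
	long cg_mb = 16;		/* 16MB cylinder groups */
	long ndirs = 5000;		/* directories being created */
	long cluster = 8;		/* proposed clustering factor */

	long ncg = partition_mb / cg_mb;	/* 64 groups */
	long dirs_per_cg = ndirs / ncg;		/* ~78, call it 80 */

	printf("%ld cgs, ~%ld directories per cg\n", ncg, dirs_per_cg);
	printf("worst-case skew from clustering: %ld directories\n", cluster);
	return (0);
}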

> A disk becomes "fragmented" when you have hash collisions in the
> block allocation hash.

Calling the directory inode allocation a "hash" is a joke.  It's a 
brute-force levelling function.  It's not hashed either on creation 
(linear search) or on lookup (direct reference).
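To be concrete about what I mean by a levelling function, this is 
roughly the shape of that scan.  The cg_summary struct, its fields and 
pick_dir_cg() are illustrative stand-ins, not the real ffs structures or 
ffs_dirpref() itself:

#include <limits.h>

struct cg_summary {
	int	cs_ndir;	/* directories in this cg */
	int	cs_nifree;	/* free inodes in this cg */
};

static int
pick_dir_cg(struct cg_summary *cs, int ncg)
{
	int cg, best = -1, mindirs = INT_MAX;

	/* Brute-force linear scan; nothing is hashed on creation. */
	for (cg = 0; cg < ncg; cg++) {
		if (cs[cg].cs_nifree == 0)
			continue;
		if (cs[cg].cs_ndir < mindirs) {
			mindirs = cs[cg].cs_ndir;
			best = cg;
		}
	}
	/* Lookup afterwards is by direct inode reference; no hash there either. */
	return (best);
}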

> > > In effect, when the FS picks a block (or, more correctly, a cluster),
> > > it is hashing the filespace onto the disk.
> > 
> > Unfortunately, in the case of directories this results in them being 
> > scattered all over the disk.  If you're creating a pile of them all at 
> > once in a hierarchy, the net result is very poor locality of reference.
> 
> Yes.  Agreed.  But the locality is *intentionally* horizontal, by
> design.

That's clear from the code.  The point here is that the intention may 
not have been quite so golden; there's certainly no indication that any 
attempt was made to optimise for any particular behaviour.

The disabled rotor code in ffs_blkpref does in fact seem to indicate that
the rotating distribution (a linear hash of time) was abandoned in
favour of the levelling function.  All I'm suggesting is a tradeoff 
between the levelling function and the philosophy that's embodied in 
the cluster:block relationship and the policy that puts direct blocks 
in the same cg as the file's inode.
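For contrast, the abandoned rotating distribution amounts to little more 
than stepping through the groups in turn.  This is a sketch of the idea 
only, not the code that's actually commented out in ffs_blkpref:

static int	cg_rotor;		/* cg handed out last time */

static int
next_cg_rotating(int ncg)
{
	/* Step to the next group regardless of how full it is. */
	cg_rotor = (cg_rotor + 1) % ncg;
	return (cg_rotor);
}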

> > Associating locality and time of creation is not a wonderful algorithm, 
> > I freely admit.  However, I think that it has some merit over the 
> > current approach (forcibly minimise locality under some circumstances).
> 
> I think changing the ports to populate directory hierarchies breadth-first
> would result in better performance for this particular use.

Almost certainly not; see the breadth/depth/width discussion.  Moving 
from any directory to any other directory will almost guarantee a large 
disk traversal.  The block/directory cache hit rate might be slightly 
better, but it's a finite resource.

> The problem with perturbing the hash is that you will pessimize the
> case where you are creating files in a single directory over time.

The change I proposed does nothing of the sort; *all* it does is
slightly weight the locality of reference of directory inodes (and thus
associated direct blocks) towards the time of their creation.

The pathological case for this optimisation is to create a small number 
of directories very rapidly, and then consume all their direct blocks.  
This would leave the cg in question *slightly* starved for blocks.  You 
could then repeat the operation and cluster into the next cg.  But this 
would *still* have the levelling effect, as when the cg is abandoned, 
the most-free candidate is chosen.

You could think of it as simply multiplying the disk block size by the
clustering factor, or shrinking the CG correspondingly.  Given that our 
cg's are already much larger than they used to be, this is unlikely to 
be a problem.
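In code terms, one way to read the proposed change is the sketch below, 
reusing the cg_summary/pick_dir_cg() stand-ins from earlier.  
DIR_CLUSTER, dir_rotor, dir_count and pick_dir_cg_clustered() are again 
illustrative names, not real ffs identifiers, and the free-inode test 
stands in for whatever reserve checks the real allocator honours:

#define	DIR_CLUSTER	8		/* allocations before moving on */

static int	dir_rotor = -1;		/* cg used for the last directory */
static int	dir_count;		/* directories placed there so far */

static int
pick_dir_cg_clustered(struct cg_summary *cs, int ncg)
{
	/* Stay in the current cg while the cluster and the reserve allow. */
	if (dir_rotor >= 0 && dir_count < DIR_CLUSTER &&
	    cs[dir_rotor].cs_nifree > 0) {
		dir_count++;
		return (dir_rotor);
	}
	/* Otherwise fall back to the levelling function and start a new cluster. */
	dir_rotor = pick_dir_cg(cs, ncg);
	dir_count = 1;
	return (dir_rotor);
}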

> In the general case, you tend to use all files in a given directory
> at a time, regardless of their creation date.

This is irrelevant; the existing optimisations still apply.

> Maybe we should note the way that directories do block I/O is not
> the same as the way files do block I/O.  According to the comments,
> this behaviour is an intentional "play it safe" move on the part of
> the author.

I don't see how this has any significance; all we're talking about is 
block allocation for the directory inode.

> > If you have the stuff set up to test, I'd really be interested to see 
> > if the clustering I proposed demonstrated any effects.
> 
> I don't have the equipment at this point to be able to do a reasonable
> job of testing.  Certainly nothing I'd hang the outcome of an "is this
> a good idea?" question... 8-(.

8(  None of the other guinea-pigs seem interested either.  *sigh*

-- 
\\  Sometimes you're ahead,       \\  Mike Smith
\\  sometimes you're behind.      \\  mike@smith.net.au
\\  The race is long, and in the  \\  msmith@freebsd.org
\\  end it's only with yourself.  \\  msmith@cdrom.com


