Date: Wed, 02 Sep 1998 23:54:30 +0000 From: Mike Smith <mike@smith.net.au> To: Terry Lambert <tlambert@primenet.com> Cc: hackers@FreeBSD.ORG Subject: Re: Reading/writing /usr/ports is VERY slow Message-ID: <199809022354.XAA00818@word.smith.net.au> In-Reply-To: Your message of "Thu, 03 Sep 1998 02:47:02 GMT." <199809030247.TAA08033@usr07.primenet.com>
next in thread | previous in thread | raw e-mail | index | archive | help
> > Could you explain how the proposed change would do this? It's no worse > > than the current case if there is one cg with substantially fewer > > directories than any other. In fact, you could simplify the code > > somewhat just by lying about the number of directories in the cg > > referenced by the rotor. > > [ ... ] > > > > The original "free reserve" values were set to 10% for a reason; it > > > was a compromise between people who wanted to use every byte, and > > > the actual 15% value required for a "perfect hash". > > > > The change I proposed honours the reserve. Now I *know* you didn't > > look at it. 8) > > You aren't proposing to damage the free reserve, you are proposing to > damage the distribution of the block allocation hash. I'd say "damage" is a bit of an overstatement. If you have 16MB cg's on say a 1GB partition you have 64 of them. Split across these 64 groups you allocate your 5,000 directories, giving you about 80 directories per cg. Now, if we cluster the allocations so that you do 8 at a time, your worst case will be one cg has 8 more directories than the fewest. This is about like carefully picking 8 directories and then deleting them; it strikes me as so close to "noise" that it'd be irrelevant. > A disk becomes "fragmented" when you have hash collisions in the > block allocation hash. Calling the directory inode allocation a "hash" is a joke. It's a brute-force levelling function. It's not hashed either on creation (linear search) or on lookup (direct reference). > > > In effect, when the FS picks a block (or, more correctly, a cluster), > > > it is hashing the filespace onto the disk. > > > > Unfortunately, in the case of directories this results in them being > > scattered all over the disk. If you're creating a pile of them all at > > once in a hierarchy, the net result is very poor locality of reference. > > Yes. Agreed. But the locality is *intentionally* horizontal, by > design. That's clear from the code. The point here is that the intention may not necessarily have been quite so golden; there's certainly no indication that any attempt was made to optimise for any particular behaviour. The disabled rotor code in ffs_blkpref seems to indicate in fact that the rotating distribution (a linear hash of time) was abandoned in favour of the levelling function. All I'm suggesting is a tradeoff between the levelling function and the philosophy that's embodied in the cluster:block relationship and the policy that puts direct blocks in the same cg as the file's inode. > > Associating locality and time of creation is not a wonderful algorithm, > > I freely admit. However, I think that it has some merit over the > > current approach (forcibly minimise locality under some circumstances). > > I think changing the ports to populate directory hierarchies breadth-first > would result in better performance for this particular use. Almost certainly not; see the breadth/depth/width discussion. Moving from any directory to any other directory will almost guarantee a large disk traversal. The block/directory cache hit rate might be slightly better, but it's a finite resource. > The problem with preturbing the hash is that you will pessimize the > case where you are creating files in a single directory over time. The change I proposed does nothing of the sort; *all* it does is slightly weight the locality of reference of directory inodes (and thus associated direct blocks) towards the time of their creation. The pathalogical case for this optimisation is to create a small number of directories very rapidly, and then consume all their direct blocks. This would leave the cg in question *slightly* starved for blocks. You could then repeat the operation and cluster into the next cg. But this would *still* have the levelling effect, as when the cg is abandoned, the most-free candidate is chosen. You could think of it as simply multiplying the disk block size by the clustering factor, or shrinking the CG correspondingly. Given that our cg's are already much larger than they used to be, this is unlikely to be a problem. > In the general case, you tend to use all files in a given directory > at a time, regardless of their creation date. This is irrelevant; the existing optimisations still apply. > Maybe we should note the way that directories do block I/O is not > the same as the way files do block I/O. According to the comments, > this behaviour is an intentional "play it safe" move on the part of > the author. I don't see how this has any significance; all we're talking about is block allocation for the directory inode. > > If you have the stuff set up to test, I'd really be interested to see > > if the clustering I proposed demonstrated any effects. > > I don't have the equipment at this point to be able to do a reasonable > job of testing. Certainly nothing I'd hang the outcome of an "is this > a good idea?" question... 8-(. 8( None of the other guinea-pigs seem interested either. *sigh* -- \\ Sometimes you're ahead, \\ Mike Smith \\ sometimes you're behind. \\ mike@smith.net.au \\ The race is long, and in the \\ msmith@freebsd.org \\ end it's only with yourself. \\ msmith@cdrom.com To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-hackers" in the body of the message
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?199809022354.XAA00818>