Date: Tue, 01 Sep 1998 23:12:49 +0000 From: Mike Smith <mike@smith.net.au> To: Luigi Rizzo <luigi@labinfo.iet.unipi.it> Cc: hackers@FreeBSD.ORG, cmascott@world.std.com, mckusick@mckusick.com Subject: Re: Reading/writing /usr/ports is VERY slow (fwd) Message-ID: <199809012312.XAA01133@word.smith.net.au> In-Reply-To: Your message of "Wed, 02 Sep 1998 05:04:59 %2B0200." <Pine.BSF.3.91.980902045649.10256A-100000@labinfo.iet.unipi.it>
next in thread | previous in thread | raw e-mail | index | archive | help
(Kirk, please forgive me for copying you on this; if you have a moment or three to comment on the issues being raised here, I think we'd all benefit.) > Well, some of you might have seen this, some not, but there was a kind > of interesting thread on the newsgroup about /usr/ports access speed, > and the author of the message below (who is Cc:) found out that > creating a partition with only one cylinder group (instead of more > using the same exact disk and partition size and location) speeds > up disk access for /usr/ports by a factor of 3. This would definitely be the case for such a relatively static layout, at least for the traversal case (locality of reference). I think it might be quite suboptimal in the case where the directories were likely to shrink/grow on a regular basis, simply because it would encourage fragmentation. > I am not sure where this happens, but perhaps an easy fix could be to > stay in the same CG if its (CG) occupation is low enough ? The problem actually seems to stem from the code balancing directory allocations "too well". See below. > Before you mention softupdates... the author also experimented with > "noatime" and softupdates and the results were that using 1 CG > is much faster. The issue here is more the layout of actual data blocks, and the author's principal complaint is with reading, not writing, so lazy metadata isn't going to change all that much. > ---------- Forwarded message ---------- > Date: Tue, 1 SEP 1998 01:20:50 GMT > From: Carl Mascott <cmascott@world.std.com> > Newgroups: comp.unix.bsd.freebsd.misc > Subject: Re: Reading/writing /usr/ports is VERY slow ... > Take a look at my numbers again. The benefits of confining a > directory tree to one cylinder group are about twice as great as the > benefits of eliminating (not just reducing) metadata updates for the > read case. > > As I mentioned, the directory placement policy makes "find" slow as > well. That's more significant than performance in /usr/ports. The directory placement policy selects the cg with the fewest directories, and adds the directory there. If you're in the process of creating the ~5,000 directories in the ports collection, this is going to balance out the allocation of directories across your cg's and put you into a round-robin situation quite quickly. The code in question is pretty succinct, from ufs/ffs/ffs_alloc.c: /* * Find a cylinder to place a directory. * * The policy implemented by this algorithm is to select from * among those cylinder groups with above the average number of * free inodes, the one with the smallest number of directories. */ static ino_t ffs_dirpref(fs) register struct fs *fs; { int cg, minndir, mincg, avgifree; avgifree = fs->fs_cstotal.cs_nifree / fs->fs_ncg; minndir = fs->fs_ipg; mincg = 0; for (cg = 0; cg < fs->fs_ncg; cg++) if (fs->fs_cs(fs, cg).cs_ndir < minndir && fs->fs_cs(fs, cg).cs_nifree >= avgifree) { mincg = cg; minndir = fs->fs_cs(fs, cg).cs_ndir; } return ((ino_t)(fs->fs_ipg * mincg)); } You might improve this by stealing the fs_cgrotor field (which is set but never used by fs_blkpref) to apply a bias to the above selection; if the cg referenced by the rotor is within some threshold above the "best" cg, use the one referred to by the rotor instead. This threshold could probably be fairly small (eg. 8 or 16) with good results in this case; it would tend to cluster the allocation of directory inodes and thus their directories. In fact, it might make sense to recycle the rotor in a more general fashion; use it as a hint as to where allocation was performed last in order to cluster your allocation somewhat. static ino_t ffs_dirpref(fs) register struct fs *fs; { int cg, minndir, mincg, avgifree; avgifree = fs->fs_cstotal.cs_nifree / fs->fs_ncg; minndir = fs->fs_ipg; mincg = 0; for (cg = 0; cg < fs->fs_ncg; cg++) if (fs->fs_cs(fs, cg).cs_ndir < minndir && fs->fs_cs(fs, cg).cs_nifree >= avgifree) { mincg = cg; minndir = fs->fs_cs(fs, cg).cs_ndir; } /* Attempt to cluster allocations within reasonable thresholds */ if ((fs->fs_cs(fs, fs->fs_cgrotor).cs_nidir < (minndir + DIRDCLUSTER)) && (fs->fs_cs(fs, fs->fs_cgrotor).cs_nifree > (fs->fs_cs(fs, mincg).cs_nifree - DIRICLUSTER)) && (fs->fs_cs(fs, fs->fs_cgrotor).cs_nifree > 0)) { mincg = fs->fs_cgrotor; /* use previous allocation instead */ } else { fs->fs_cgrotor = mincg; /* remember new allocation area */ } return ((ino_t)(fs->fs_ipg * mincg)); } If someone were to take this and run with it, the results might be interesting. It would be worthwhile using fs_cgrotor as a hint in ffs_blkpref() as well (which currently sets it but never reads it). I don't have the time to experiment with this, but I'd certainly like to know where I'm wrong. > In short, I think that there are more long inter-group seeks on FFS > than there need be, due to the directory placement policy. I consider > it undesirable to spread directories out as much as possible all over > the filesystem. One of the substantial contributors to this problem has been the FreeBSD policy of limiting the number of cg's by faking the disk layout. The intention here was to reduce the amount of space wasted in administrative overhead, as well as increase the likelihood of contiguous allocation. Unfortunately one of the less wonderful consequences has been the substantial increase in inter-cg distances. -- \\ Sometimes you're ahead, \\ Mike Smith \\ sometimes you're behind. \\ mike@smith.net.au \\ The race is long, and in the \\ msmith@freebsd.org \\ end it's only with yourself. \\ msmith@cdrom.com To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-hackers" in the body of the message
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?199809012312.XAA01133>