Date:      Tue, 01 Sep 1998 23:12:49 +0000
From:      Mike Smith <mike@smith.net.au>
To:        Luigi Rizzo <luigi@labinfo.iet.unipi.it>
Cc:        hackers@FreeBSD.ORG, cmascott@world.std.com, mckusick@mckusick.com
Subject:   Re: Reading/writing /usr/ports is VERY slow (fwd) 
Message-ID:  <199809012312.XAA01133@word.smith.net.au>
In-Reply-To: Your message of "Wed, 02 Sep 1998 05:04:59 +0200." <Pine.BSF.3.91.980902045649.10256A-100000@labinfo.iet.unipi.it>

(Kirk, please forgive me for copying you on this; if you have a moment 
 or three to comment on the issues being raised here, I think we'd all 
 benefit.)

> Well, some of you might have seen this, some not, but there was a kind
> of interesting thread on the newsgroup about /usr/ports access speed,
> and the author of the message below (who is Cc:) found out that
> creating a partition with only one cylinder group (instead of several,
> using the exact same disk and partition size and location) speeds
> up disk access for /usr/ports by a factor of 3.

This would definitely be the case for such a relatively static layout,
at least for the traversal case (locality of reference). I think it
might be quite suboptimal in the case where the directories were likely
to shrink/grow on a regular basis, simply because it would encourage 
fragmentation.

> I am not sure where this happens, but perhaps an easy fix could be to
> stay in the same CG if its (CG) occupation is low enough?

The problem actually seems to stem from the code balancing directory 
allocations "too well".  See below.

> Before you mention softupdates... the author also experimented with
> "noatime" and softupdates and the results were that using 1 CG
> is much faster.

The issue here is more the layout of actual data blocks, and the 
author's principal complaint is with reading, not writing, so lazy 
metadata isn't going to change all that much.

> ---------- Forwarded message ----------
> Date: Tue, 1 SEP 1998 01:20:50 GMT 
> From: Carl Mascott <cmascott@world.std.com>
> Newsgroups: comp.unix.bsd.freebsd.misc
> Subject: Re: Reading/writing /usr/ports is VERY slow 
...
> Take a look at my numbers again.  The benefits of confining a
> directory tree to one cylinder group are about twice as great as the
> benefits of eliminating (not just reducing) metadata updates for the
> read case.
> 
> As I mentioned, the directory placement policy makes "find" slow as
> well.  That's more significant than performance in /usr/ports.

The directory placement policy selects the cg with the fewest 
directories, and adds the directory there.  If you're in the process of 
creating the ~5,000 directories in the ports collection, this is going 
to balance out the allocation of directories across your cg's and put 
you into a round-robin situation quite quickly.

The code in question is pretty succinct, from ufs/ffs/ffs_alloc.c:

/*
 * Find a cylinder to place a directory.
 *
 * The policy implemented by this algorithm is to select from
 * among those cylinder groups with above the average number of
 * free inodes, the one with the smallest number of directories.
 */
static ino_t
ffs_dirpref(fs)
        register struct fs *fs;
{
        int cg, minndir, mincg, avgifree;

        avgifree = fs->fs_cstotal.cs_nifree / fs->fs_ncg;
        minndir = fs->fs_ipg;
        mincg = 0;
        for (cg = 0; cg < fs->fs_ncg; cg++)
                if (fs->fs_cs(fs, cg).cs_ndir < minndir &&
                    fs->fs_cs(fs, cg).cs_nifree >= avgifree) {
                        mincg = cg;
                        minndir = fs->fs_cs(fs, cg).cs_ndir;
                }
        return ((ino_t)(fs->fs_ipg * mincg));
}
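
To see how quickly that balancing turns into round-robin, here is a
throwaway user-space sketch of the same selection loop (a standalone
simulation with made-up numbers, 8 cg's of 1024 inodes each; it is not
kernel code, and the counters just stand in for the fs_cs() summaries):

#include <stdio.h>

#define NCG     8               /* number of cylinder groups (made up) */
#define IPG     1024            /* inodes per cg (made up) */

static int ndir[NCG];           /* directories allocated per cg */
static int nifree[NCG];         /* free inodes per cg */

/*
 * Same policy as ffs_dirpref() above: pick the cg with the fewest
 * directories among those with at least the average number of free inodes.
 */
static int
dirpref(void)
{
        int cg, mincg, minndir, total, avgifree;

        total = 0;
        for (cg = 0; cg < NCG; cg++)
                total += nifree[cg];
        avgifree = total / NCG;

        mincg = 0;
        minndir = IPG;
        for (cg = 0; cg < NCG; cg++)
                if (ndir[cg] < minndir && nifree[cg] >= avgifree) {
                        mincg = cg;
                        minndir = ndir[cg];
                }
        return (mincg);
}

int
main(void)
{
        int i, cg;

        for (cg = 0; cg < NCG; cg++)
                nifree[cg] = IPG;

        /* Create a few dozen directories and watch them spread out. */
        for (i = 0; i < 24; i++) {
                cg = dirpref();
                ndir[cg]++;
                nifree[cg]--;
                printf("dir %2d -> cg %d\n", i, cg);
        }
        return (0);
}

The output walks straight through cg 0, 1, 2, ... and wraps, so
consecutive directories land in different cg's almost every time.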

You might improve this by stealing the fs_cgrotor field (which is set
but never used by ffs_blkpref()) to apply a bias to the above selection;
if the cg referenced by the rotor is within some threshold above the
"best" cg, use the one referred to by the rotor instead.  This threshold
could probably be fairly small (e.g. 8 or 16) with good results in this
case; it would tend to cluster the allocation of directory inodes and
thus their directories.

In fact, it might make sense to recycle the rotor in a more general 
fashion; use it as a hint as to where allocation was performed last in 
order to cluster your allocation somewhat.

static ino_t
ffs_dirpref(fs)
        register struct fs *fs;
{
        int cg, minndir, mincg, avgifree;

        avgifree = fs->fs_cstotal.cs_nifree / fs->fs_ncg;
        minndir = fs->fs_ipg;
        mincg = 0;
        for (cg = 0; cg < fs->fs_ncg; cg++)
                if (fs->fs_cs(fs, cg).cs_ndir < minndir &&
                    fs->fs_cs(fs, cg).cs_nifree >= avgifree) {
                        mincg = cg;
                        minndir = fs->fs_cs(fs, cg).cs_ndir;
                }
        /*
         * Attempt to cluster allocations within reasonable thresholds;
         * DIRDCLUSTER and DIRICLUSTER would be small tunables (e.g. 8 or 16).
         */
        if ((fs->fs_cs(fs, fs->fs_cgrotor).cs_ndir <
                        (minndir + DIRDCLUSTER)) &&
            (fs->fs_cs(fs, fs->fs_cgrotor).cs_nifree >
                        (fs->fs_cs(fs, mincg).cs_nifree - DIRICLUSTER)) &&
            (fs->fs_cs(fs, fs->fs_cgrotor).cs_nifree > 0)) {
                mincg = fs->fs_cgrotor;   /* use previous allocation instead */
        } else {
                fs->fs_cgrotor = mincg;   /* remember new allocation area */
        }
        return ((ino_t)(fs->fs_ipg * mincg));
}

If someone were to take this and run with it, the results might be 
interesting.  It would be worthwhile using fs_cgrotor as a hint in 
ffs_blkpref() as well (which currently sets it but never reads it).

I don't have the time to experiment with this, but I'd certainly like 
to know where I'm wrong.
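
As a rough illustration only (a user-space toy bolted onto the sketch
simulation above, not a kernel test, and the DIRDCLUSTER/DIRICLUSTER
values below are just example numbers in that 8-16 range), the biased
policy keeps consecutive directories in one cg until the thresholds are
exceeded:

#define DIRDCLUSTER     16      /* tolerate this many extra dirs in rotor cg */
#define DIRICLUSTER     16      /* tolerate this many fewer free inodes there */

static int cgrotor;             /* stands in for fs_cgrotor */

static int
dirpref_rotor(void)
{
        int cg, mincg, minndir, total, avgifree;

        total = 0;
        for (cg = 0; cg < NCG; cg++)
                total += nifree[cg];
        avgifree = total / NCG;

        mincg = 0;
        minndir = IPG;
        for (cg = 0; cg < NCG; cg++)
                if (ndir[cg] < minndir && nifree[cg] >= avgifree) {
                        mincg = cg;
                        minndir = ndir[cg];
                }
        /* Stay with the previous cg if it is still close enough to "best". */
        if (ndir[cgrotor] < minndir + DIRDCLUSTER &&
            nifree[cgrotor] > nifree[mincg] - DIRICLUSTER &&
            nifree[cgrotor] > 0)
                mincg = cgrotor;
        else
                cgrotor = mincg;
        return (mincg);
}

Run with the same driver loop as before, the first 16 directories all
land in cg 0, the next 16 in cg 1, and so on, which is the sort of
clustering the change above is after.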

> In short, I think that there are more long inter-group seeks on FFS
> than there need be, due to the directory placement policy.  I consider
> it undesirable to spread directories out as much as possible all over
> the filesystem.

One of the substantial contributors to this problem has been the 
FreeBSD policy of limiting the number of cg's by faking the disk 
layout.  The intention here was to reduce the amount of space wasted in 
administrative overhead, as well as increase the likelihood of 
contiguous allocation.  Unfortunately, one of the less wonderful
consequences has been a substantial increase in inter-cg distances.

-- 
\\  Sometimes you're ahead,       \\  Mike Smith
\\  sometimes you're behind.      \\  mike@smith.net.au
\\  The race is long, and in the  \\  msmith@freebsd.org
\\  end it's only with yourself.  \\  msmith@cdrom.com


