Date:      Wed, 29 Oct 2003 00:59:32 -0800 (PST)
From:      Don Lewis <truckman@FreeBSD.org>
To:        kmarx@vicor.com
Cc:        mckusick@beastie.mckusick.com
Subject:   Re: 4.8 ffs_dirpref problem
Message-ID:  <200310290859.h9T8xWeF028514@gw.catspoiler.org>
In-Reply-To: <3F9F26DB.6050207@vicor.com>

On 28 Oct, Ken Marx wrote:
> 
> 
> Kirk McKusick wrote:

>> It does look like the hash function is having some trouble.
>> It has been completely revamped in 5.0, but is still using
>> a "power-of-2" hashing scheme in 4.X. I highly recommend 
>> trying a scheme with non-power-of-2 base. Perhaps something
>> as simple as changing the hashing to use modulo rather than 
>> logical & (e.g., in bufhash change from & bufhashmask to
>> % bufhashmask).
>> 
>> 	Kirk McKusick
>> 
>> 

> We have a sample 'fix' for the hashtable in vfs_bio.c
> that uses all the blkno bits. It's in the diff link above.
> Use as you see fit. However, it too doesn't really address
> our symptoms significantly. Darn.
> Bogging down to 1Mb/sec and > 90% system seen.

A Fibonacci hash, like the one I implemented in kern/kern_mtxpool.c 1.8,
might be a good choice here, since it tends to distribute the keys
fairly uniformly.  I think this is a secondary issue, though.
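
For readers unfamiliar with the technique, here is a minimal sketch of a
Fibonacci (multiplicative) hash for a 32-bit key.  It is not the code
from kern_mtxpool.c or a patch for bufhash; the name fib_hash, the
constant name FIB_HASH_MULT, and the 32-bit key width are assumptions
made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* 2^32 divided by the golden ratio, rounded to an odd integer. */
#define FIB_HASH_MULT 2654435769u

/*
 * Hypothetical helper: map a 32-bit key to a bucket index in a table
 * of 2^bits buckets by keeping the top `bits` bits of the product.
 * The multiply wraps modulo 2^32, which is what the method relies on.
 */
static uint32_t
fib_hash(uint32_t key, unsigned bits)
{
	assert(bits >= 1 && bits <= 32);
	return ((uint32_t)(key * FIB_HASH_MULT) >> (32 - bits));
}
```

Because the index is taken from the high bits of the product, every bit
of the key influences the bucket, unlike a plain "& mask", which looks
only at the low bits.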

I think the real problem is the following code in ffs_dirpref():

        avgifree = fs->fs_cstotal.cs_nifree / fs->fs_ncg;
        avgbfree = fs->fs_cstotal.cs_nbfree / fs->fs_ncg;
        avgndir = fs->fs_cstotal.cs_ndir / fs->fs_ncg;
[snip]
        maxndir = min(avgndir + fs->fs_ipg / 16, fs->fs_ipg);
        minifree = avgifree - fs->fs_ipg / 4;
        if (minifree < 0)
                minifree = 0;
        minbfree = avgbfree - fs->fs_fpg / fs->fs_frag / 4;
        if (minbfree < 0)
                minbfree = 0;
[snip]
        prefcg = ino_to_cg(fs, pip->i_number);
        for (cg = prefcg; cg < fs->fs_ncg; cg++)
                if (fs->fs_cs(fs, cg).cs_ndir < maxndir &&
                    fs->fs_cs(fs, cg).cs_nifree >= minifree &&
                    fs->fs_cs(fs, cg).cs_nbfree >= minbfree) {
                        if (fs->fs_contigdirs[cg] < maxcontigdirs)
                                return ((ino_t)(fs->fs_ipg * cg));
                }
        for (cg = 0; cg < prefcg; cg++)
                if (fs->fs_cs(fs, cg).cs_ndir < maxndir &&
                    fs->fs_cs(fs, cg).cs_nifree >= minifree &&
                    fs->fs_cs(fs, cg).cs_nbfree >= minbfree) {
                        if (fs->fs_contigdirs[cg] < maxcontigdirs)
                                return ((ino_t)(fs->fs_ipg * cg));
                }

If the file system is more than 75% full, minbfree will be zero, which
will allow new directories to be created in cylinder groups that have no
free blocks for either the directory itself, or for any files created in
that directory.  If this happens, allocating the blocks for the
directory and its files will require ffs_alloc() to do an expensive
search across the cylinder groups for each block.  It looks to me like
minbfree needs to be equal to, or at least a lot closer to, avgbfree.
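
To make the 75% figure concrete, the following sketch repeats the
minbfree arithmetic from the excerpt outside the kernel.  The struct
fs_sketch and the function sketch_minbfree are hypothetical stand-ins
for the few struct fs fields involved, and the geometry in the usage
note is made up.

```c
#include <assert.h>

/* Hypothetical stand-in for the struct fs fields used by the threshold. */
struct fs_sketch {
	int	fs_ncg;		/* number of cylinder groups */
	int	fs_fpg;		/* fragments per group */
	int	fs_frag;	/* fragments per block */
	long	nbfree;		/* total free blocks (fs_cstotal.cs_nbfree) */
};

/* Same arithmetic as the excerpt: average minus a quarter group, clamped. */
static long
sketch_minbfree(const struct fs_sketch *fs)
{
	long avgbfree, minbfree;

	assert(fs->fs_ncg > 0 && fs->fs_frag > 0);
	avgbfree = fs->nbfree / fs->fs_ncg;
	minbfree = avgbfree - fs->fs_fpg / fs->fs_frag / 4;
	if (minbfree < 0)
		minbfree = 0;
	return (minbfree);
}
```

With 100 groups of 1000 blocks each (fs_fpg = 8000, fs_frag = 8), a
file system that is 80% full has avgbfree = 200, so 200 - 250 is
clamped to a threshold of 0 and a group with no free blocks passes the
test.  At 50% full the threshold is 250.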

A similar situation exists with minifree.  Please note that the fallback
algorithm uses the condition:
	fs->fs_cs(fs, cg).cs_nifree >= avgifree
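
The difference between the two inode tests can be sketched as below.
Only the cs_nifree part of each condition is modeled; the helper names
primary_accepts and fallback_accepts are hypothetical, and the numbers
in the usage note are invented.

```c
#include <assert.h>
#include <stdbool.h>

/* Inode test from the main loops: threshold is avgifree - ipg/4, clamped. */
static bool
primary_accepts(long cs_nifree, long avgifree, long ipg)
{
	long minifree;

	assert(ipg > 0);
	minifree = avgifree - ipg / 4;
	if (minifree < 0)
		minifree = 0;
	return (cs_nifree >= minifree);
}

/* Inode test quoted for the fallback algorithm. */
static bool
fallback_accepts(long cs_nifree, long avgifree)
{
	return (cs_nifree >= avgifree);
}
```

With 1000 inodes per group and an average of 100 free, a group with no
free inodes at all satisfies the main-loop test but is rejected by the
fallback test.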



