Date:      Wed, 29 Oct 2003 17:20:50 -0800
From:      Ken Marx <kmarx@vicor.com>
To:        Don Lewis <truckman@FreeBSD.org>
Cc:        mckusick@beastie.mckusick.com
Subject:   Re: 4.8 ffs_dirpref problem
Message-ID:  <3FA06772.10409@vicor.com>
In-Reply-To: <200310290859.h9T8xWeF028514@gw.catspoiler.org>
References:  <200310290859.h9T8xWeF028514@gw.catspoiler.org>

Don Lewis wrote:
> On 28 Oct, Ken Marx wrote:
> 
>>
>>Kirk McKusick wrote:
> 
> 
>>>It does look like the hash function is having some trouble.
>>>It has been completely revamped in 5.0, but is still using
>>>a "power-of-2" hashing scheme in 4.X. I highly recommend 
>>>trying a scheme with non-power-of-2 base. Perhaps something
>>>as simple as changing the hashing to use modulo rather than 
>>>logical & (e.g., in bufhash change from & bufhashmask to
>>>% bufhashmask).
>>>
>>>	Kirk McKusick
>>>
>>>
> 
> 
>>We have a sample 'fix' for the hashtable in vfs_bio.c
>>that uses all the blkno bits. It's in the diff link above.
>>Use as you see fit. However, it too doesn't really address
>>our symptoms significantly. Darn.
>>We still see throughput bog down to 1Mb/sec with > 90% system time.
> 
> 
> A Fibonacci hash, like I implemented in the kern/kern_mtxpool.c 1.8,
> might be a good choice here, since it tends to distribute the keys
> fairly uniformly.  I think this is a secondary issue, though.
> 
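
For anyone following along, here is a minimal sketch of the difference
between a power-of-2 mask hash and a Fibonacci (multiplicative) hash.
It is not the vfs_bio.c or kern_mtxpool.c code; HASH_BITS, mask_hash(),
fib_hash() and count_used() are made-up names, and the 32-bit golden
ratio constant is the usual textbook one, assumed here for illustration.

```c
#include <stdint.h>

#define HASH_BITS 10                    /* hypothetical table size: 2^10 buckets */
#define NBUCKETS  (1u << HASH_BITS)

/* Power-of-2 scheme: keep only the low HASH_BITS bits of the key. */
static uint32_t
mask_hash(uint32_t key)
{
	return (key & (NBUCKETS - 1));
}

/*
 * Fibonacci hash: multiply by 2^32 / golden ratio (wrapping at 32 bits)
 * and keep the top HASH_BITS bits, so every key bit affects the bucket.
 */
static uint32_t
fib_hash(uint32_t key)
{
	return ((key * UINT32_C(2654435769)) >> (32 - HASH_BITS));
}

/* Count how many distinct buckets nkeys keys spaced "stride" apart land in. */
static unsigned
count_used(uint32_t (*hash)(uint32_t), uint32_t stride, unsigned nkeys)
{
	unsigned char used[NBUCKETS] = { 0 };
	unsigned i, n = 0;

	for (i = 0; i < nkeys; i++) {
		uint32_t b = hash((uint32_t)i * stride);

		if (!used[b]) {
			used[b] = 1;
			n++;
		}
	}
	return (n);
}
```

With keys that are all multiples of the table size, e.g.
count_used(mask_hash, 1024, 256), the mask hash puts every key in one
bucket, while count_used(fib_hash, 1024, 256) should spread them over
many buckets.
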
> I think the real problem is the following code in ffs_dirpref():
> 
>         avgifree = fs->fs_cstotal.cs_nifree / fs->fs_ncg;
>         avgbfree = fs->fs_cstotal.cs_nbfree / fs->fs_ncg;
>         avgndir = fs->fs_cstotal.cs_ndir / fs->fs_ncg;
> [snip]
>         maxndir = min(avgndir + fs->fs_ipg / 16, fs->fs_ipg);
>         minifree = avgifree - fs->fs_ipg / 4;
>         if (minifree < 0)
>                 minifree = 0;
>         minbfree = avgbfree - fs->fs_fpg / fs->fs_frag / 4;
>         if (minbfree < 0)
>                 minbfree = 0;
> [snip]
>         prefcg = ino_to_cg(fs, pip->i_number);
>         for (cg = prefcg; cg < fs->fs_ncg; cg++)
>                 if (fs->fs_cs(fs, cg).cs_ndir < maxndir &&
>                     fs->fs_cs(fs, cg).cs_nifree >= minifree &&
>                     fs->fs_cs(fs, cg).cs_nbfree >= minbfree) {
>                         if (fs->fs_contigdirs[cg] < maxcontigdirs)
>                                 return ((ino_t)(fs->fs_ipg * cg));
>                 }
>         for (cg = 0; cg < prefcg; cg++)
>                 if (fs->fs_cs(fs, cg).cs_ndir < maxndir &&
>                     fs->fs_cs(fs, cg).cs_nifree >= minifree &&
>                     fs->fs_cs(fs, cg).cs_nbfree >= minbfree) {
>                         if (fs->fs_contigdirs[cg] < maxcontigdirs)
>                                 return ((ino_t)(fs->fs_ipg * cg));
>                 }
> 
> If the file system is more than 75% full, minbfree will be zero, which
> will allow new directories to be created in cylinder groups that have no
> free blocks for either the directory itself, or for any files created in
> that directory.  If this happens, allocating the blocks for the
> directory and its files will require ffs_alloc() to do an expensive
> search across the cylinder groups for each block.  It looks to me like
> minbfree needs to be equal to, or at least a lot closer to, avgbfree.
> 
> A similar situation exists with minifree.  Please note that the fallback
> algorithm uses the condition:
> 	fs->fs_cs(fs, cg).cs_nifree >= avgifree
> 
> 
> 
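
To check my understanding of the arithmetic, here is a small sketch of
the minbfree computation quoted above, pulled out into standalone
functions. The function names and the sample numbers (fs_fpg = 65536,
fs_frag = 8) are hypothetical, and the "stricter" threshold is only one
reading of the suggestion that minbfree should be closer to avgbfree,
not a proposed patch.

```c
/* Same formula as the quoted ffs_dirpref() code, clamped at zero. */
static long
minbfree_current(long avgbfree, long fpg, long frag)
{
	long minbfree = avgbfree - fpg / frag / 4;

	return (minbfree < 0 ? 0 : minbfree);
}

/* Illustration only: require a group to have at least the average. */
static long
minbfree_stricter(long avgbfree)
{
	return (avgbfree);
}

/* The free-block part of the cylinder group test in the search loops. */
static int
cg_has_enough_blocks(long cs_nbfree, long minbfree)
{
	return (cs_nbfree >= minbfree);
}
```

With 8192 blocks per group (65536 / 8) the subtracted quarter is 2048
blocks. At 80% full the average is about 1638 free blocks per group, so
minbfree_current(1638, 65536, 8) clamps to 0 and a group with no free
blocks at all still passes cg_has_enough_blocks(). With
minbfree_stricter(1638) that same group would be rejected.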

Interesting. We (Vicor) will defer to experts here, but are very willing to
test anything you come up with.

thanks,
k
-- 
Ken Marx, kmarx@vicor-nb.com
I insist that we do the right thing and be accountable for the realistic 
goals.
		- http://www.bigshed.com/cgi-bin/speak.cgi


