Date: Wed, 29 Oct 2003 17:20:50 -0800
From: Ken Marx <kmarx@vicor.com>
To: Don Lewis <truckman@FreeBSD.org>
Cc: mckusick@beastie.mckusick.com
Subject: Re: 4.8 ffs_dirpref problem
Message-ID: <3FA06772.10409@vicor.com>
In-Reply-To: <200310290859.h9T8xWeF028514@gw.catspoiler.org>
References: <200310290859.h9T8xWeF028514@gw.catspoiler.org>

Don Lewis wrote:
> On 28 Oct, Ken Marx wrote:
>
>>Kirk McKusick wrote:
>>
>>>It does look like the hash function is having some trouble.
>>>It has been completely revamped in 5.0, but is still using
>>>a "power-of-2" hashing scheme in 4.X. I highly recommend
>>>trying a scheme with non-power-of-2 base. Perhaps something
>>>as simple as changing the hashing to use modulo rather than
>>>logical & (e.g., in bufhash change from & bufhashmask to
>>>% bufhashmask).
>>>
>>>	Kirk McKusick
>>>
>>
>>We have a sample 'fix' for the hashtable in vfs_bio.c
>>that uses all the blkno bits. It's in the diff link above.
>>Use as you see fit. However, it too doesn't really address
>>our symptoms significantly. Darn.
>>Bogging down to 1Mb/sec and > 90% system seen.
>
> A Fibonacci hash, like I implemented in the kern/kern_mtxpool.c 1.8,
> might be a good choice here, since it tends to distribute the keys
> fairly uniformly.  I think this is a secondary issue, though.
>
> I think the real problem is the following code in ffs_dirpref():
>
>         avgifree = fs->fs_cstotal.cs_nifree / fs->fs_ncg;
>         avgbfree = fs->fs_cstotal.cs_nbfree / fs->fs_ncg;
>         avgndir = fs->fs_cstotal.cs_ndir / fs->fs_ncg;
> [snip]
>         maxndir = min(avgndir + fs->fs_ipg / 16, fs->fs_ipg);
>         minifree = avgifree - fs->fs_ipg / 4;
>         if (minifree < 0)
>                 minifree = 0;
>         minbfree = avgbfree - fs->fs_fpg / fs->fs_frag / 4;
>         if (minbfree < 0)
>                 minbfree = 0;
> [snip]
>         prefcg = ino_to_cg(fs, pip->i_number);
>         for (cg = prefcg; cg < fs->fs_ncg; cg++)
>                 if (fs->fs_cs(fs, cg).cs_ndir < maxndir &&
>                     fs->fs_cs(fs, cg).cs_nifree >= minifree &&
>                     fs->fs_cs(fs, cg).cs_nbfree >= minbfree) {
>                         if (fs->fs_contigdirs[cg] < maxcontigdirs)
>                                 return ((ino_t)(fs->fs_ipg * cg));
>                 }
>         for (cg = 0; cg < prefcg; cg++)
>                 if (fs->fs_cs(fs, cg).cs_ndir < maxndir &&
>                     fs->fs_cs(fs, cg).cs_nifree >= minifree &&
>                     fs->fs_cs(fs, cg).cs_nbfree >= minbfree) {
>                         if (fs->fs_contigdirs[cg] < maxcontigdirs)
>                                 return ((ino_t)(fs->fs_ipg * cg));
>                 }
>
> If the file system is more than 75% full, minbfree will be zero, which
> will allow new directories to be created in cylinder groups that have no
> free blocks for either the directory itself, or for any files created in
> that directory.  If this happens, allocating the blocks for the
> directory and its files will require ffs_alloc() to do an expensive
> search across the cylinder groups for each block.  It looks to me like
> minbfree needs to equal, or at least be a lot closer to, avgbfree.
>
> A similar situation exists with minifree.  Please note that the fallback
> algorithm uses the condition:
>         fs->fs_cs(fs, cg).cs_nifree >= avgifree
>

Interesting. We (Vicor) will defer to experts here, but are very willing
to test anything you come up with.

thanks,
k
-- 
Ken Marx, kmarx@vicor-nb.com
I insist that we do the right thing and be accountable for the realistic
goals.
   - http://www.bigshed.com/cgi-bin/speak.cgi
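
To make the minbfree arithmetic discussed above concrete, here is a
minimal standalone sketch in C. It is not kernel code and not a proposed
patch: struct fs_summary, minbfree_current() and minbfree_avg() are
hypothetical names introduced only for illustration, and minbfree_avg()
is just one possible reading of "minbfree needs to equal ... avgbfree".

```c
#include <assert.h>

/*
 * Hypothetical stand-in for the few struct fs fields that the quoted
 * ffs_dirpref() code reads.  The real fields live in struct fs.
 */
struct fs_summary {
	long ncg;	/* number of cylinder groups (fs_ncg) */
	long fpg;	/* fragments per cylinder group (fs_fpg) */
	long frag;	/* fragments per block (fs_frag) */
	long nbfree;	/* total free blocks (fs_cstotal.cs_nbfree) */
};

/*
 * minbfree as computed in the quoted code: the average number of free
 * blocks per group, minus a quarter of the blocks in a group, clamped
 * at zero.
 */
static long
minbfree_current(const struct fs_summary *fs)
{
	long avgbfree, minbfree;

	assert(fs->ncg > 0 && fs->frag > 0);
	avgbfree = fs->nbfree / fs->ncg;
	minbfree = avgbfree - fs->fpg / fs->frag / 4;
	if (minbfree < 0)
		minbfree = 0;
	return (minbfree);
}

/*
 * Assumption: one way to read the suggestion is to require at least the
 * average free-block count, as the fallback test does for inodes.
 * Untested against a real file system.
 */
static long
minbfree_avg(const struct fs_summary *fs)
{
	assert(fs->ncg > 0);
	return (fs->nbfree / fs->ncg);
}
```

With an assumed geometry of 100 cylinder groups, fs_fpg = 65536 and
fs_frag = 8 (8192 blocks per group), a file system that is 50% full
(nbfree = 409600) gives minbfree_current() = 2048. At 80% full
(nbfree = 163840) it gives 0, so every cylinder group passes the
free-block test, while minbfree_avg() would still require 1638 free
blocks.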