Date: Wed, 29 Oct 2003 00:59:32 -0800 (PST)
From: Don Lewis <truckman@FreeBSD.org>
To: kmarx@vicor.com
Cc: mckusick@beastie.mckusick.com
Subject: Re: 4.8 ffs_dirpref problem
Message-ID: <200310290859.h9T8xWeF028514@gw.catspoiler.org>
In-Reply-To: <3F9F26DB.6050207@vicor.com>

On 28 Oct, Ken Marx wrote:
>
>
> Kirk McKusick wrote:
>> I does look like the hash function is having some trouble.
>> It has been completely revamped in 5.0, but is still using
>> a "power-of-2" hashing scheme in 4.X. I highly recommend
>> trying a scheme with non-power-of-2 base. Perhaps something
>> as simple as changing the hashing to use modulo rather than
>> logical & (e.g., in bufhash change from & bufhashmask to
>> % bufhashmask).
>>
>> Kirk McKusick
>>
>>
> We have a sample 'fix' for the hashtable in vfs_bio.c
> that uses all the blkno bits. It's in the diff link above.
> Use as you see fit. However, it too doesn't really address
> our symptoms significantly. Darn.
> Bogging down to 1Mb/sec and > 90% system seen.

A Fibonacci hash, like the one I implemented in kern/kern_mtxpool.c
1.8, might be a good choice here, since it tends to distribute the
keys fairly uniformly.

I think this is a secondary issue, though. I think the real problem is
the following code in ffs_dirpref():

        avgifree = fs->fs_cstotal.cs_nifree / fs->fs_ncg;
        avgbfree = fs->fs_cstotal.cs_nbfree / fs->fs_ncg;
        avgndir = fs->fs_cstotal.cs_ndir / fs->fs_ncg;
[snip]
        maxndir = min(avgndir + fs->fs_ipg / 16, fs->fs_ipg);
        minifree = avgifree - fs->fs_ipg / 4;
        if (minifree < 0)
                minifree = 0;
        minbfree = avgbfree - fs->fs_fpg / fs->fs_frag / 4;
        if (minbfree < 0)
                minbfree = 0;
[snip]
        prefcg = ino_to_cg(fs, pip->i_number);
        for (cg = prefcg; cg < fs->fs_ncg; cg++)
                if (fs->fs_cs(fs, cg).cs_ndir < maxndir &&
                    fs->fs_cs(fs, cg).cs_nifree >= minifree &&
                    fs->fs_cs(fs, cg).cs_nbfree >= minbfree) {
                        if (fs->fs_contigdirs[cg] < maxcontigdirs)
                                return ((ino_t)(fs->fs_ipg * cg));
                }
        for (cg = 0; cg < prefcg; cg++)
                if (fs->fs_cs(fs, cg).cs_ndir < maxndir &&
                    fs->fs_cs(fs, cg).cs_nifree >= minifree &&
                    fs->fs_cs(fs, cg).cs_nbfree >= minbfree) {
                        if (fs->fs_contigdirs[cg] < maxcontigdirs)
                                return ((ino_t)(fs->fs_ipg * cg));
                }

If the file system is more than 75% full, minbfree will be zero, which
will allow new directories to be created in cylinder groups that have
no free blocks for either the directory itself, or for any files
created in that directory. If this happens, allocating the blocks for
the directory and its files will require ffs_alloc() to do an expensive
search across the cylinder groups for each block. It looks to me like
minbfree needs to equal, or at least be a lot closer to, avgbfree.

A similar situation exists with minifree. Please note that the fallback
algorithm uses the condition:

        fs->fs_cs(fs, cg).cs_nifree >= avgifree
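
To make the 75% figure concrete, here is a rough user-space sketch of
the minbfree arithmetic from the excerpt above. It is not kernel code:
struct fake_fs and dirpref_minbfree() are hypothetical names, and the
structure carries only the handful of values the computation needs
rather than the real struct fs.

```c
#include <stdio.h>

/*
 * Hypothetical stand-in for the superblock fields used below.
 * This is NOT the real struct fs; it exists only for illustration.
 */
struct fake_fs {
	long ncg;	/* number of cylinder groups (fs_ncg) */
	long fpg;	/* fragments per cylinder group (fs_fpg) */
	long frag;	/* fragments per block (fs_frag) */
	long nbfree;	/* total free blocks (fs_cstotal.cs_nbfree) */
};

/*
 * Same arithmetic as the minbfree computation quoted from
 * ffs_dirpref(): the average free blocks per group, minus one
 * quarter of the blocks in a group, clamped at zero.
 */
static long
dirpref_minbfree(const struct fake_fs *fs)
{
	long avgbfree = fs->nbfree / fs->ncg;
	long minbfree = avgbfree - fs->fpg / fs->frag / 4;

	if (minbfree < 0)
		minbfree = 0;
	return (minbfree);
}

/* Print the threshold for a given fill percentage (0..100). */
static void
show_minbfree(struct fake_fs *fs, int pct_full)
{
	long total_blocks = fs->ncg * (fs->fpg / fs->frag);

	fs->nbfree = total_blocks * (100 - pct_full) / 100;
	printf("%d%% full: avgbfree=%ld minbfree=%ld\n", pct_full,
	    fs->nbfree / fs->ncg, dirpref_minbfree(fs));
}
```

With an assumed geometry of 100 groups, 65536 fragments per group and
8 fragments per block (8192 blocks per group), one quarter of a group
is 2048 blocks. At 50% full the average is 4096 free blocks per group
and minbfree comes out as 2048; at 75% full the average is 2048 and
minbfree is 0, so the cs_nbfree >= minbfree test accepts a group with
no free blocks at all.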
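
On the secondary hashing issue, the following is a minimal sketch of
what a Fibonacci (multiplicative) hash looks like next to a plain
power-of-2 mask. It is not the code from kern/kern_mtxpool.c or
vfs_bio.c; the names fib_hash() and mask_hash() are made up here, and
it assumes 32-bit keys and a table of 2^bits buckets.

```c
#include <stdint.h>

/*
 * 2^32 divided by the golden ratio, truncated to an integer
 * (0x9E3779B9). This is the usual multiplier for 32-bit
 * Fibonacci hashing.
 */
#define FIB_HASH_MULT	2654435769u

/*
 * Fibonacci hash: multiply modulo 2^32 and keep the top 'bits'
 * bits, so every bit of the key influences the bucket index.
 * Assumes 1 <= bits <= 31.
 */
static uint32_t
fib_hash(uint32_t key, unsigned bits)
{
	return ((uint32_t)(key * FIB_HASH_MULT) >> (32 - bits));
}

/*
 * Plain power-of-2 masking for comparison: only the low 'bits'
 * bits of the key are used. Assumes 1 <= bits <= 31.
 */
static uint32_t
mask_hash(uint32_t key, unsigned bits)
{
	return (key & ((1u << bits) - 1));
}
```

With bits = 10, mask_hash() sends keys 1024, 2048 and 3072 all to
bucket 0, since their low ten bits are zero. fib_hash() places keys 1,
2 and 3 in buckets 632, 241 and 874, which shows how nearby keys get
scattered across the table.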