From: Don Lewis
Date: Wed, 29 Oct 2003 00:59:32 -0800 (PST)
To: kmarx@vicor.com
Cc: freebsd-fs@FreeBSD.org, julian@elischer.org, mckusick@beastie.mckusick.com
Subject: Re: 4.8 ffs_dirpref problem
Message-Id: <200310290859.h9T8xWeF028514@gw.catspoiler.org>
In-Reply-To: <3F9F26DB.6050207@vicor.com>

On 28 Oct, Ken Marx wrote:
>
> Kirk McKusick wrote:
>> I does look like the hash function is having some trouble.
>> It has been completely revamped in 5.0, but is still using
>> a "power-of-2" hashing scheme in 4.X. I highly recommend
>> trying a scheme with non-power-of-2 base. Perhaps something
>> as simple as changing the hashing to use modulo rather than
>> logical & (e.g., in bufhash change from & bufhashmask to
>> % bufhashmask).
>>
>> Kirk McKusick
>>
>>
> We have a sample 'fix' for the hashtable in vfs_bio.c
> that uses all the blkno bits. It's in the diff link above.
> Use as you see fit. However, it too doesn't really address
> our symptoms significantly. Darn.
> Bogging down to 1Mb/sec and > 90% system seen.

A Fibonacci hash, like the one I implemented in kern/kern_mtxpool.c 1.8,
might be a good choice here, since it tends to distribute the keys
fairly uniformly. I think this is a secondary issue, though.

I think the real problem is the following code in ffs_dirpref():

	avgifree = fs->fs_cstotal.cs_nifree / fs->fs_ncg;
	avgbfree = fs->fs_cstotal.cs_nbfree / fs->fs_ncg;
	avgndir = fs->fs_cstotal.cs_ndir / fs->fs_ncg;
[snip]
	maxndir = min(avgndir + fs->fs_ipg / 16, fs->fs_ipg);
	minifree = avgifree - fs->fs_ipg / 4;
	if (minifree < 0)
		minifree = 0;
	minbfree = avgbfree - fs->fs_fpg / fs->fs_frag / 4;
	if (minbfree < 0)
		minbfree = 0;
[snip]
	prefcg = ino_to_cg(fs, pip->i_number);
	for (cg = prefcg; cg < fs->fs_ncg; cg++)
		if (fs->fs_cs(fs, cg).cs_ndir < maxndir &&
		    fs->fs_cs(fs, cg).cs_nifree >= minifree &&
		    fs->fs_cs(fs, cg).cs_nbfree >= minbfree) {
			if (fs->fs_contigdirs[cg] < maxcontigdirs)
				return ((ino_t)(fs->fs_ipg * cg));
		}
	for (cg = 0; cg < prefcg; cg++)
		if (fs->fs_cs(fs, cg).cs_ndir < maxndir &&
		    fs->fs_cs(fs, cg).cs_nifree >= minifree &&
		    fs->fs_cs(fs, cg).cs_nbfree >= minbfree) {
			if (fs->fs_contigdirs[cg] < maxcontigdirs)
				return ((ino_t)(fs->fs_ipg * cg));
		}

If the file system is more than 75% full, minbfree will be zero, which
will allow new directories to be created in cylinder groups that have no
free blocks, either for the directory itself or for any files created in
that directory. If this happens, allocating the blocks for the directory
and its files will require ffs_alloc() to do an expensive search across
the cylinder groups for each block. It looks to me like minbfree needs
to be equal to, or at least a lot closer to, avgbfree. A similar
situation exists with minifree. Please note that the fallback algorithm
uses the condition:

	fs->fs_cs(fs, cg).cs_nifree >= avgifree
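
To put numbers on the minbfree arithmetic above, here is a minimal sketch in C. It is not the kernel code: struct fs_summary, avg_bfree(), min_bfree() and passes_bfree_check() are hypothetical names that reproduce only the block-count part of the quoted calculation, so the 75% threshold can be checked in userland.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the few struct fs fields that the quoted
 * ffs_dirpref() code reads. This is not the kernel structure.
 */
struct fs_summary {
	int64_t nbfree;	/* like fs_cstotal.cs_nbfree: free blocks, whole fs */
	int32_t ncg;	/* like fs_ncg: number of cylinder groups */
	int32_t fpg;	/* like fs_fpg: fragments per cylinder group */
	int32_t frag;	/* like fs_frag: fragments per block */
};

/* Average number of free blocks per cylinder group. */
static int64_t
avg_bfree(const struct fs_summary *fs)
{
	assert(fs->ncg > 0);
	return (fs->nbfree / fs->ncg);
}

/*
 * Same arithmetic as the quoted code: the average minus one quarter of
 * the blocks in a cylinder group, clamped at zero.
 */
static int64_t
min_bfree(const struct fs_summary *fs)
{
	int64_t m = avg_bfree(fs) - fs->fpg / fs->frag / 4;

	return (m < 0 ? 0 : m);
}

/* The free-block part of the test applied to each candidate group. */
static int
passes_bfree_check(int64_t cg_nbfree, const struct fs_summary *fs)
{
	return (cg_nbfree >= min_bfree(fs));
}
```

With 100 cylinder groups of 2048 blocks each (fpg = 16384, frag = 8), a half-full file system gives min_bfree() = 1024 - 512 = 512, so a group with no free blocks is rejected. At 80% full the average drops to 409, the subtraction goes negative, the threshold clamps to 0, and passes_bfree_check(0, ...) accepts a completely full group.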
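
For reference, the Fibonacci hash mentioned earlier can be sketched as below. This is a generic 32-bit multiplicative hash written for illustration; it is not the kern_mtxpool.c code, and fib_hash32() and FIB_MULT_32 are assumed names. The table size stays a power of two, but the bucket index is taken from the high bits of the product, so every bit of the key influences it.

```c
#include <assert.h>
#include <stdint.h>

/* 2^32 divided by the golden ratio, rounded to the nearest integer. */
#define FIB_MULT_32	UINT32_C(2654435769)

/*
 * Hypothetical helper: map a 32-bit key to an index into a table of
 * 2^bits buckets. The multiply wraps modulo 2^32, and the top 'bits'
 * bits of the product are used as the index.
 */
static uint32_t
fib_hash32(uint32_t key, unsigned bits)
{
	uint32_t product;

	assert(bits >= 1 && bits <= 32);
	product = (uint32_t)(key * FIB_MULT_32);
	return (product >> (32 - bits));
}
```

As a quick comparison, the keys 256 and 512 both land in bucket 0 under a plain "& 0xff" mask, while fib_hash32(256, 8) and fib_hash32(512, 8) give 55 and 110.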