Message-Id: <200310222337.h9MNbseN005704@beastie.mckusick.com>
To: Ken Marx
In-Reply-To: Your message of "Wed, 22 Oct 2003 12:57:53 PDT."
    <20031022195753.27C707A49F@mail.vicor-nb.com>
Date: Wed, 22 Oct 2003 16:37:54 -0700
From: Kirk McKusick
cc: freebsd-fs@freebsd.org
cc: cburrell@vicor.com
cc: julian@vicor-nb.com
cc: julian@vicor.com
cc: VicPE@aol.com
cc: jpl@vicor.com
cc: Grigoriy Orlov
cc: jrh@vicor.com
cc: davep@vicor.com
Subject: Re: 4.8 ffs_dirpref problem

I believe that you can solve your problem by tuning the existing
algorithm using tunefs. There are two parameters to control dirpref,
avgfilesize (which defaults to 16384) and filesperdir (which defaults
to 50). I suggest that you try using an avgfilesize of 4096 and
filesperdir of 1500. This is done by running tunefs on the unmounted
(or at least mounted read-only) filesystem as:

	tunefs -f 4096 -s 1500 /dev/

Note that this affects future layout, so it needs to be done before
you put any data into the filesystem.

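To get a feel for what these two parameters change, the short sketch
below compares the per-directory size estimate implied by the defaults
with the suggested values. It assumes, as a simplification that is not
stated above, that dirpref estimates a directory's eventual data size
as avgfilesize * filesperdir; the function name is hypothetical.

```python
# Illustrative only: assumes dirpref sizes a directory's expected data
# footprint as avgfilesize * filesperdir (a simplification, not kernel code).

def expected_dir_bytes(avgfilesize: int, filesperdir: int) -> int:
    """Estimated bytes of file data a new directory will eventually hold."""
    return avgfilesize * filesperdir

defaults = expected_dir_bytes(16384, 50)    # default values quoted above
suggested = expected_dir_bytes(4096, 1500)  # values suggested for this workload

print(f"defaults:  {defaults} bytes ({defaults // 1024} KB)")
print(f"suggested: {suggested} bytes ({suggested // 1024} KB)")
print(f"ratio:     {suggested / defaults:.1f}x")
```

Under that assumption, the suggested settings describe directories of
many small files holding about 7.5 times the default estimate of data,
which is close to the reported workload of ~1500 files of 2-6kb each.
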
If you are building the filesystem from scratch, you can use:

	newfs -g 4096 -h 1500 ...

to set these fields. Please let me know if this solves your problem.
If it does not, I will ask Grigoriy Orlov if he has any ideas on how
to proceed.

	Kirk McKusick

=-=-=-=-=-=-=

> Date: Tue, 21 Oct 2003 17:48:51 -0700
> From: Ken Marx
> To: freebsd-fs@freebsd.org
> Cc: Julian Elischer,
>     John Lynch, Dave Parker Smith,
>     Cayford Burrell,
>     victor elischer, Josh Howard,
>     Ken Marx
> Subject: 4.8 ffs_dirpref problem
>
> Hi,
>
> We have 560GB raids that were sometimes bogging down heavily
> in our production systems. Under 4.8-RELEASE (recently
> upgraded from 4.4) we find that when:
>
>   o the raid file system grows to over 85% capacity (with only
>     30% inode usage)
>   o we create ~1500 or so 2-6kb files in a given dir
>   o (note: soft updates NOT enabled)
>
> We see:
>
>   o 100% cpu utilization, all in system
>   o I/O transfer rates of ~200kb/sec, down from normal of 15-30MB/s
>
> We profiled the kernel and found a large number of calls to
> ffs_alloc(). After many twisty passages, we finally diff'd 4.4
> with 4.8 ffs_alloc.c, and found a major difference in the
> ffs_dirpref() call. Hacking the 4.4 logic back in 'fixed' the
> problem: We can now fill the /raid entirely with no real
> noticeable performance degradation.
>
> The nice comments for 4.4/4.8 versions of ffs_dirpref() seem to explain
> things fairly clearly:
>
> 4.4 - ffs_alloc.c,v 1.64.2.1 2000/03/16 08:15:53 ps:
> --------------------------------------
>  * The policy implemented by this algorithm is to select from
>  * among those cylinder groups with above the average number of
>  * free inodes, the one with the smallest number of directories.
>
> 4.8 - ffs_alloc.c,v 1.64.2.2 2001/09/21 19:15:21 dillon:
> -----------------------------------------
>  * The policy implemented by this algorithm is to allocate a
>  * directory inode in the same cylinder group as its parent
>  * directory, but also to reserve space for its files inodes
>  * and data. Restrict the number of directories which may be
>  * allocated one after another in the same cylinder group
>  * without intervening allocation of files.
>  *
>  * If we allocate a first level directory then force allocation
>  * in another cylinder group.
>
> For us, the 4.4 policy seems far superior, at least when the file system
> approaches capacity.
>
> We'd like to avoid local kernel hacks and keep with main line
> FreeBSD code. Is there some way that the old policy can be supported,
> perhaps via a tunefs or sysctl type option?
>
> Actually, if the new policy can be fixed up to avoid the problem, that
> would of course be just as dandy.
>
> Thanks very much,
> k
> --
> Ken Marx, kmarx@vicor-nb.com
> We need to hit the nail on the head and set the agenda regarding total
> quality.
>   - http://www.bigshed.com/cgi-bin/speak.cgi
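
For readers without the source at hand, the 4.4 policy described in the
quoted comment can be sketched roughly as follows. This is a minimal
illustration written from the comment's wording, not the kernel code:
the CylGroup type, its field names, and the function name are
hypothetical, and the sketch uses "at least the average" rather than
strictly above it so that some group always qualifies.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CylGroup:
    """Hypothetical per-cylinder-group summary (not the kernel struct)."""
    index: int
    free_inodes: int
    ndirs: int


def dirpref_44(groups: List[CylGroup]) -> int:
    """Pick a cylinder group for a new directory, 4.4 style.

    Among groups with at least the average number of free inodes,
    return the index of the one with the fewest directories.
    """
    avg_ifree = sum(g.free_inodes for g in groups) // len(groups)
    # ">=" is a choice of this sketch: the group with the most free
    # inodes always meets the (rounded-down) average, so the candidate
    # list is never empty.
    candidates = [g for g in groups if g.free_inodes >= avg_ifree]
    # min() returns the first group on ties, i.e. the lowest index.
    return min(candidates, key=lambda g: g.ndirs).index


groups = [
    CylGroup(0, free_inodes=100, ndirs=9),
    CylGroup(1, free_inodes=400, ndirs=5),
    CylGroup(2, free_inodes=300, ndirs=2),
    CylGroup(3, free_inodes=50, ndirs=0),
]
chosen = dirpref_44(groups)
print(chosen)
```

In this example group 3 has the fewest directories but too few free
inodes to qualify, so the choice falls on group 2. Note that this
policy looks only at inode and directory counts; it takes no account of
the parent directory's group or of free data blocks, which is the main
contrast with the 4.8 comment quoted above.
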