From: julian@vicor.com (Julian Elischer)
Date: Wed, 22 Oct 2003 12:57:53 -0700 (PDT)
To: freebsd-fs@freebsd.org, kmarx@vicor.com, mckusick@mckusick.com
Cc: cburrell@vicor.com, julian@vicor-nb.com, VicPE@aol.com, jpl@vicor.com, jrh@vicor.com, davep@vicor.com
Subject: Re: 4.8 ffs_dirpref problem
In-Reply-To: <3F95D3F3.2050203@vicor.com>
Message-Id: <20031022195753.27C707A49F@mail.vicor-nb.com>

Kirk?

I'm away in eastern Europe on the end of a wet bit of string....

>From kmarx@vicor.com Tue Oct 21 17:53:42 2003
Date: Tue, 21 Oct 2003 17:48:51 -0700
From: Ken Marx
To: freebsd-fs@freebsd.org
Cc: Julian Elischer, John Lynch, Dave Parker Smith, Cayford Burrell, victor elischer, Josh Howard, Ken Marx
Subject: 4.8 ffs_dirpref problem

Hi,

We have 560GB RAID arrays that were sometimes bogging down heavily in our
production systems.
Under 4.8-RELEASE (recently upgraded from 4.4) we find that when:

    o the RAID file system grows to over 85% capacity
      (with only 30% inode usage)
    o we create ~1500 or so 2-6KB files in a given dir
    o (note: soft updates NOT enabled)

We see:

    o 100% CPU utilization, all in system
    o I/O transfer rates of ~200KB/sec, down from normal of 15-30MB/s

We profiled the kernel and found a large number of calls to ffs_alloc().
After many twisty passages, we finally diff'd the 4.4 and 4.8 versions of
ffs_alloc.c and found a major difference in the ffs_dirpref() call.
Hacking the 4.4 logic back in 'fixed' the problem: we can now fill /raid
entirely with no real noticeable performance degradation.

The nice comments for the 4.4/4.8 versions of ffs_dirpref() seem to
explain things fairly clearly:

4.4 - ffs_alloc.c,v 1.64.2.1 2000/03/16 08:15:53 ps:
--------------------------------------
 * The policy implemented by this algorithm is to select from
 * among those cylinder groups with above the average number of
 * free inodes, the one with the smallest number of directories.

4.8 - ffs_alloc.c,v 1.64.2.2 2001/09/21 19:15:21 dillon:
-----------------------------------------
 * The policy implemented by this algorithm is to allocate a
 * directory inode in the same cylinder group as its parent
 * directory, but also to reserve space for its files inodes
 * and data. Restrict the number of directories which may be
 * allocated one after another in the same cylinder group
 * without intervening allocation of files.
 *
 * If we allocate a first level directory then force allocation
 * in another cylinder group.

For us, the 4.4 policy seems far superior, at least when the file
system approaches capacity.

We'd like to avoid local kernel hacks and stay with mainline FreeBSD
code. Is there some way that the old policy can be supported, perhaps
via a tunefs or sysctl type option?

Actually, if the new policy can be fixed up to avoid the problem, that
would of course be just as dandy.
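For anyone who wants the 4.4 policy in concrete terms, here is a rough userland sketch of what that comment describes. It is not the kernel code: struct cg_summary and pick_cg_fewest_dirs() are made-up names for illustration, and it assumes "above the average" can be read as "at least the integer average" so that some group always qualifies.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical per-cylinder-group summary. The field names are
 * illustrative only and are not the kernel's own structures.
 */
struct cg_summary {
    long ndir;    /* directories already allocated in this group */
    long nifree;  /* free inodes in this group */
};

/*
 * Sketch of the policy in the 4.4 comment: among the groups whose
 * free-inode count is at least the average, return the index of the
 * group with the fewest directories. Ties go to the lowest index.
 */
size_t
pick_cg_fewest_dirs(const struct cg_summary *cgs, size_t ncg)
{
    long total_ifree = 0;
    long avg_ifree;
    size_t best = 0;
    int have_best = 0;
    size_t i;

    assert(cgs != NULL && ncg > 0);

    /* First pass: average number of free inodes per group. */
    for (i = 0; i < ncg; i++)
        total_ifree += cgs[i].nifree;
    avg_ifree = total_ifree / (long)ncg;

    /* Second pass: fewest directories among the qualifying groups. */
    for (i = 0; i < ncg; i++) {
        if (cgs[i].nifree < avg_ifree)
            continue;
        if (!have_best || cgs[i].ndir < cgs[best].ndir) {
            best = i;
            have_best = 1;
        }
    }
    return best;
}
```

The sketch makes two linear passes over the groups and looks only at inode and directory counts, which is all the 4.4 comment mentions. The 4.8 comment describes additional goals (staying near the parent, reserving space for files), which are not modelled here.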

Thanks very much,
k
--
Ken Marx, kmarx@vicor-nb.com
We need to hit the nail on the head and set the agenda regarding total quality.
 - http://www.bigshed.com/cgi-bin/speak.cgi