Date: Fri, 18 Nov 2005 10:31:39 +1100 (EST)
From: Bruce Evans
To: Ivo Janssen
Cc: freebsd-amd64@freebsd.org
Subject: Re: amd64/89202: Kernel crash when accessing filesystem
Message-ID: <20051118094317.N96862@delplex.bde.org>
In-Reply-To: <20051117143605.N364@mentat.ivo.nu>
References: <200511171820.jAHIKJef046199@freefall.freebsd.org>
    <20051118071624.V96570@delplex.bde.org>
    <20051117143605.N364@mentat.ivo.nu>

On Thu, 17 Nov 2005, Ivo Janssen wrote:

> I'm sure you've thought about this, but I can see the following
> improvements to be made:
>
> - make dirsize 64bit
> - add checks to the multiplication operation to make sure it doesn't
>   overflow at runtime
> - add logic tunefs and newfs so
>   that user cannot set values that will lead to kernel panics
> - add at the very least huge warnings to the newfs and tunefs
>   manpages, or mention why their usefulness is limited.

I prefer just adding limits to newfs. newfs already enforces other limits.
ffs does very little runtime checking except via fsck, and if it ever
does more directly it should start with more important parameters.

> This particular partition is used for a huge postgres database, which
> typically use files holding the actual tables. We assumed tuning the
> fs would gain us some improvements...

It would be interesting to know if these parameters actually do give
improvements. They came with the dirpref changes, and were initially
undocumented except in the log message. The log message still documents
them much better than the man page. From the log for ffs_alloc 1.55:

% The maxcontigdirs is a maximum number of directories which may be created
% without an intervening file creation. I found in my tests that the best
% performance occurs when I restrict the number of directories in one cylinder
% group such that all its files may be located in the same cylinder group.
% There may be some deterioration in performance if all the file inodes
% are in the same cylinder group as its containing directory, but their
% data partially resides in a different cylinder group. The maxcontigdirs
% value is calculated to try to prevent this condition. Since there is
% no way to know how many files and directories will be allocated later
% I added two optimization parameters in superblock/tunefs. They are:
%
%         int32_t  fs_avgfilesize;   /* expected average file size */
%         int32_t  fs_avgfpdir;      /* expected # of files per directory */
%
% These parameters have reasonable defaults but may be tweeked for special
% uses of a filesystem. They are only necessary in rare cases like better
% tuning a filesystem being used to store a squid cache.

So the usefulness of these parameters is limited to cases where there are
largish files with frequent inode updates, where tuning prevents the
inodes being in different cylinder groups than the data, and where the
reduction in seeks from this is actually significant, i.e., where the
cylinder groups aren't so large that seeks within them are almost as
slow as inter-cg seeks and where the working set consists of only 1 cg.
I doubt that there are many such cases. You either have a small working
set which is fast enough to access because it is small, or a larger one
which will require large seeks to access.

Also, settings with the product larger than the size of a cylinder group
are not useful; the size of a cg is also int32_t so newfs just needs to
check that the product doesn't exceed the size of a cg.

Bruce
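
A minimal sketch of the kind of limit check described above, for
illustration only. It is not taken from newfs: the function name
dir_tuning_ok and its parameter names are hypothetical, and it assumes
that all three values are expressed in compatible units (bytes for the
average file size and the cg size). The multiplication is done in 64 bits
so that the product of two int32_t values cannot wrap before it is
compared with the cg size.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not actual newfs code): returns true if the
 * expected directory size, avgfilesize * avgfpdir, is positive and
 * does not exceed the size of a cylinder group.
 */
static bool
dir_tuning_ok(int32_t avgfilesize, int32_t avgfpdir, int32_t cgsize)
{
	int64_t product;

	/* Reject zero or negative settings outright. */
	if (avgfilesize <= 0 || avgfpdir <= 0 || cgsize <= 0)
		return (false);

	/* Widen before multiplying so the product cannot overflow. */
	product = (int64_t)avgfilesize * avgfpdir;

	return (product <= (int64_t)cgsize);
}
```

With illustrative values, dir_tuning_ok(16384, 64, 134217728) accepts the
settings, while dir_tuning_ok(1 << 20, 1 << 12, 134217728) rejects them;
the second product is 2^32, which does not fit in an int32_t.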