Date: Tue, 25 Sep 2007 22:03:58 -0500
From: "Rick C. Petty"
Reply-To: rick-freebsd@kiwi-computer.com
To: Ivan Voras
Cc: freebsd-fs@freebsd.org
Subject: Re: Writing contigiously to UFS2?
Message-ID: <20070926030358.GA34186@keira.kiwi-computer.com>
References: <46F3A64C.4090507@fluffles.net> <46F3B520.1070708@FreeBSD.org>

On Fri, Sep 21, 2007 at 02:45:35PM +0200, Ivan Voras wrote:
> Stefan Esser wrote:
> 
> From experience (not from reading code or the docs) I conclude that
> cylinder groups cannot be larger than around 190 MB. I know this from
> numerous runs of newfs and during development of gvirstor, which
> interacts with cgs in an "interesting" way.

Then you didn't run newfs enough:

# newfs -N -i 12884901888 /dev/gvinum/mm-flac
density reduced from 2147483647 to 3680255
/mm/flac: 196608.0MB (402653184 sectors) block size 16384, fragment size 2048
	using 876 cylinder groups of 224.50MB, 14368 blks, 64 inodes.

When specifying the -i option to newfs, it will minimize the number of
inodes created.
If the density option is high enough, newfs will use only one block of
inodes per CG (the minimum). From there, the density is reduced (as per
the message above) and the CG size is increased until the fragment bitmap
can fit into a single block. With UFS2 and the default options of
-b 16384 -f 2048, this gives you 224.50 MB per CG. If you wish to play
around with the block/frag sizes, you can greatly increase the CG size:

# newfs -N -f 8192 -b 65536 -i 12884901888 /dev/gvinum/mm-flac
density reduced from 2147483647 to 14868479
/mm/flac: 196608.0MB (402653184 sectors) block size 65536, fragment size 8192
	using 55 cylinder groups of 3628.00MB, 58048 blks, 256 inodes.

Doing this is quite appropriate for large disks. This last command means:
blocks are allocated in 64k chunks and the minimum allocation size is 8k.
Some may say this is wasteful, but one could also argue that using less
than 10% of your inodes is also wasteful.

> I know the reasons why cgs
> exist (mainly to lower latencies from seeking) but with today's drives

I don't believe that is true. CGs exist to prevent complete data loss
if the front of the disk is trashed. The blocks and inodes are kept in
close proximity partly for lower latency but also to reduce the risk of
corruption. It is suggested that the CG offsets are staggered to make
best use of rotational delay, but this is obviously irrelevant with
modern drives.

> and memory configurations it would sometimes be nice to make them larger
> or in the extreme, make just one cg that covers the entire drive.

And put it in the middle of the drive, not at the front. Gee, this is
what NTFS does... Hmm. There are significant advantages to staggering
the CGs across the device (or, in the case of some GEOM providers,
across devices).

Here might be an interesting experiment to try: write a new version of
/usr/src/sbin/newfs/mkfs.c that doesn't have the restriction that the
free fragment bitmap resides in one block.
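To see where those CG sizes come from, here is a rough sanity check (my own arithmetic, not taken from mkfs.c: I assume one bitmap bit per fragment and ignore the metadata that also lives in each CG, so these are upper bounds on the newfs figures above):

```python
def cg_size_bound_mb(bsize, fsize):
    """Upper bound on cylinder-group size, in MB, when the free-fragment
    bitmap must fit in a single filesystem block of bsize bytes and each
    bit tracks one fragment of fsize bytes."""
    bitmap_bits = bsize * 8            # bits available in one block
    return bitmap_bits * fsize / (1024 * 1024)

# Default UFS2 parameters (-b 16384 -f 2048):
print(cg_size_bound_mb(16384, 2048))   # 256.0 MB; newfs reports 224.50MB
# Larger parameters (-b 65536 -f 8192):
print(cg_size_bound_mb(65536, 8192))   # 4096.0 MB; newfs reports 3628.00MB
```

The actual CG sizes fall somewhat below these bounds because the superblock copy, the cg structure itself, and the inode blocks also consume space inside each group.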
I'm not 100% sure the FFS code would handle it properly, but in theory it
should work (the offsets are stored in the superblocks). This is the
biggest restriction on the CG size. You should be able to create 2-4 CGs
to span each of your 1TB drives without increasing the block size, and
thus the minimum allocation unit.

-- 
Rick C. Petty