Date: Tue, 25 Sep 2007 22:03:58 -0500
From: "Rick C. Petty"
Reply-To: rick-freebsd@kiwi-computer.com
To: Ivan Voras
Cc: freebsd-fs@freebsd.org
Subject: Re: Writing contigiously to UFS2?
Message-ID: <20070926030358.GA34186@keira.kiwi-computer.com>
References: <46F3A64C.4090507@fluffles.net> <46F3B520.1070708@FreeBSD.org>

On Fri, Sep 21, 2007 at 02:45:35PM +0200, Ivan Voras wrote:
> Stefan Esser wrote:
> 
> From experience (not from reading code or the docs) I conclude that
> cylinder groups cannot be larger than around 190 MB. I know this from
> numerous runs of newfs and during development of gvirstor, which
> interacts with cgs in an "interesting" way.

Then you didn't run newfs enough:

# newfs -N -i 12884901888 /dev/gvinum/mm-flac
density reduced from 2147483647 to 3680255
/mm/flac: 196608.0MB (402653184 sectors) block size 16384, fragment size 2048
	using 876 cylinder groups of 224.50MB, 14368 blks, 64 inodes.

When specifying the -i option to newfs, it will minimize the number of
inodes created.
If the density option is high enough, newfs will use only one block of
inodes per CG (the minimum). From there, the density is reduced (as per
the message above) and the CG size is increased until the fragment bitmap
can fit into a single block. With UFS2 and the default options of
-b 16384 -f 2048, this gives you 224.50 MB per CG. If you wish to play
around with the block/frag sizes, you can greatly increase the CG size:

# newfs -N -f 8192 -b 65536 -i 12884901888 /dev/gvinum/mm-flac
density reduced from 2147483647 to 14868479
/mm/flac: 196608.0MB (402653184 sectors) block size 65536, fragment size 8192
	using 55 cylinder groups of 3628.00MB, 58048 blks, 256 inodes.

Doing this is quite appropriate for large disks. This last command means:
blocks are allocated in 64k chunks and the minimum allocation size is 8k.
Some may say this is wasteful, but one could also argue that using less
than 10% of your inodes is also wasteful.

> I know the reasons why cgs
> exist (mainly to lower latencies from seeking) but with today's drives

I don't believe that is true. CGs exist to prevent complete data loss
if the front of the disk is trashed. The blocks and inodes are kept in
close proximity partly for lower latency but also to reduce the risk of
corruption. It is suggested that the CG offsets are staggered to make
best use of rotational delay, but this is obviously irrelevant with
modern drives.

> and memory configurations it would sometimes be nice to make them larger
> or in the extreme, make just one cg that covers the entire drive.

And put it in the middle of the drive, not at the front. Gee, this is
what NTFS does... Hmm. There are significant advantages to staggering
the CGs across the device (or, in the case of some GEOM providers,
across devices).

Here might be an interesting experiment to try: write a new version of
/usr/src/sbin/newfs/mkfs.c that doesn't have the restriction that the
free fragment bitmap resides in one block.
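To see where those CG sizes come from, here is a rough sanity check (my own arithmetic, not taken from mkfs.c: I assume one bitmap bit per fragment and ignore the metadata that also lives in each CG, so these are upper bounds on the newfs figures above):

```python
def cg_size_bound_mb(bsize, fsize):
    """Upper bound on cylinder-group size, in MB, when the free-fragment
    bitmap must fit in a single filesystem block of bsize bytes and each
    bit tracks one fragment of fsize bytes."""
    bitmap_bits = bsize * 8            # bits available in one block
    return bitmap_bits * fsize / (1024 * 1024)

# Default UFS2 parameters (-b 16384 -f 2048):
print(cg_size_bound_mb(16384, 2048))   # 256.0 MB; newfs reports 224.50MB
# Larger parameters (-b 65536 -f 8192):
print(cg_size_bound_mb(65536, 8192))   # 4096.0 MB; newfs reports 3628.00MB
```

The actual CG sizes fall somewhat below these bounds because the superblock copy, the cg structure itself, and the inode blocks also consume space inside each group.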
I'm not 100% sure the FFS code would handle it properly, but in theory it
should work (the offsets are stored in the superblocks). This is the
biggest restriction on the CG size. You should be able to create 2-4 CGs
to span each of your 1TB drives without increasing the block size, and
thus the minimum allocation unit.

-- 
Rick C. Petty