From owner-freebsd-fs@FreeBSD.ORG Sun Nov 13 17:17:37 2005
Message-ID: <43777523.8020709@samsco.org>
Date: Sun, 13 Nov 2005 10:17:23 -0700
From: Scott Long
To: delphij@delphij.net
Cc: freebsd-fs@freebsd.org, user
Subject: Re: UFS2 snapshots on large filesystems
References: <436BDB99.5060907@samsco.org>

Xin LI wrote:
> On 11/5/05, Scott Long wrote:
>
>>The UFS snapshot code was written at a time when disks were typically
>>around 4-9GB in size, not 400GB in size =-) Unfortunately, the amount
>
>
> s/size/cylinder groups/g :-)
>
>
>>of time it takes to do the initial snapshot bookkeeping scales linearly
>>with the size of the drive, and many people have reported that it takes
>>considerable amount of time (anywhere from several minutes to several
>>dozen minutes) on large drives/arrays like you describe. So, you should
>>test and plan accordingly if you are interested in using them.
>
>
> I have some ideas about lazy snapshotting. But unfortunately I don't
> have much time to implement a prototype ATM, and I think we really
> need a file system that is capable for:
> - Handling large number of files in one directory (say, some sort of
> indexing mechanism, etc. And yes, I know that this is somewhat
> insane, but the [ab]use is present in many large e-mail systems that
> uses mailbox)
> - Effective recovery. Personally I do not buy journalling much, and
> I think the problem could be resolved by something like WAFL did.
>
> I think that JUFS would provide some help for (2), do you have some
> plan about (1)?
>

I guess that UFS_DIRHASH doesn't give enough benefit for your situation?

The idea of doing alternate directory layouts (such as b-trees) has been
proposed a number of times. Apparently there was an idea at one point
for UFS to generate a b-tree layout for a directory and save it on
disk as a cache. The primary method of directory storage would remain
the traditional linear way so that compatibility is preserved, but OSes
that were aware of the cache could use it too. There are still some
reserved flags and fields in UFS2 for doing this, in case you're
interested. Since it requires double bookkeeping for link creation and
removal, I'm not sure how speedy it would be for anything other than
VOP_LOOKUP operations.

An alternate idea I've had is to break with compatibility and do
b-trees or something similar as the native format for UFS3 (along with
native journalling and other things).

Scott