From: "b. f." <bf1783@googlemail.com>
Date: Fri, 31 Jul 2009 13:29:37 +0000
To: freebsd-questions@freebsd.org
Cc: freebsd-fs@FreeBSD.org, Maxim Khitrov
Subject: Re: UFS2 tuning for heterogeneous 4TB file system

On 7/26/09, Maxim Khitrov wrote:
> On Sun, Jul 26, 2009 at 3:56 AM, b. f. wrote:
>>> The file system in question will not have a common file size (which is
>>> what, as I understand, bytes per inode should be tuned for). There
>>> will be many small files (< 10 KB) and many large ones (> 500 MB). A
>>> similar, in terms of content, 2TB ntfs file system on another server
>>> has an average file size of about 26 MB with 59,246 files.
>>
>> Ordinarily, it may have a large variation in file sizes, but can you
>> intervene, and segregate large and small files in separate
>> filesystems, so that you can optimize the settings for each
>> independently?
>
> That's a good idea, but the problem is that this raid array will grow
> in the future as I add additional drives. As far as I know, a
> partition can be expanded using growfs, but it cannot be moved to a
> higher address (with any "standard" tools). So if I create two
> separate partitions for different file types, the first partition will
> have to remain a fixed size. That would be problematic, since I cannot
> easily predict how much space it would need initially and for all
> future purposes (enough to store all the files, yet not waste space
> that could otherwise be used for the second partition).

Perhaps gconcat(8), gmirror(8), or vinum(4) will solve your problem
here. I think there are other tools as well.

>>> Ideally, I would prefer that small files do not waste more than 4 KB
>>> of space, which is what you have with ntfs.
>>> At the same time, having
>>> fsck running for days after an unclean shutdown is also not a good
>>> option (I always disable background checking). From what I've gathered
>>> so far, the two requirements are at opposite ends in terms of file
>>> system optimization.
>>
>> I gather you are trying to be conservative, but have you considered
>> using gjournal(8)? At least for the filesystems with many small
>> files? In that way, you could safely avoid the need for most if not
>> all use of fsck(8), and, as an adjunct benefit, you would be able to
>> operate on the small files more quickly:
>>
>> http://lists.freebsd.org/pipermail/freebsd-current/2006-June/064043.html
>> http://www.freebsd.org/doc/en_US.ISO8859-1/articles/gjournal-desktop/article.html
>>
>> gjournal has a lower overhead than ZFS, and has proven to be fairly
>> reliable. Also, you can always unhook it and revert to plain UFS
>> mounts easily.
>>
>> b.
>
> Just fairly reliable? :)

Well, I'm not going to promise the sun, the moon, and the stars. It has
worked for me (better than softupdates, I might add) under my more
modest workloads.

> I've done a bit of reading on gjournal, and the main thing that's
> preventing me from using it is the recency of its implementation. I've
> had a number of FreeBSD servers go down in the past due to power
> outages, and SoftUpdates with foreground fsck has never failed me. I
> have never had a corrupt ufs2 partition, which is not something I can
> say about a few Linux servers with ext3.
>
> Have there been any serious studies of how gjournal and SoftUpdates
> deal with power outages? By that I mean taking two identical machines,
> issuing write operations, yanking the power cords, and then watching
> both systems recover? I'm sure that gjournal will take less time to
> reboot, but if this experiment is repeated a few hundred times, I
> wonder what the corruption statistics would be.
> Is there ever a case, for
> instance, when the journal itself becomes corrupt because the power
> was pulled in the middle of a metadata flush?

I'm not aware of any such tests, but I wouldn't be surprised if pjd@ or
someone else who was interested in using gjournal(8) in a demanding
environment had made some. I'll cc freebsd-fs@, because some of them
may not monitor freebsd-questions. Perhaps someone there has some
advice. You might also try asking on freebsd-geom@.

Regards,

b.

> Basically, I have no experience with gjournal, poor experience with
> other journaled file systems, and no real comparison between
> reliability characteristics of gjournal and SoftUpdates, which have
> served me very well in the past.
>
> - Max
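For readers following the bytes-per-inode discussion above, a minimal
newfs(8) sketch of how the two filesystems could be tuned differently.
The device names are placeholders and the numbers are illustrative
starting points, not recommendations:

```shell
# -i sets bytes per inode (inode density), -b the block size,
# -f the fragment size, -U enables soft updates. A small file wastes
# at most one fragment, so keeping -f at or below 4096 satisfies the
# "no more than 4 KB wasted" goal from the thread.

# Filesystem intended mostly for small (< 10 KB) files:
# dense inodes so we don't run out, default block/fragment sizes.
newfs -U -i 4096 -b 16384 -f 2048 /dev/concat/small

# Filesystem intended mostly for large (> 500 MB) files:
# sparse inodes and larger blocks to reduce metadata overhead.
newfs -U -i 65536 -b 32768 -f 4096 /dev/concat/large
```

Note that inode density cannot be changed after newfs, so it is worth
erring on the dense side for the small-file partition.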
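The gconcat(8) suggestion above can be sketched as follows, assuming
two hypothetical disks da1 and da2 (these commands are destructive;
the label and mount point are made up for illustration):

```shell
# Load the concat GEOM class now and at every boot.
gconcat load
echo 'geom_concat_load="YES"' >> /boot/loader.conf

# Combine the two disks into one logical provider named "data".
gconcat label -v data /dev/da1 /dev/da2

# Create and mount a UFS2 filesystem on the concatenated device.
newfs -U /dev/concat/data
mount /dev/concat/data /storage

# When another disk is added later, the filesystem on the enlarged
# provider can be expanded with growfs(8) (run against the unmounted
# filesystem on older releases).
growfs /dev/concat/data
```

This sidesteps the fixed-partition problem Max describes: the concat
provider, not a slice at a fixed address, is what grows.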
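And the gjournal(8) setup described in the linked gjournal-desktop
article can be sketched roughly like this (device name is again a
placeholder):

```shell
# Load the journaling GEOM class now and at every boot.
gjournal load
echo 'geom_journal_load="YES"' >> /boot/loader.conf

# Attach a journal to the provider; this creates /dev/da0s1d.journal.
gjournal label /dev/da0s1d

# Create a UFS2 filesystem with the journaled flag set (-J), then
# mount it async -- acceptable here because the journal, not fsck,
# guarantees consistency after a power loss.
newfs -O 2 -J /dev/da0s1d.journal
mount -o async /dev/da0s1d.journal /small-files
```

Reverting to a plain UFS mount later is a matter of `gjournal stop`
plus clearing the journaled flag with tunefs(8).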