From: Maxim Khitrov <mkhitrov@gmail.com>
Date: Sun, 26 Jul 2009 10:19:44 -0400
To: "b. f."
Cc: freebsd-questions@freebsd.org
Subject: Re: UFS2 tuning for heterogeneous 4TB file system

On Sun, Jul 26, 2009 at 3:56 AM, b. f. wrote:
>>The file system in question will not have a common file size (which
>>is what, as I understand, bytes per inode should be tuned for). There
>>will be many small files (< 10 KB) and many large ones (> 500 MB). A
>>2TB ntfs file system with similar content on another server has an
>>average file size of about 26 MB across 59,246 files.
>
> Ordinarily, it may have a large variation in file sizes, but can you
> intervene, and segregate large and small files in separate
> filesystems, so that you can optimize the settings for each
> independently?

That's a good idea, but the problem is that this RAID array will grow
in the future as I add drives. As far as I know, a partition can be
expanded using growfs, but it cannot be moved to a higher address
(with any "standard" tools). So if I create two separate partitions
for different file types, the first partition will have to remain a
fixed size. That would be problematic, since I cannot easily predict
how much space it would need initially and for all future purposes
(enough to store all the files, yet not waste space that could
otherwise be used for the second partition).
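To make this concrete, here is roughly the layout I have in mind; the
device names and the -i density are just placeholders, and growfs has
to run on an unmounted file system:

  # create the large-file partition with a sparser inode density than
  # the newfs default (one inode per 64 KB of data space here)
  newfs -U -i 65536 /dev/da0s1e

  # later, after enlarging the underlying slice/array, grow the file
  # system in place; it can only extend toward higher addresses
  umount /dev/da0s1e
  growfs /dev/da0s1e
  fsck -t ufs /dev/da0s1e
  mount /dev/da0s1e /mnt/big

Only the partition at the end of the array can be grown this way,
which is exactly why the first one would be stuck at a fixed size.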
>>Ideally, I would prefer that small files do not waste more than 4 KB
>>of space, which is what you have with ntfs. At the same time, having
>>fsck running for days after an unclean shutdown is also not a good
>>option (I always disable background checking). From what I've
>>gathered so far, the two requirements are at opposite ends in terms
>>of file system optimization.
>
> I gather you are trying to be conservative, but have you considered
> using gjournal(8)? At least for the filesystems with many small
> files? In that way, you could safely avoid the need for most if not
> all use of fsck(8), and, as an adjunct benefit, you would be able to
> operate on the small files more quickly:
>
> http://lists.freebsd.org/pipermail/freebsd-current/2006-June/064043.html
> http://www.freebsd.org/doc/en_US.ISO8859-1/articles/gjournal-desktop/article.html
>
> gjournal has lower overhead than ZFS, and has proven to be fairly
> reliable. Also, you can always unhook it and revert to plain UFS
> mounts easily.
>
> b.
>

Just fairly reliable? :)

I've done a bit of reading on gjournal, and the main thing preventing
me from using it is how recent the implementation is. I've had a
number of FreeBSD servers go down in the past due to power outages,
and SoftUpdates with foreground fsck have never failed me. I have
never had a corrupt UFS2 partition, which is not something I can say
about a few Linux servers with ext3.

Have there been any serious studies of how gjournal and SoftUpdates
deal with power outages? By that I mean taking two identical machines,
issuing write operations, yanking the power cords, and then watching
both systems recover. I'm sure the gjournal machine will be back up
sooner, but if this experiment were repeated a few hundred times, I
wonder what the corruption statistics would be. Is there ever a case,
for instance, where the journal itself becomes corrupt because power
was pulled in the middle of a metadata flush?

Basically, I have no experience with gjournal, poor experience with
other journaled file systems, and no real comparison between the
reliability characteristics of gjournal and SoftUpdates, which have
served me very well in the past.

- Max
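P.S. For anyone following along, here is roughly what the setup in the
gjournal article linked above boils down to; the partition name and
mount point are made up:

  # load the journaling GEOM class (geom_journal_load="YES" in
  # loader.conf makes this permanent across reboots)
  gjournal load

  # label the provider; gjournal keeps its metadata in the provider's
  # last sector, so do this only on an empty partition
  gjournal label da0s1f

  # create the file system on the .journal device; -J marks it as
  # journaled, and soft updates are left off (no -U) since the journal
  # takes over crash consistency
  newfs -J /dev/da0s1f.journal

  # the article recommends mounting async; gjournal is supposed to
  # make that safe because all writes pass through the journal first
  mount -o async /dev/da0s1f.journal /mnt/small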