From owner-freebsd-fs@FreeBSD.ORG Thu Jan 11 16:15:52 2007
Message-ID: <45A662B2.9080801@centtech.com>
Date: Thu, 11 Jan 2007 10:15:46 -0600
From: Eric Anderson
To: Scott Oertel
Cc: freebsd-fs@freebsd.org
Subject: Re: skipping fsck with soft-updates enabled
In-Reply-To: <45A511C0.9000402@scottevil.com>
References: <45A3C96A.6030307@scottevil.com> <200701101139.l0ABdJ9K088810@lurza.secnetix.de> <45A485C6.2060405@scottevil.com> <45A5024F.10502@centtech.com> <45A511C0.9000402@scottevil.com>

On 01/10/07 10:18, Scott Oertel wrote:
> Eric Anderson wrote:
>> On 01/10/07 00:20, Scott Oertel wrote:
>>> Victor Loureiro Lima wrote:
>>>>
>>>> From rc.conf man page:
>>>> ---
>>>> background_fsck_delay
>>>>     (int) The amount of time in seconds to sleep before starting
>>>>     a background fsck(8).  It defaults to sixty seconds to allow
>>>>     large applications such as the X server to start before disk
>>>>     I/O bandwidth is monopolized by fsck(8).
>>>> ---
>>>>
>>>> You can set the delay as long as you want, so it won't have to start
>>>> right away; in fact it can start as late as a year (if that's really
>>>> what you want ;))
>>>>
>>>> att,
>>>> victor loureiro lima
>>>>
>>>> 2007/1/10, Oliver Fromme:
>>>>> Scott Oertel wrote:
>>>>> > I am wondering what kind of problems would occur, besides lost
>>>>> > space, if after a system crash a fsck is skipped. According to the
>>>>> > documentation, with soft-updates enabled, the file system would be
>>>>> > consistent, there would just be lost resources to be recovered
>>>>> > which I am assuming can be safely done at a later time to avoid
>>>>> > long periods of downtime during peak hours.
>>>>>
>>>>> I think that's exactly what the background fsck feature
>>>>> does. If you enable it (which is even the default), the
>>>>> fsck process doesn't start right away, so the system comes
>>>>> up in multi-user mode immediately. Then a snapshot is
>>>>> created on the file system, and fsck runs on the snapshot,
>>>>> freeing the lost space in the file system.
>>>>>
>>>>> Of course, it only works reliably with soft-updates enabled,
>>>>> _and_ there must not be any unexpected inconsistencies.
>>>>> However, with some common setups (e.g. cheap disks lying
>>>>> about completed write operations) it is difficult to
>>>>> guarantee the consistency. Soft-updates is rather fragile
>>>>> when the hardware doesn't work exactly as it's supposed to.
>>>>> I've witnessed breakage in the past, and for that reason
>>>>> I always disable the background fsck feature.
>>>>> And it's the reason I'm looking forward to gjournal becoming
>>>>> stable, because it seems to be less fragile in the presence of
>>>>> imperfect hardware.
>>>>>
>>>>> Best regards
>>>>>    Oliver
>>>>>
>>>>> --
>>>>> Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing
>>>>> Dienstleistungen mit Schwerpunkt FreeBSD: http://www.secnetix.de/bsd
>>>>> Any opinions expressed in this message may be personal to the author
>>>>> and may not necessarily reflect the opinions of secnetix in any way.
>>>>>
>>>>> "C++ is to C as Lung Cancer is to Lung."
>>>>>         -- Thomas Funke
>>>
>>> The problem with background fsck is that on my machines, it doesn't
>>> work well. These machines have 8x750GB SATA drives and they are under
>>> extreme stress all the time. When you run fsck in the background, each
>>> drive takes 10+ minutes to create the snapshot file, during which
>>> time the machine is completely unresponsive and unstable.
>>
>> What version of FreeBSD are you running? You might try gjournal,
>> which I've had great luck with, and Pawel (pjd@) is incredibly
>> responsive to bug reports, etc.
>>
>>> That is why I am wondering if it is ok to skip the background
>>> fsck's and foreground fsck's and reschedule them for a later time,
>>> during non-peak hours.
>>
>> I think most people would be nervous to tell you 'sure, skip it until
>> later', but I can tell you from experience that I myself have delayed
>> fscking for weeks on end, to do exactly what you want.
>>
>> Eric
>>
> I'm running on 6.2-RC2. For fun I tried to create a snapshot on one of
> our newest machines, same drive config as the previous ones, it's just
> less active than the others. It's running 6.2-RC2 and it just completely
> locked up. Anyway, thanks for the suggestion about running gjournal, I'm
> not sure running non-official patches on the file system code with
> production machines is such a great idea.
> Have you had any problems with gjournal? If so, of what nature were they?
>

Honestly, I haven't had many issues with snapshots since 6.1-ish.
Before that, there were lots of deadlocks, livelocks, etc. I think Kris@
has done a bang-up job of finding bugs and getting them fixed. If you
still see snapshot issues like this, it would be great if you could
start sending some info like a ps -auxl, and if it's a deadlock, drop to
the debugger and get a crash dump.

As far as gjournal goes, I now have it running on several systems, all
very high usage NFS servers (~1000 high end machines pounding them very
hard, 24x7). I've only seen a few little issues on one of my systems
that is running an older 6-STABLE (it's a little difficult for me to
update it right now), but all my other systems have been very solid.
PJD has done a great job getting it stable and ready for production use.
As far as I have experienced, I have had no data loss and no file system
corruption using it. The worst that's happened is a livelock, followed
by a reboot. Since it is indeed journaled, the reboot takes a few
minutes, and the fsck takes a few *seconds* (on a 10TB volume).

I would say that using gjournal is more reliable over time than relying
on background fsck's. Gjournal is, however, still in beta test mode, so
you should do your own testing to evaluate it. You can always disable it
very easily, without losing your data.

Eric

--
------------------------------------------------------------------------
Eric Anderson        Sr. Systems Administrator        Centaur Technology
An undefined problem has an infinite number of solutions.
------------------------------------------------------------------------