From: Scott Oertel
Date: Thu, 11 Jan 2007 10:17:40 -0800
To: Eric Anderson, freebsd-fs@freebsd.org
Subject: Re: skipping fsck with soft-updates enabled

Eric Anderson wrote:
> On 01/10/07 10:18, Scott Oertel wrote:
>> Eric Anderson wrote:
>>> On 01/10/07 00:20, Scott Oertel wrote:
>>>> Victor Loureiro Lima wrote:
>>>>> From rc.conf man page:
>>>>> ---
>>>>> background_fsck_delay
>>>>>         (int) The amount of time in seconds to sleep before starting
>>>>>         a background fsck(8).
>>>>>         It defaults to sixty seconds to allow large applications such
>>>>>         as the X server to start before disk I/O bandwidth is
>>>>>         monopolized by fsck(8).
>>>>> ---
>>>>>
>>>>> You can set the delay as long as you want, so it won't have to start
>>>>> right away; in fact it can start as late as a year later (if that's
>>>>> really what you want ;))
>>>>>
>>>>> att,
>>>>> victor loureiro lima
>>>>>
>>>>> 2007/1/10, Oliver Fromme:
>>>>>> Scott Oertel wrote:
>>>>>> > I am wondering what kind of problems would occur, besides lost
>>>>>> > space, if a fsck is skipped after a system crash. According to
>>>>>> > the documentation, with soft-updates enabled the file system would
>>>>>> > be consistent; there would just be lost resources to be recovered,
>>>>>> > which I am assuming can be safely done at a later time to avoid
>>>>>> > long periods of downtime during peak hours.
>>>>>>
>>>>>> I think that's exactly what the background fsck feature
>>>>>> does. If you enable it (which is even the default), the
>>>>>> fsck process doesn't start right away, so the system comes
>>>>>> up in multi-user mode immediately. Then a snapshot is
>>>>>> created on the file system, and fsck runs on the snapshot,
>>>>>> freeing the lost space in the file system.
>>>>>>
>>>>>> Of course, it only works reliably with soft-updates enabled,
>>>>>> _and_ there must not be any unexpected inconsistencies.
>>>>>> However, with some common setups (e.g. cheap disks lying
>>>>>> about completed write operations) it is difficult to
>>>>>> guarantee the consistency. Soft-updates is rather fragile
>>>>>> when the hardware doesn't work exactly as it's supposed to.
>>>>>> I've witnessed breakage in the past, and for that reason
>>>>>> I always disable the background fsck feature. It's also the
>>>>>> reason I'm looking forward to gjournal becoming stable,
>>>>>> because it seems to be less fragile in the presence of
>>>>>> imperfect hardware.
>>>>>>
>>>>>> Best regards
>>>>>> Oliver
>>>>>>
>>>> The problem with background fsck is that on my machines it doesn't
>>>> work well. These machines have 8x750GB SATA drives and they are
>>>> under extreme stress all the time. When fsck runs in the
>>>> background, each drive takes 10+ minutes to create the snapshot
>>>> file, during which time the machine is completely unresponsive and
>>>> unstable.
>>> What version of FreeBSD are you running? You might try gjournal,
>>> which I've had great luck with, and Pawel (pjd@) is incredibly
>>> responsive to bug reports, etc.
>>>
>>>> That is why I am wondering if it is OK to skip both the background
>>>> and foreground fscks and reschedule them for a later time, during
>>>> non-peak hours.
>>> I think most people would be nervous to tell you 'sure, skip it
>>> until later', but I can tell you from experience that I myself have
>>> delayed fscking for weeks on end, to do exactly what you want.
>>>
>>> Eric
>>>
>> I'm running 6.2-RC2. For fun I tried to create a snapshot on one
>> of our newest machines, which has the same drive configuration as the
>> others but is less active. It's also running 6.2-RC2, and it just
>> completely locked up. Anyway, thanks for the suggestion about
>> gjournal, but I'm not sure running unofficial patches to the file
>> system code on production machines is such a great idea. Have you had
>> any problems with gjournal, and if so, of what nature were they?
>>
>
> Honestly, I haven't had many issues with snapshots since 6.1-ish;
> before that there were lots of deadlocks, livelocks, etc. I think Kris@
> has done a bang-up job at finding bugs and getting them fixed. If you
> still see snapshot issues like this, it would be great if you could
> start sending some info like a ps -auxl, and if it's a deadlock, drop
> to the debugger and get a crash dump.

What size are the hard drives you're creating snapshots of? Is it > 750GB?
If it is, then I would be happy to help find a resolution for the
snapshot issue by providing debug info and such.

>
> As far as gjournal goes, I now have it running on several systems, all
> very high-usage NFS servers (~1000 high-end machines pounding them very
> hard, 24x7). I've only seen a few little issues on one of my systems
> that is running an older 6-STABLE (it's a little difficult for me to
> update it right now), but all my other systems have been very solid.
> PJD has done a great job getting it stable and ready for production
> use. In my experience, I have had no data loss and no file system
> corruption using it. The worst that's happened is a livelock,
> followed by a reboot. Since it is indeed journaled, the reboot takes
> a few minutes, and the fsck takes a few *seconds* (on a 10TB volume).
> I would say that using gjournal is more reliable over time than
> relying on background fscks. Gjournal is, however, still in beta
> testing, so you should do your own testing to evaluate it. You can
> always disable it very easily, without losing your data.
>
> Eric
>

I'll go ahead and give gjournal a test run on a test machine and see
how I like it. Thank you for the information based on your experiences
with it.

-Scott
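For reference, the rc.conf(5) knobs discussed in this thread can be set as in the fragment below. This is a minimal sketch that assumes a FreeBSD 6.x system with soft-updates enabled on the affected file systems; the one-hour delay is an arbitrary illustrative value, not one recommended anywhere in the thread.

```sh
# /etc/rc.conf fragment (illustrative values, not taken from the thread)

# After an unclean shutdown, let fsck(8) check soft-updates file systems
# in the background on a snapshot, so the system reaches multi-user mode
# first. "YES" is already the default.
background_fsck="YES"

# Seconds to sleep before the background fsck(8) starts (default: 60).
# 3600 is an example value that postpones the fsck disk I/O by one hour.
background_fsck_delay="3600"
```

Setting `background_fsck="NO"` instead gives the always-foreground behaviour Oliver describes preferring, at the cost of a longer boot after a crash.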