From owner-freebsd-stable@FreeBSD.ORG  Sat Jul  1 01:14:05 2006
Message-ID: <44A5CD8E.3060508@asd.aplus.net>
Date: Fri, 30 Jun 2006 18:19:10 -0700
From: Atanas <atanas@asd.aplus.net>
User-Agent: Thunderbird 1.5.0.4 (Macintosh/20060516)
MIME-Version: 1.0
To: freebsd-stable@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Score: 1.47 (SPF_SOFTFAIL)
Subject: Parallel fsck in non-preen/full mode?

Is there some easy way to force a full (non-preen) and at the same time 
parallel (i.e. one process per disk) fsck?

It could be a real downtime saver in crash recovery situations. Imagine 
the following (fairly typical in my case) scenario:

You have many machines with several drives each and many files on 
each drive. After a crash (due to a hardware failure or something else), 
the initial preen (fsck -p) fails. You have the following options:

a) rely on the background fsck available in 5.x and up;
b) set fsck_y_enable to YES to run "fsck -y" if the initial preen fails;
c) fsck it manually via the local or serial console.
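
For reference, options (a) and (b) are controlled through rc.conf(5) 
variables. A fragment that turns background fsck off and enables the 
"fsck -y" fallback might look like the following (a sketch only; check 
the defaults on your release before relying on it):

```shell
# /etc/rc.conf fragment (illustrative)
background_fsck="NO"   # option (a) disabled: no snapshot-based background check
fsck_y_enable="YES"    # option (b): run "fsck -y" if the initial preen fails
```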

Background fsck relies on snapshots, which don't cope well with user 
quotas: they often deadlock and cause more crashes. Actually, the QUOTA + 
snapshots combination worked somewhat better in 5.x than it does in 6.x 
now. For 6.1 it's no longer an option for me.

An "fsck -y" is slow as hell as it doesn't run in parallel. For instance, 
six 72GB drives (each about 75% full, with about a million files) could 
take a good 2 hours, primarily because fsck assumes that interaction is 
required and runs the checks one at a time.

Manual fsck needs attention (additional downtime), and the fastest way 
to bring the machine back up is to do exactly what "fsck -p" would do, 
but in _full_ mode, i.e.:

   # fsck -y da0s1a
   # fsck -y da0s1d &
   # fsck -y da1s1d &
   ...
   # fsck -y da7s1d &

   # ps ax |grep fsck
   # ...
   # exit

The above takes just 15 minutes or so, plus the time between the moment 
when the crash actually happens and the moment you start typing on the 
console (which sometimes could be much more than 15 minutes).

This could be automated by putting something similar (plus perhaps some 
shell code taking device entries from /etc/fstab and a loop waiting for 
the fsck processes to finish) in /etc/rc.early or a separate rc.d/ style 
script. But I think such a hack would look somewhat ugly in shell and 
would just mimic what fsck already does in order to check multiple 
drives when running in preen mode.
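
A minimal sketch of such a hack, under stated assumptions: the function 
names (list_devices, parallel_fsck) and the FSTAB/FSCK variables are made 
up for illustration, the root filesystem (pass 1) is assumed to have been 
checked already, and every remaining ufs entry with a pass number of 2 or 
more is assumed to sit on its own disk. Unlike "fsck -p", it does not 
group filesystems by physical drive.

```shell
#!/bin/sh
# Illustrative only: run one "fsck -y" per ufs filesystem in parallel.
# FSTAB and FSCK are hypothetical knobs so the logic can be tried safely.
FSTAB="${FSTAB:-/etc/fstab}"
FSCK="${FSCK:-fsck -y}"

# Print the device of every ufs entry whose pass number (field 6) is >= 2.
# Comments, swap, pass 0 and the root filesystem (pass 1) are skipped.
list_devices() {
    awk '$1 !~ /^#/ && $3 == "ufs" && $6 >= 2 { print $1 }' "$1"
}

# Start one checker per device in the background, then wait for all of
# them. Returns non-zero if any checker exited with a non-zero status.
parallel_fsck() {
    pids=""
    for dev in $(list_devices "$FSTAB"); do
        $FSCK "$dev" &
        pids="$pids $!"
    done
    rc=0
    for pid in $pids; do
        wait "$pid" || rc=1
    done
    return $rc
}
```

Calling parallel_fsck from an early boot script, after the root 
filesystem has been checked, would correspond roughly to the manual 
session above. Setting FSCK="echo" first shows which devices would be 
checked without touching them.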

It seems that it would be really helpful (and possibly harmless) if fsck 
could be forced to do checks in parallel when running with '-y', since 
console interaction is not needed then anyway, or perhaps through a new 
switch (-Y?).

I could eventually try to modify the fsck source and somehow change the 
default '-y' behavior. But I wouldn't like to carry the extra baggage of 
custom patches on all servers, and I don't think that I am the most 
qualified person to do so.

So in case someone is still reading this, please advise.

Regards,
Atanas