From: Atanas
Date: Fri, 30 Jun 2006 18:19:10 -0700
To: freebsd-stable@freebsd.org
Subject: Parallel fsck in non-preen/full mode?
Message-ID: <44A5CD8E.3060508@asd.aplus.net>

Is there an easy way to force an fsck that is both full (non-preen) and parallel (i.e. one process per disk)? It could be a real downtime saver in crash recovery situations.

Imagine the following (fairly typical in my case) scenario: You have many machines with a bunch of drives each and many files on each drive. After a crash (due to a hardware failure or something else), the initial preen (fsck -p) fails.
You have the following options:

a) rely on the background fsck available in 5.x and up;
b) set fsck_y_enable to YES to run "fsck -y" if the initial preen fails;
c) fsck it manually via local or serial console.

Background fsck relies on snapshots, which don't cope well with user quotas and often deadlock and cause more crashes. Actually, the QUOTA + snapshots combination worked somewhat better in 5.x than it does in 6.x now. For 6.1 it's no longer an option for me.

An "fsck -y" is slow as hell as it doesn't run in parallel. For instance, 6 72GB drives (each about 75% full with a million files) could take a good 2 hours, primarily because fsck assumes that interaction is required and runs the checks one at a time.

Manual fsck needs attention (additional downtime), and the fastest way to bring the machine back up is to do exactly what "fsck -p" would do, but in _full_ mode, i.e.:

    # fsck -y da0s1a
    # fsck -y da0s1d &
    # fsck -y da1s1d &
    ...
    # fsck -y da7s1d &
    # ps ax |grep fsck
    # ...
    # exit

The above takes just 15 minutes or so, plus the time between the moment the crash actually happens and the moment you start typing on the console (which can sometimes be much more than 15 minutes).

This could be automated by putting something similar (plus perhaps some shell code that takes device entries from /etc/fstab and a loop that waits for the fsck processes to finish) in /etc/rc.early or a separate rc.d/ style script. But I think such a hack would look somewhat ugly in shell and would just mimic what fsck already does to check multiple drives when running in preen mode.

It seems that it would be really helpful (and possibly harmless) if fsck could be forced to do checks in parallel when running with '-y', when console interaction is not needed anyway, or perhaps through a new switch (-Y?).

I could eventually try to modify the fsck source and somehow change the default '-y' behavior.
But I wouldn't like to carry such additional baggage of custom patches on all servers, and I also don't think that I am the most qualified person to do so.

So in case someone still reads this, please advise.

Regards,
Atanas
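
The shell automation described above (take device entries from /etc/fstab, start the checks in the background, wait for them to finish) could be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not a tested rc.d script: the names FSTAB, FSCK_CMD, list_devices and parallel_fsck are hypothetical and introduced here for illustration, only "ufs" entries are considered, and it starts one process per file system rather than one per disk, so it assumes that the file systems with a pass number above 1 live on separate drives, as in the manual session above.

```sh
#!/bin/sh
# Sketch only: run full (non-preen) checks in parallel, loosely following
# the pass numbers in fstab. FSTAB and FSCK_CMD are hypothetical knobs
# added for illustration; they are not part of fsck or rc.conf.
FSTAB="${FSTAB:-/etc/fstab}"
FSCK_CMD="${FSCK_CMD:-fsck -y}"

# Print the device of every ufs entry in $FSTAB.
#   $1 = "first": entries with pass number 1 (normally just /)
#   $1 = "rest":  entries with pass number greater than 1
list_devices() {
    awk -v which="$1" '
        $1 ~ /^#/ || NF < 6 { next }      # skip comments and short lines
        $3 != "ufs"         { next }      # skip swap, procfs, etc.
        which == "first" && $6 == 1 { print $1 }
        which == "rest"  && $6 > 1  { print $1 }
    ' "$FSTAB"
}

# Check pass-1 file systems one at a time, then all the others in the
# background, and return non-zero if any single check failed.
parallel_fsck() {
    status=0

    for dev in $(list_devices first); do
        $FSCK_CMD "$dev" || status=1
    done

    pids=""
    for dev in $(list_devices rest); do
        $FSCK_CMD "$dev" &
        pids="$pids $!"
    done

    # Collect the exit status of each background check.
    for pid in $pids; do
        wait "$pid" || status=1
    done

    return $status
}
```

A wrapper script would then simply call parallel_fsck and act on its exit status. Running the pass-1 entries in the foreground first mirrors the manual session, where the root file system is checked before the background jobs are started.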