From owner-freebsd-current@FreeBSD.ORG  Tue Dec 27 22:20:13 2011
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id B0A46106564A
	for <freebsd-current@freebsd.org>; Tue, 27 Dec 2011 22:20:13 +0000 (UTC)
	(envelope-from lx@redundancy.redundancy.org)
Received: from redundancy.redundancy.org (75-101-96-57.dsl.static.sonic.net
	[75.101.96.57]) by mx1.freebsd.org (Postfix) with SMTP id 758DF8FC16
	for <freebsd-current@freebsd.org>; Tue, 27 Dec 2011 22:20:13 +0000 (UTC)
Received: (qmail 73876 invoked by uid 1001); 27 Dec 2011 21:53:55 -0000
Date: Tue, 27 Dec 2011 13:53:55 -0800
From: David Thiel <lx@redundancy.redundancy.org>
To: freebsd-current@freebsd.org
Message-ID: <20111227215330.GI45484@redundancy.redundancy.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
X-OpenPGP-Key-fingerprint: 482A 8C46 C844 7E7C 8CBC 2313 96EE BEE5 1F4B CA13
X-OpenPGP-Key-available: http://redundancy.redundancy.org/lx.gpg
X-Face: %H~{$1~NOw1y#%mM6{|4:/<p]y9X%E+4%:1wo-M!re,
	zl.qH~yzbL-MWhtp$3QuKP&di/a{FOctD[FuX.\n4U*,
	M{TJg$oYp663.NyX!%H~~Tw$eR9xZU5W?1BM#t"a@'27^2x
User-Agent: Mutt/1.5.21 (2010-09-15)
Subject: SU+J systems do not fsck themselves
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>, 
	<mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 27 Dec 2011 22:20:13 -0000

I've now had multiple machines running SU+J (9.0-RC3 on amd64 and i386, 
and earlier 9-CURRENT on ppc) start having unexplained panics and 
crashes related to disk I/O. Whenever I end up running a full fsck, it 
turns out that the disk is dirty and corrupted, but SU+J has no 
mechanism in place to detect and fix this. A bgfsck never happens, but a 
manual fsck in single-user mode does indeed fix the crashing and weird 
behavior. Others have tested their SU+J volumes and found errors as 
well. This makes me super nervous.

Basically, the way SU+J seems to operate is this:

http://redundancy.redundancy.org/fscklog2

"Oh hey, I see you shut down uncleanly, let's check everything looks 
good, off you go, whee"

Until I actually go and run a full fsck, at which point I get:

http://redundancy.redundancy.org/fscklog1

So, I understand that journalling doesn't eliminate the occasional need 
for a full fsck (though I never had this problem with gjournal), but 
without a way for the system to detect that a fsck is necessary, this 
seems like a pretty much guaranteed recipe for data corruption, and it 
seems to offer little to no benefit over plain SU+fsck, or even just 
mounting async.

So: is everyone else seeing this? Am I misunderstanding how SU+J should 
be used? How should the error resolution process really happen? 

Thanks,
David