From owner-freebsd-current@FreeBSD.ORG  Wed Dec 28 16:42:56 2011
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id DA3A61065673
	for <freebsd-current@freebsd.org>; Wed, 28 Dec 2011 16:42:56 +0000 (UTC)
	(envelope-from matthias.andree@gmx.de)
Received: from mailout-de.gmx.net (mailout-de.gmx.net [213.165.64.22])
	by mx1.freebsd.org (Postfix) with SMTP id 49F208FC17
	for <freebsd-current@freebsd.org>; Wed, 28 Dec 2011 16:42:55 +0000 (UTC)
Received: (qmail invoked by alias); 28 Dec 2011 16:42:54 -0000
Received: from f055156006.adsl.alicedsl.de (EHLO mandree.no-ip.org)
	[78.55.156.6]
	by mail.gmx.net (mp022) with SMTP; 28 Dec 2011 17:42:54 +0100
X-Authenticated: #428038
X-Provags-ID: V01U2FsdGVkX1+xGHI3xayxuDcMBy9jUb/XNquMepI9pG0kZNjsvg
	6WU/Pw4yinBt3Y
Received: from [127.0.0.1] (localhost.localdomain [127.0.0.1])
	by apollo.emma.line.org (Postfix) with ESMTP id 621FE23CF51
	for <freebsd-current@freebsd.org>; Wed, 28 Dec 2011 17:42:53 +0100 (CET)
Message-ID: <4EFB470D.3070309@gmx.de>
Date: Wed, 28 Dec 2011 17:42:53 +0100
From: Matthias Andree <matthias.andree@gmx.de>
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US;
	rv:1.9.2.24) Gecko/20111109 Mnenhy/0.8.3 Thunderbird/3.1.16
MIME-Version: 1.0
To: freebsd-current@freebsd.org
References: <20111227215330.GI45484@redundancy.redundancy.org>
In-Reply-To: <20111227215330.GI45484@redundancy.redundancy.org>
X-Enigmail-Version: 1.1.2
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-Y-GMX-Trusted: 0
Subject: Re: SU+J systems do not fsck themselves
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>, 
	<mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 28 Dec 2011 16:42:56 -0000

On 27.12.2011 22:53, David Thiel wrote:
> I've had multiple machines now (9.0-RC3, amd64, i386 and earlier 
> 9-CURRENT on ppc) running SU+J that have had unexplained panics and 
> crashes start happening relating to disk I/O. When I end up running a 
> full fsck, it keeps turning out that the disk is dirty and corrupted, 
> but no mechanism is in place with SU+J to detect and fix this. A bgfsck 
> never happens, but a manual fsck in single-user does indeed fix the 
> crashing and weird behavior. Others have tested their SU+J volumes and 
> found them to have errors as well. This makes me super nervous.

The one thing I have figured out is that, in light of power outages or
crashing virtualization hosts, you really need to disable disk write
caches. This affects softupdates, journalling, async file systems,
just about everything.

What makes matters worse is that journalling or softupdates allow you
to mount a silently corrupted file system, whereas traditional
UFS/UFS2 sync/async mounts will fsck themselves in the foreground, so
they get fixed before the FS panics.

So can you be sure that:

- your driver, chip set and hard disk execute ordered writes in order,

- your driver, chip set and hard disk actually write data to permanent
storage BEFORE acknowledging a successful write?

Whenever I fixed these issues, the corruption stopped.

For ATA and SATA, there are loader tunables you will want to set:
hw.ata.wc=0 and kern.cam.ada.write_cache=0.
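
As a sketch of how the two tunables named above might look in
/boot/loader.conf (the file location and the quoting style are the
usual loader.conf conventions, not something stated in this mail; which
of the two lines applies depends on the driver your disks attach to):

```
# /boot/loader.conf
# Disable the drive write cache; 0 = off.
hw.ata.wc="0"                  # disks handled by the legacy ata(4) driver
kern.cam.ada.write_cache="0"   # disks handled by the CAM ada(4) driver
```

Loader tunables are read at boot, so a reboot is needed before they
take effect. Afterwards, "sysctl kern.cam.ada.write_cache" should
report the value that was set, assuming the ada driver is in use.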

If your drives are under ada, ad, or ahci-related control, try these
settings.  For SCSI, use camcontrol to turn the write cache off.
Softupdates is supposed to offset most of the performance penalty
incurred.
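
For the SCSI case, one possible way to do this with camcontrol is to
clear the WCE (Write Cache Enable) field in the caching mode page
(page 8). The following is a minimal sketch, not a tested recipe:
"da0" is a placeholder device name, the function name is made up for
illustration, and it assumes your camcontrol accepts mode page values
on standard input when editing, as camcontrol(8) describes. Check the
man page on your release before relying on it.

```sh
#!/bin/sh
# Sketch: clear the Write Cache Enable (WCE) bit on a SCSI disk.
# disable_scsi_wce is a hypothetical helper; "da0" is a placeholder.
disable_scsi_wce() {
    dev="${1:-da0}"
    if ! command -v camcontrol >/dev/null 2>&1; then
        echo "camcontrol not found; this only applies to FreeBSD" >&2
        return 1
    fi
    # Print the caching mode page (page 8) to show the current WCE value.
    camcontrol modepage "$dev" -m 8 || return 1
    # Edit the page (-e), supplying the new value on stdin.
    # -P 3 selects the saved values so the change survives a power cycle.
    echo "WCE: 0" | camcontrol modepage "$dev" -m 8 -e -P 3
}
```

Calling "disable_scsi_wce da1" as root would then target da1; running
"camcontrol modepage da1 -m 8" afterwards shows whether WCE is now 0.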

Note also that you needed to set ahci_load=YES and atapicam_load=YES in
8.X; I have never bothered to check 7.X or 9.X with regard to these
settings.
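
As a sketch, assuming these are set in /boot/loader.conf like other
module-load knobs (this applies to 8.X as described above; 7.X and 9.X
are unverified):

```
# /boot/loader.conf
# Load the AHCI and ATAPI/CAM modules at boot.
ahci_load="YES"
atapicam_load="YES"
```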