From: Matthias Andree
Date: Wed, 28 Dec 2011 17:42:53 +0100
To: freebsd-current@freebsd.org
Subject: Re: SU+J systems do not fsck themselves
Message-ID: <4EFB470D.3070309@gmx.de>
In-Reply-To: <20111227215330.GI45484@redundancy.redundancy.org>

On 27.12.2011 22:53, David Thiel wrote:
> I've had multiple machines now (9.0-RC3, amd64, i386 and earlier
> 9-CURRENT on ppc) running SU+J that have had unexplained panics and
> crashes
> start happening relating to disk I/O. When I end up running a
> full fsck, it keeps turning out that the disk is dirty and corrupted,
> but no mechanism is in place with SU+J to detect and fix this. A bgfsck
> never happens, but a manual fsck in single-user does indeed fix the
> crashing and weird behavior. Others have tested their SU+J volumes and
> found them to have errors as well. This makes me super nervous.

The one thing I have figured out is that, in light of power outages or
crashing virtualization hosts, you really need to disable disk write
caches. This affects softupdates, journalling, async file systems, just
about everything.

What makes matters worse is that journalling or softupdates let you
mount a silently corrupted file system, whereas traditional UFS/UFS2
sync/async mounts will fsck themselves in the foreground, so they get
fixed before the FS panics.

So can you be sure that:

- your driver, chipset and hard disk execute ordered writes in order,
- your driver, chipset and hard disk actually write data to permanent
  storage BEFORE acknowledging a successful write?

Whenever I fixed these issues, I had no more corruption.

For ATA and SATA, there are loader tunables you will want to set:
hw.ata.wc=0 and kern.cam.ada.write_cache=0. If your drives are under
ada, ad, or ahci related control, try these settings. For SCSI, use
camcontrol to turn the write cache off. softupdates is supposed to make
up for most of the performance penalty this incurs.

Note also that you needed to set ahci_load=YES and atapicam_load=YES in
8.X; I have never bothered to check 7.X or 9.X with respect to these
settings.
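
As a rough sketch, the settings named in this message could be placed in /boot/loader.conf as shown here. The quoted name="value" form follows the usual loader.conf convention. Whether each tunable exists and takes effect depends on the FreeBSD release and on the driver the disks attach to, so treat this as an assumption to verify on your own system, not a known-good configuration.

```
# /boot/loader.conf (sketch; values taken from the message above)

# Disable the write cache on disks attached via the legacy ata(4) driver (ad devices)
hw.ata.wc="0"

# Disable the write cache on disks attached via CAM (ada devices)
kern.cam.ada.write_cache="0"

# Modules reported as needed on 8.X; not checked on 7.X or 9.X
ahci_load="YES"
atapicam_load="YES"
```

Loader tunables are read at boot, so a reboot is needed before they apply. SCSI disks are not covered by these tunables and need camcontrol instead.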