Date: Wed, 28 Dec 2011 17:42:53 +0100
From: Matthias Andree <matthias.andree@gmx.de>
To: freebsd-current@freebsd.org
Subject: Re: SU+J systems do not fsck themselves
Message-ID: <4EFB470D.3070309@gmx.de>
In-Reply-To: <20111227215330.GI45484@redundancy.redundancy.org>
References: <20111227215330.GI45484@redundancy.redundancy.org>

On 27.12.2011 22:53, David Thiel wrote:
> I've had multiple machines now (9.0-RC3, amd64, i386 and earlier
> 9-CURRENT on ppc) running SU+J that have had unexplained panics and
> crashes start happening relating to disk I/O. When I end up running a
> full fsck, it keeps turning out that the disk is dirty and corrupted,
> but no mechanism is in place with SU+J to detect and fix this. A bgfsck
> never happens, but a manual fsck in single-user does indeed fix the
> crashing and weird behavior. Others have tested their SU+J volumes and
> found them to have errors as well. This makes me super nervous.

The one thing I have figured out is that, in the light of power outages
or crashing virtualization hosts, you really need to disable disk write
caches. This affects softupdates, journalling, async file systems, just
about everything.

What makes matters worse is that journalling or softupdates let you
mount a silently corrupted file system, whereas traditional UFS/UFS2
sync/async mounts fsck themselves in the foreground, so they get fixed
before the file system panics.

So can you be sure that:

- your driver, chipset and hard disk execute ordered writes in order,
- your driver, chipset and hard disk actually write data to permanent
  storage BEFORE acknowledging a successful write?

Whenever I fixed these issues, I had no more corruption.

For ATA and SATA, there are loader tunables you will want to set:
hw.ata.wc=0 and kern.cam.ada.write_cache=0. If your drives are under
ada, ad, or ahci related control, try these settings. For SCSI, use
camcontrol to turn the write cache off. softupdates is supposed to make
up for most of the performance penalty incurred.

Note also that you needed to set ahci_load=YES and atapicam_load=YES in
8.X; I have never bothered to check 7.X or 9.X WRT these settings.
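
For reference, the ATA/SATA settings named in the message could be collected in /boot/loader.conf roughly as follows. This is only a sketch: the tunable and module names are the ones given in the message, while the quoted value style and the comments are assumptions added here. Which lines apply depends on the driver the disks attach through, and the two module lines are reported in the message for 8.X only.

```
# /boot/loader.conf (read at boot, so a reboot is needed for changes to apply)

# Disable the drive write cache for disks on the older ata driver (ad devices)
hw.ata.wc="0"

# Disable the drive write cache for disks on the CAM ada driver
kern.cam.ada.write_cache="0"

# Per the message, these were needed on 8.X; unverified there for 7.X and 9.X
ahci_load="YES"
atapicam_load="YES"

# SCSI disks are not covered by these tunables; the message says to use
# camcontrol to turn their write cache off.
```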