From owner-freebsd-stable@FreeBSD.ORG  Sat Apr  2 18:43:33 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id ABFA21065670
	for <freebsd-stable@freebsd.org>; Sat,  2 Apr 2011 18:43:33 +0000 (UTC)
	(envelope-from gpalmer@freebsd.org)
Received: from noop.in-addr.com (mail.in-addr.com [IPv6:2001:470:8:162::1])
	by mx1.freebsd.org (Postfix) with ESMTP id 7936F8FC1A
	for <freebsd-stable@freebsd.org>; Sat,  2 Apr 2011 18:43:33 +0000 (UTC)
Received: from gjp by noop.in-addr.com with local (Exim 4.74 (FreeBSD))
	(envelope-from <gpalmer@freebsd.org>)
	id 1Q65nI-000Ejp-0t; Sat, 02 Apr 2011 14:43:32 -0400
Date: Sat, 2 Apr 2011 14:43:31 -0400
From: Gary Palmer <gpalmer@freebsd.org>
To: David Magda <dmagda@ee.ryerson.ca>
Message-ID: <20110402184331.GA43505@in-addr.com>
References: <87d3l6p5xv.fsf@cosmos.claresco.hr>
	<AANLkTi=kEyz-mKLzdV8LAf91ZhMTP8gLKs=3Eu5WD8mh@mail.gmail.com>
	<874o6ip0ak.fsf@cosmos.claresco.hr>
	<7b15d37d28f8ddac9eb81e4390231c96.HRCIM@webmail.1command.com>
	<AANLkTi=KEwmm1hM6Z=r_SWUAn9KhUrkTVzfF6VmqQauW@mail.gmail.com>
	<14c23d4bf5b47a7790cff65e70c66151.HRCIM@webmail.1command.com>
	<AANLkTi=6pqRwJ96Lg=603cYg_f8QUXkg8aXtbjbYpFrV@mail.gmail.com>
	<201104020335.p323Zp8Q018666@apollo.backplane.com>
	<1D1A4498-0CE0-4CE7-8DD3-6066B85C82AF@ee.ryerson.ca>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1D1A4498-0CE0-4CE7-8DD3-6066B85C82AF@ee.ryerson.ca>
X-SA-Exim-Connect-IP: <locally generated>
X-SA-Exim-Mail-From: gpalmer@freebsd.org
X-SA-Exim-Scanned: No (on noop.in-addr.com); SAEximRunCond expanded to false
Cc: freebsd-stable@freebsd.org
Subject: Re: Constant rebooting after power loss
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 02 Apr 2011 18:43:33 -0000

On Sat, Apr 02, 2011 at 12:55:15PM -0400, David Magda wrote:
> On Apr 1, 2011, at 23:35, Matthew Dillon wrote:
> 
> >    The solution to this first item is for the OS/filesystem to issue a
> >    disk flush command to the drive at appropriate times.  If I recall the
> >    ZFS implementation in FreeBSD *DOES* do this for transaction groups,
> >    which guarantees that a prior transaction group is fully synced before
> >    a new one starts running (HAMMER in DragonFly also does this).
> >    (Just getting an 'ack' from the write transaction over the SATA bus only
> >    means the data made it to the drive's cache, not that it made it to
> >    the platter).
> 
> It should also be noted that some drives ignore or lie about these flush commands: i.e., they say they flushed the buffers but did not in fact do so. This is sometimes done on cheap SATA drives, but also on expensive SANs. In the former's case it's often to help with benchmark numbers. In the latter's case, it's usually okay because the buffers are actually NVRAM, and so are safe across power cycles. There are also some USB-to-SATA chipsets that don't handle flush commands and simply ACK them without passing them to the drive, so yanking a drive can cause problems.

SANs are *theoretically* safer because of their battery-backed caches, but
that's not guaranteed.  I've seen an array controller crash and royally screw
the data sets as a result, even when the cache was allegedly mirrored to
the redundant controller in the array.

NVRAM/battery-backed cache protects against certain failures but introduces
other failure modes in their place.  You have to do your own risk/benefit
analysis before deciding which is the best solution for your usage scenario.
As long as data is "in transit" to permanent storage, it's at risk.  All the
disk redundancy and battery-backed caches in the world are no replacement for
a comprehensive *and regularly tested* backup strategy.
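
To make the "in transit" window concrete, here is a minimal sketch (Python,
purely illustrative; the durable_write name is made up for this example) of
an application asking the OS to push a file's data down to the device.  Note
the assumption baked in: fsync() only *requests* that the data reach stable
storage.  Whether it actually hits the platter still depends on the
filesystem issuing a flush and on the drive or controller honouring it, which
is exactly the weak link discussed above.

```python
import os
import tempfile


def durable_write(path, data):
    """Write data to path, then ask the OS to flush it to the device.

    Hypothetical helper for illustration only.  A successful return means
    the kernel accepted the fsync() request, not that the drive is honest.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        written = 0
        # os.write() may write fewer bytes than requested, so loop.
        while written < len(data):
            written += os.write(fd, data[written:])
        # Until this returns, the data may live only in the OS cache.
        os.fsync(fd)
    finally:
        os.close(fd)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "example.dat")
        count = durable_write(target, b"important bytes\n")
        print(count, "bytes written and fsync() requested")
```

Even with this in place, the data is only as safe as the layers underneath,
so it complements a tested backup rather than replacing one.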

Regards,

Gary