From owner-freebsd-stable@FreeBSD.ORG Sat Apr 2 18:43:33 2011
Return-Path:
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id ABFA21065670
    for ; Sat, 2 Apr 2011 18:43:33 +0000 (UTC)
    (envelope-from gpalmer@freebsd.org)
Received: from noop.in-addr.com (mail.in-addr.com [IPv6:2001:470:8:162::1])
    by mx1.freebsd.org (Postfix) with ESMTP id 7936F8FC1A
    for ; Sat, 2 Apr 2011 18:43:33 +0000 (UTC)
Received: from gjp by noop.in-addr.com with local (Exim 4.74 (FreeBSD))
    (envelope-from ) id 1Q65nI-000Ejp-0t; Sat, 02 Apr 2011 14:43:32 -0400
Date: Sat, 2 Apr 2011 14:43:31 -0400
From: Gary Palmer
To: David Magda
Message-ID: <20110402184331.GA43505@in-addr.com>
References: <87d3l6p5xv.fsf@cosmos.claresco.hr>
    <874o6ip0ak.fsf@cosmos.claresco.hr>
    <7b15d37d28f8ddac9eb81e4390231c96.HRCIM@webmail.1command.com>
    <14c23d4bf5b47a7790cff65e70c66151.HRCIM@webmail.1command.com>
    <201104020335.p323Zp8Q018666@apollo.backplane.com>
    <1D1A4498-0CE0-4CE7-8DD3-6066B85C82AF@ee.ryerson.ca>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1D1A4498-0CE0-4CE7-8DD3-6066B85C82AF@ee.ryerson.ca>
X-SA-Exim-Connect-IP:
X-SA-Exim-Mail-From: gpalmer@freebsd.org
X-SA-Exim-Scanned: No (on noop.in-addr.com); SAEximRunCond expanded to false
Cc: freebsd-stable@freebsd.org
Subject: Re: Constant rebooting after power loss
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 02 Apr 2011 18:43:33 -0000

On Sat, Apr 02, 2011 at 12:55:15PM -0400, David Magda wrote:
> On Apr 1, 2011, at 23:35, Matthew Dillon wrote:
>
> > The solution to this first item is for the OS/filesystem to issue a
> > disk flush command to the drive at appropriate times.
> > If I recall the ZFS implementation in FreeBSD *DOES* do this for
> > transaction groups, which guarantees that a prior transaction group is
> > fully synced before a new one starts running (HAMMER in DragonFly also
> > does this). (Just getting an 'ack' from the write transaction over the
> > SATA bus only means the data made it to the drive's cache, not that it
> > made it to the platter).
>
> It should also be noted that some drives ignore or lie about these flush
> commands: i.e., they say they flushed the buffers but did not in fact do
> so. This is sometimes done on cheap SATA drives, but also on expensive
> SANs. In the former case it's often to help with benchmark numbers. In
> the latter case, it's usually okay because the buffers are actually
> NVRAM, and so are safe across power cycles.

There are also some USB-to-SATA chipsets that don't handle flush commands
and simply ACK them without passing them to the drive, so yanking a drive
can cause problems.

SANs are *theoretically* safer because of their battery-backed caches, but
that's not guaranteed: I've seen an array controller crash and royally
screw the data sets as a result, even when the cache was allegedly
mirrored to the redundant controller in the array.

NVRAM/battery-backed cache protects against certain failures but
introduces other failures in their place. You have to do your own
risk/benefit analysis to decide which is the best solution for your usage
scenario. As long as data is "in transit" to permanent storage, it's at
risk.

All the disk redundancy and battery-backed caches in the world are no
replacement for a comprehensive *and regularly tested* backup strategy.

Regards,

Gary
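
For readers who want to see what "issuing a flush at appropriate times" looks like from the application side, here is a minimal sketch in Python, assuming a POSIX system. The helper name durable_write is hypothetical and is not taken from this thread. Note that os.fsync() only *requests* that data reach stable storage; whether the request actually gets past the drive's cache depends on the filesystem, the driver, any USB-to-SATA bridge, and the drive itself, which is exactly the caveat discussed above.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Write data to path, then ask the OS to flush it to stable storage.

    Hypothetical helper for illustration only. A successful return means
    the kernel accepted the flush request, not that the platter has the
    data if the hardware ignores or lies about flush commands.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested; loop until done.
            written = os.write(fd, view)
            view = view[written:]
        # Request that file contents and metadata be pushed to the device.
        os.fsync(fd)
    finally:
        os.close(fd)

    # Also sync the containing directory so the new directory entry itself
    # is requested to be durable (POSIX assumption; not portable to Windows).
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "example.dat")
        durable_write(target, b"important bytes")
        with open(target, "rb") as fh:
            print(fh.read())
```

Even with both fsync calls in place, the data is only as safe as the weakest layer underneath, so this complements rather than replaces a tested backup.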