From: Gary Buhrmaster
To: Chris BeHanna
Cc: FreeBSD FS
Date: Wed, 14 Nov 2012 06:06:22 -0800
Subject: Re: SSD recommendations for ZFS cache/log

On Tue, Nov 13, 2012
at 8:18 PM, Chris BeHanna wrote:
....
> If you'll pardon what may be an ignorant question, does this
> matter if you have your machine on a UPS, especially if you run
> upsmon or nut to do a graceful shutdown when there are n minutes
> of battery remaining?

In the real world, UPSes aren't (uninterruptible), people pull power
cords (even redundant ones), power supplies fail, the redundant power
supply backplane fails, the motherboard fries and shuts down the power
supply, and disks/SSDs sometimes corrupt themselves for other random
reasons.

And, of course, the reason any of this is so important with SSDs is
that (almost) all SSDs lie about having written the data to the
sectors (they indicate immediate success), since writing to flash is
so slow: you have to read a 4KB/8KB flash sector, update it with your
(usually/often) smaller block, erase the flash sector, and then write
the new data. They may also be doing internal scrubs and
defragmentation at the time of the request. And so they buffer written
data in onboard RAM and report immediate success.

ZFS depends heavily on the ZIL being correct for recovery. (Smart
people have added code so that a corrupted ZIL no longer results in
complete loss, but the result can still be some data loss.) The ZFS
code that updates the ZIL expects that when the device indicates
"written to disk complete", the data has actually been written. Since
the SSD has buffered the ZIL data in RAM, a power failure could
violate this presumption of ZFS, and with it the integrity of the ZIL.

A common solution on SSDs, sometimes called a "super capacitor", gives
the SSD enough power (time) in the event of a power failure to finish
in-flight writes. Marketing departments at various companies call the
solution different things.

Gary
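
For anyone who wants to see the failure mode above in miniature, here
is a toy Python model. It is only a sketch under simplifying
assumptions, not real firmware, driver, or ZFS code: the ToySSD class,
its methods, and the "zil-record" strings are all made up for
illustration. It models a drive that acknowledges writes as soon as
they land in volatile RAM, and compares what survives a power failure
with and without a capacitor-backed flush.

```python
class ToySSD:
    """Hypothetical model of a drive with a volatile write cache."""

    def __init__(self, has_supercap):
        self.has_supercap = has_supercap
        self.ram_cache = {}  # volatile: contents vanish on power loss
        self.flash = {}      # persistent storage

    def write(self, lba, data):
        # Buffer the write in RAM and report success immediately,
        # before anything has reached flash.
        self.ram_cache[lba] = data
        return True

    def flush(self):
        # Move everything buffered in RAM onto flash.
        self.flash.update(self.ram_cache)
        self.ram_cache.clear()

    def power_fail(self):
        # With a capacitor there is enough energy left to finish
        # in-flight writes; without one, the RAM contents are lost.
        if self.has_supercap:
            self.flush()
        self.ram_cache.clear()


def surviving_records(has_supercap):
    """Return the acknowledged writes still present after power loss."""
    ssd = ToySSD(has_supercap)
    acked = [lba for lba in range(3) if ssd.write(lba, f"zil-record-{lba}")]
    ssd.power_fail()
    return [lba for lba in acked if lba in ssd.flash]


print(surviving_records(has_supercap=False))  # []
print(surviving_records(has_supercap=True))   # [0, 1, 2]
```

In both runs the host was told that all three writes succeeded; only
the capacitor-backed drive actually kept them, which is the presumption
the ZIL update code relies on.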