From: Scott Long
Date: Sun, 16 Dec 2007 10:07:23 -0700
To: darrenr@freebsd.org
Cc: freebsd-current@freebsd.org, ticso@cicely.de, Ivan Voras
Subject: Re: ZFS melting under postgres...
Message-ID: <47655B4B.6010902@samsco.org>
In-Reply-To: <4764F282.7030706@freebsd.org>

Darren Reed wrote:
> Bernd Walter wrote:
> ...
>> One problem is with the data blocks being that big: when writing
>> 512 bytes you effectively do a read-modify-write of a larger physical
>> block.
>> This can be handled quite well with a larger FS block.
>> The much bigger problem is power loss while writing such a
>> maintenance block.
>> You lose a very large area of logical blocks when this fails,
>> since a 4k maintenance block contains the allocation for several hundred
>> kB of logical data blocks.
>> In other words, you possibly lose data blocks that were not written
>> for a long time, and the database wouldn't expect a problem with that data.
>> Even for the ZIL it is very questionable if you lose a large data area,
>> since the purpose is to have the data that was already synced readable
>> after a power loss.
> ...
>
> ZFS doesn't suffer from this problem because the design
> is to always write a new section of data rather than
> overwrite "current" data.
>
> So if you lose power in the middle of a write to a data
> block, there is no damage to the old data.

... except with disks that write sectors via read-update-write on whole
tracks at a time (i.e. all SATA/ATA disks and probably more and more
SAS/SCSI disks as well these days). The speed and density optimizations
that have been introduced to disks in the past 10 years don't come for
free; they directly impact reliability. That's why you don't ever, ever
want to lose power to a disk subsystem that you consider critical.

Scott
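
The failure mode described above can be illustrated with a toy model. This is
only a sketch under stated assumptions, not a description of how any real
drive firmware or ZFS works: the `ToyDisk` class, the `cow_update` helper, the
8-sector track geometry, and the "whole track becomes garbage on power loss"
behaviour are all hypothetical simplifications. It shows that a copy-on-write
update leaves the old block intact only if the new block lands on a different
track.

```python
SECTORS_PER_TRACK = 8  # hypothetical geometry, for illustration only


class ToyDisk:
    """Toy disk that rewrites a whole track for every sector write."""

    def __init__(self, tracks):
        self.sectors = [None] * (tracks * SECTORS_PER_TRACK)

    def write(self, lba, data, power_fails=False):
        start = (lba // SECTORS_PER_TRACK) * SECTORS_PER_TRACK
        # Read the whole track and update one sector in the copy.
        track = self.sectors[start:start + SECTORS_PER_TRACK]
        track[lba - start] = data
        if power_fails:
            # Assumption: an interrupted track rewrite damages every sector on it.
            track = ["<garbage>"] * SECTORS_PER_TRACK
        self.sectors[start:start + SECTORS_PER_TRACK] = track


def cow_update(disk, old_lba, new_lba, data, power_fails=False):
    """Copy-on-write: never overwrite old_lba, write the new version elsewhere."""
    disk.write(new_lba, data, power_fails)
    return disk.sectors[old_lba]  # what the old block reads back as afterwards


# New block placed on the same track as the old one; power fails mid-write.
disk = ToyDisk(tracks=2)
disk.write(0, "v1")
same_track = cow_update(disk, old_lba=0, new_lba=1, data="v2", power_fails=True)

# New block placed on a different track; power fails mid-write.
disk2 = ToyDisk(tracks=2)
disk2.write(0, "v1")
other_track = cow_update(disk2, old_lba=0, new_lba=8, data="v2", power_fails=True)

print(same_track)   # old block damaged although it was never overwritten
print(other_track)  # old block intact
```

In this model the filesystem did everything right in both cases; the
difference comes entirely from where the disk happened to place the new block.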