From owner-freebsd-geom@FreeBSD.ORG  Fri Nov  2 09:04:29 2007
Date: Thu, 01 Nov 2007 23:20:47 -0500
From: Joe Koberg <joe@rootnode.com>
To: Ulf Lilleengen
Cc: Marco Haddad, freebsd-geom@freebsd.org
Subject: Re: gvinum and raid5

Ulf Lilleengen wrote:
> On Wed, Oct 31, 2007 at 12:14:18 -0300, Marco Haddad wrote:
>
>> I found in recent research that a lot of people say gvinum should not be
>> trusted when it comes to raid5. I began to get worried. Am I alone using
>>
> I'm working on it, and there are definitely people still using it. (I've
> received a number of private mails as well as those seen on this list.)
> IMO, gvinum can be trusted when it comes to raid5. I've not experienced
> any corruption bugs or anything like that with it.

The source of the mistrust may be the fact that few software-only RAID-5
systems can guarantee write consistency across a multi-drive
read-update-write cycle in the case of, e.g., a power failure. There is no
way for the software RAID to force the parallel writes to complete
simultaneously on all drives, and from the moment the first write starts
until the last one completes, the array is in an inconsistent (corrupted)
state. Dedicated RAID hardware solves this with battery-backed RAM that
maintains the array state in a very robust manner. Dedicated controllers
also tend to be connected to "better" SCSI or SAS drives that properly
report write completion via their command-queuing protocol.
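To make that window of inconsistency concrete, here is a minimal sketch of
the read-update-write parity cycle. It is purely illustrative Python, not
gvinum code, and the four-disk layout, names, and crash point are made up
for the example:

# Toy model of a RAID-5 "small write" (read-update-write cycle).
# Not gvinum code -- just byte-wise XOR parity over three data disks + parity.

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# One stripe of a four-disk array: three data blocks and one parity block.
stripe = {"d0": b"AAAA", "d1": b"BBBB", "d2": b"CCCC"}
stripe["parity"] = xor(xor(stripe["d0"], stripe["d1"]), stripe["d2"])

def small_write(stripe, disk, new_data, crash_before_parity=False):
    """Update one data block: read old data and old parity, recompute parity."""
    old_data = stripe[disk]                                  # read old data
    new_parity = xor(xor(stripe["parity"], old_data), new_data)
    stripe[disk] = new_data              # write #1: data block reaches the disk
    if crash_before_parity:
        return                           # power fails here; parity never written
    stripe["parity"] = new_parity        # write #2: parity block

small_write(stripe, "d1", b"XXXX", crash_before_parity=True)

# After the "crash", parity no longer matches the data that is on disk,
# so a later reconstruction of d0 or d2 from parity would return garbage.
recomputed = xor(xor(stripe["d0"], stripe["d1"]), stripe["d2"])
print(recomputed == stripe["parity"])    # False: the RAID-5 write hole

The two writes at the end cannot be made atomic across two spindles by
software alone; that is exactly the gap that battery-backed controller RAM
(or the ZFS approach below) closes.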
ZFS tackles this problem by never writing data back in place, by keeping
inline checksums of all data and metadata (so that corruption is at least
detectable), and by issuing dynamically-sized "full stripe writes" for
every write, so no read-update-write cycle is required.

A workaround for gvinum/UFS may be to set the stripe and filesystem block
sizes the same, so that a partial stripe is never written and thus no
read-update-write cycle occurs (a rough sketch of the arithmetic follows
my signature). However, in-place updates still leave the possibility of
corrupting data if a write completes on one drive in the array but not on
another.

The visibility of this "RAID-5 hole" may be very low if you have a
well-behaved system (and drives) on a UPS. But since the corruption is
silent, you can be stung far down the road if something "bad" does happen
without notice. This is especially true with ATA drives, whose write-back
cache behavior is less robust, in small-system environments (no backup
power, maybe-flaky cabling, and so on).

It is important to note that I am describing a universal problem with
software RAID-5, not any shortcoming of gvinum in particular.

Joe Koberg
joe at osoft dot us
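P.S. Here is the rough sketch promised above: back-of-the-envelope I/O
counts showing why a write that covers a whole stripe needs no
read-update-write cycle. The geometry (three data disks plus parity, 64 KiB
stripe unit) is hypothetical, and this models only the arithmetic, not what
gvinum actually does:

# Rough I/O counts for an aligned RAID-5 write, comparing a full-stripe
# write (parity computed from the new data alone) with a partial-stripe
# "small" write (old data and old parity must be read back first).

DATA_DISKS = 3
STRIPE_UNIT = 64 * 1024                    # bytes per disk per stripe
FULL_STRIPE = DATA_DISKS * STRIPE_UNIT     # data covered by one whole stripe

def io_cost(write_bytes: int) -> dict:
    """Reads/writes needed for one aligned write of the given size."""
    if write_bytes % FULL_STRIPE == 0:
        # Full-stripe write: no reads; write all data blocks plus parity.
        stripes = write_bytes // FULL_STRIPE
        return {"reads": 0, "writes": stripes * (DATA_DISKS + 1)}
    # Partial-stripe write: read old data and old parity, then write new
    # data and new parity -- the window discussed above.
    blocks = -(-write_bytes // STRIPE_UNIT)   # ceiling division
    return {"reads": blocks + 1, "writes": blocks + 1}

print(io_cost(FULL_STRIPE))        # {'reads': 0, 'writes': 4}
print(io_cost(STRIPE_UNIT))        # {'reads': 2, 'writes': 2}

Even when every write is a full-stripe write, the in-place-update caveat
above still applies: the data and parity writes land on different spindles
at different times.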