From: Brad Knowles <brad@stop.mail-abuse.org>
To: "Alastair D'Silva"
Cc: current@freebsd.org
Date: Mon, 1 Nov 2004 11:48:06 +0100
Subject: Re: Gvinum RAID5 performance

At 4:22 PM +1100 2004-11-01, Alastair D'Silva wrote:

> The offshoot of this is that to ensure data integrity, a background
> process is run periodically to verify the parity.

That's not the way that RAID-5 is supposed to work, at least not the
way I understand it.  I would be very unhappy if I were using a disk
storage subsystem that was configured for RAID-5 and then found out
it was working in this manner.

At the very least, I don't believe that we could or should do this by
default, and adding code to make it behave this way seems to me to be
unnecessary complexity.  Keep in mind that if you've got a five-disk
RAID-5 array, then for any given stripe, four of those disks hold
data and would have to be read on every full-stripe access anyway,
and only one disk holds parity.  The more disks you have in your RAID
array, the lower the parity-to-data ratio, and the less benefit you
would get from checking parity in the background.

> Alternatively, simply buffering the (whole) stripe in memory may be
> enough, as subsequent reads from the same stripe will be fed from
> memory, rather than resulting in another disk I/O (why didn't the
> on-disk cache feed this request?

Most disks now have track caches, and they do read and write entire
tracks at once.  However, given the multitudes of permutations that
go on with data addressing (including bad-sector mapping, etc.), what
the disk thinks of as a "track" may have no relationship whatsoever
to what the OS or driver sees as related or contiguous data.
Therefore, the track cache may not contribute in any meaningful way
to what the RAID-5 implementation needs in terms of a stripe cache.
Moreover, the RAID-5 implementation already knows that it needs to
read or write the entire stripe every time it accesses or updates
data in that stripe, and this could easily interfere destructively
with the on-disk track cache.
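
To make that concrete, here is a rough sketch in C of the arithmetic
involved; this is my own toy illustration, not gvinum's actual code,
and all the names in it are invented.  The parity block is just the
byte-wise XOR of the data blocks in a stripe, which is why a verify
pass, background or not, has to read every block in the stripe:

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /*
     * Toy illustration only; the names and layout are invented for
     * this example.  The parity block of a RAID-5 stripe is the
     * byte-wise XOR of its ndata data blocks.
     */
    void
    raid5_parity(uint8_t *parity, uint8_t *const *data, int ndata,
        size_t blksize)
    {
            memset(parity, 0, blksize);
            for (int i = 0; i < ndata; i++)
                    for (size_t j = 0; j < blksize; j++)
                            parity[j] ^= data[i][j];
    }

    /*
     * What a background verify pass would have to do: read back
     * every data block in the stripe, recompute the XOR, and
     * compare it against the parity block.  Returns 1 if the
     * stripe is consistent, 0 on a parity mismatch.
     */
    int
    raid5_verify(const uint8_t *parity, uint8_t *const *data,
        int ndata, size_t blksize)
    {
            for (size_t j = 0; j < blksize; j++) {
                    uint8_t x = 0;
                    for (int i = 0; i < ndata; i++)
                            x ^= data[i][j];
                    if (x != parity[j])
                            return 0;
            }
            return 1;
    }

Note that the same property bites on the write path: change any data
block and the XOR no longer holds, so the parity block has to be
updated along with it.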

Even if you could get away without reading from all disks in the
stripe (and deferring the parity calculations to a background
process), you're not going to get away from writing to all disks in
the stripe, because those parity bits have to be written at the same
time as the data, and you cannot afford a lazy evaluation here.

> I did notice that the read from a single drive resulted in that
> drive's access light being locked on solid, while reading from the
> plex caused all drives to flicker rather than being solid).

I'd be willing to guess that this is because of the way the data is
distributed across the disks, and the parity calculations that are
going on as the data is accessed.  Fundamentally, RAID-5 is not going
to be as fast as reading the underlying disk directly.

> I think both approaches have the ability to increase overall
> reliability as well as improve performance, since the drives will
> not be worked as hard.

A "lazy read parity" RAID-5 implementation might perform slightly
better than normal RAID-5 on the same controller/subsystem
configuration, but the added complexity seems counter-productive.  If
there is a hardware vendor that does this, I'd be willing to bet that
it's a lame marketing gimmick and little more.

-- 
Brad Knowles, <brad@stop.mail-abuse.org>

  "Those who would give up essential Liberty, to purchase a little
   temporary Safety, deserve neither Liberty nor Safety."

      -- Benjamin Franklin (1706-1790), reply of the Pennsylvania
         Assembly to the Governor, November 11, 1755

  SAGE member since 1995.  See for more info.