From owner-freebsd-hackers Mon Jul 13 20:01:44 1998
Return-Path:
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id UAA05888
          for freebsd-hackers-outgoing; Mon, 13 Jul 1998 20:01:44 -0700 (PDT)
          (envelope-from owner-freebsd-hackers@FreeBSD.ORG)
Received: from freebie.lemis.com (freebie.lemis.com [139.130.136.133])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id UAA05878
          for ; Mon, 13 Jul 1998 20:01:36 -0700 (PDT)
          (envelope-from grog@freebie.lemis.com)
Received: (from grog@localhost)
          by freebie.lemis.com (8.9.0/8.9.0) id MAA13966;
          Tue, 14 Jul 1998 12:29:52 +0930 (CST)
Message-ID: <19980714122952.L754@freebie.lemis.com>
Date: Tue, 14 Jul 1998 12:29:52 +0930
From: Greg Lehey
To: Terry Lambert , "Justin T. Gibbs"
Cc: andre@pipeline.ch, Matthew.Alton@anheuser-busch.com, Hackers@FreeBSD.ORG
Subject: Re: Software RAID-5 performance
References: <199807132219.QAA07976@pluto.plutotech.com> <199807132331.QAA15175@usr08.primenet.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.91.1i
In-Reply-To: <199807132331.QAA15175@usr08.primenet.com>; from Terry Lambert on Mon, Jul 13, 1998 at 11:31:08PM +0000
WWW-Home-Page: http://www.lemis.com/~grog
Organization: LEMIS, PO Box 460, Echunga SA 5153, Australia
Phone: +61-8-8388-8286
Fax: +61-8-8388-8725
Mobile: +61-41-739-7062
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

(trimming -fs)

On Monday, 13 July 1998 at 23:31:08 +0000, Terry Lambert wrote:
>>>> There is supposedly some work being done in the NetBSD environment to
>>>> pull RaidFrame into their kernel.  You may want to get involved with
>>>> that effort.
>>>
>>> Terry did that already for FreeBSD: http://www.freebsd.org/~terry
>>
>> Last I heard, Terry had ported the userland implementation to FreeBSD,
>> not the kernel one.
>
> This is correct.
>
> I did the userspace port, and I sent the patches back to the maintainer;
> in theory it should work "out of the box".
>
>> The kernel stuff would actually give reasonable
>> performance since the userland code doesn't have "real threads" to rely
>> on.
>
> The biggest overhead in a software RAID 5 is the software instead
> of hardware checksum calculation, and that's going to be a much
> higher penalty than non-interleaved I/O (IMO; writes are typically
> interleaved, and where you care about reads, you will be set to
> trigger read-ahead).

Don't overestimate the penalty of checksum calculations.  Sure,
they're an issue, but recall that it takes maybe one instruction per
byte written to perform the checksum.  Before you write, you must
read, so you're not going to checksum more than, say, 5 MB/s per
controller.  That doesn't take up much CPU on a modern processor.

I'm currently running some tests on vinum running on a degraded array.
The CPU is a 486/66, the controller a 2940UW, and the disks are CDC
94181s, admittedly pretty slow.  The CPU is running at 97% idle,
interrupt time (where the checksums are done) is running at 1.9%.
With a set of fast disks, this might go up to 20% to 25%, but that's
still not bad for a 486/66.

Non-interleaved I/O, on the other hand, can be a big penalty (if we're
talking about the same thing).  If I have an array with 5 drives, each
capable of a realistic 5 MB/s, and a stripe width of 64 kB, and I
write 256 kB to it, I need to do:

1.  *Don't* read in the old blocks.  If you're completely replacing
    the stripe, you have all the information you need to calculate the
    parity block.

2.  Calculate parity.  On the 486/66, this looks like being about 8
    ms.

3.  Write the blocks.  If you can do this in parallel, it'll take
    about 13 ms.  Serially, it'll take about 50 ms.

Of course, this comparison isn't the real issue.  If I can transfer at
20 MB/s to the RAID controller, the penalty disappears.  But the
potential is there, and I'm comparing worst case: fast disks, fast
controller, slow processor.
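
A rough sketch of the full-stripe write described above, in Python for
brevity.  This is not vinum's code: the function names
(full_stripe_parity, rebuild_block, transfer_ms) are made up for
illustration, 1 MB is taken as 1000 kB, and the timing covers pure
transfer only, ignoring seeks and rotational latency.

```python
STRIPE_KB = 64         # per-drive stripe size from the example
DATA_DRIVES = 4        # 5 drives: 4 data blocks + 1 parity block
DISK_MB_PER_S = 5.0    # "realistic" per-drive transfer rate


def full_stripe_parity(data_blocks):
    """XOR equal-sized data blocks together to get the parity block.

    When the whole stripe is being replaced, the new data alone is
    enough; nothing has to be read back from the disks first.
    """
    size = len(data_blocks[0])
    if any(len(block) != size for block in data_blocks):
        raise ValueError("all blocks must be the same size")
    parity = bytes(size)  # starts as all zero bytes
    for block in data_blocks:
        parity = bytes(a ^ b for a, b in zip(parity, block))
    return parity


def rebuild_block(surviving_blocks, parity):
    """Recover one missing data block (the degraded-array case)."""
    return full_stripe_parity(list(surviving_blocks) + [parity])


def transfer_ms(kilobytes, mb_per_s):
    """Transfer time in ms; kB divided by MB/s comes out in ms."""
    return kilobytes / mb_per_s


if __name__ == "__main__":
    # Parallel: every drive writes its own 64 kB block at once.
    print("parallel: %.1f ms" % transfer_ms(STRIPE_KB, DISK_MB_PER_S))
    # Serial: the 256 kB of data goes out one block after another.
    print("serial:   %.1f ms"
          % transfer_ms(STRIPE_KB * DATA_DRIVES, DISK_MB_PER_S))
```

Under these assumptions the two cases come out at 12.8 ms and 51.2 ms,
which is where the "about 13 ms" and "about 50 ms" figures come from.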

> Software RAID is a data integrity issue, not a performance one,
> and I think making the performance argument for whatever reason
> (protection domain crossing, interleaved I/O, SMP scalability,
> etc.) is a strawman at best.

I'm not sure that I understand what you're saying here.  Obviously
offloading the checksum calculation (or anything else, for that
matter) to an external box will offload the CPU.  And I can't see any
particular difference in data integrity between the two approaches.

Greg
--
See complete headers for address and phone numbers
finger grog@lemis.com for PGP public key

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message