Date:      Tue, 14 Jul 1998 12:29:52 +0930
From:      Greg Lehey <grog@lemis.com>
To:        Terry Lambert <tlambert@primenet.com>, "Justin T. Gibbs" <gibbs@plutotech.com>
Cc:        andre@pipeline.ch, Matthew.Alton@anheuser-busch.com, Hackers@FreeBSD.ORG
Subject:   Re: Software RAID-5 performance
Message-ID:  <19980714122952.L754@freebie.lemis.com>
In-Reply-To: <199807132331.QAA15175@usr08.primenet.com>; from Terry Lambert on Mon, Jul 13, 1998 at 11:31:08PM +0000
References:  <199807132219.QAA07976@pluto.plutotech.com> <199807132331.QAA15175@usr08.primenet.com>

(trimming -fs)

On Monday, 13 July 1998 at 23:31:08 +0000, Terry Lambert wrote:
>>>> There is supposedly some work being done in the NetBSD environment to
>>>> pull RaidFrame into their kernel.  You may want to get involved with
>>>> that effort.
>>>
>>> Terry did that already for FreeBSD: http://www.freebsd.org/~terry
>>
>> Last I heard, Terry had ported the userland implementation to FreeBSD,
>> not the kernel one.
>
> This is correct.
>
> I did the userspace port, and I sent the patches back to the maintainer;
> in theory it should work "out of the box".
>
>> The kernel stuff would actually give reasonable
>> performance since the userland code doesn't have "real threads" to rely
>> on.
>
> The biggest overhead in a software RAID 5 is the software instead
> of hardware checksum calculation, and that's going to be a much
> higher penalty than non-interleaved I/O (IMO; writes are typically
> interleaved, and where you care about reads, you will be set to
> trigger read-ahead).

Don't overestimate the penalty of checksum calculations.  Sure,
they're an issue, but recall that it takes maybe one instruction per
byte written to perform the checksum.  Before you write, you must
read, so you're not going to checksum more than, say, 5 MB/s per
controller.  That doesn't take up much CPU on a modern processor.
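
To make the "one instruction per byte" point concrete, here is a minimal C sketch of RAID-5 parity as a byte-wise XOR. The function names parity_full() and parity_rmw() are hypothetical and are not taken from vinum. The second function shows why a partial-stripe write must read before it writes: the new parity is the old parity XOR the old data XOR the new data, so both old values have to come off the disk first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Full-stripe write (hypothetical helper): parity is the byte-wise XOR
 * of every data block, so nothing has to be read from disk first. */
void parity_full(const uint8_t *const data[], size_t ndata, size_t blksz,
                 uint8_t *parity)
{
    assert(ndata > 0);
    memset(parity, 0, blksz);
    for (size_t d = 0; d < ndata; d++)
        for (size_t i = 0; i < blksz; i++)
            parity[i] ^= data[d][i];   /* one XOR per byte per block */
}

/* Partial-stripe write (read-modify-write, hypothetical helper): the old
 * contents of the block being replaced and the old parity must be read
 * before writing.
 *     new_parity = old_parity ^ old_data ^ new_data                  */
void parity_rmw(uint8_t *parity, const uint8_t *old_data,
                const uint8_t *new_data, size_t blksz)
{
    for (size_t i = 0; i < blksz; i++)
        parity[i] ^= old_data[i] ^ new_data[i];
}
```

The same XOR loop also rebuilds a missing block on a degraded array: XOR the surviving data blocks together with the parity block.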

I'm currently running some tests on vinum running on a degraded array.
The CPU is a 486/66, the controller a 2940UW, and the disks are CDC
94181s, admittedly pretty slow.  The CPU is 97% idle, and interrupt
time (where the checksums are done) is 1.9%.
With a set of fast disks, this might go up to 20% to 25%, but that's
still not bad for a 486/66.

Non-interleaved I/O, on the other hand, can be a big penalty (if we're
talking about the same thing).  If I have an array with 5 drives, each
capable of a realistic 5 MB/s, and a stripe width of 64 kB, and I
write 256 kB to it (one full stripe: four 64 kB data blocks plus one
64 kB parity block), I need to:

1.  *Don't* read in the old blocks.  If you're completely replacing
    the stripe, you have all the information you need to calculate the
    parity block.

2.  Calculate parity.  On the 486/66, this looks like it takes about
    8 ms.

3.  Write the blocks.  If you can do this in parallel, it'll take
    about 13 ms.  Serially, it'll take about 50 ms.
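
The timings in step 3 follow from simple arithmetic.  A minimal C sketch, assuming 1 MB = 1000 kB and ignoring seek and rotational latency; the helper names xfer_ms() and stripe_write_ms() are hypothetical.  With 64 kB per disk at 5 MB/s, the parallel case comes to about 12.8 ms.  The ~50 ms serial figure matches the 256 kB of data alone (51.2 ms); writing the 64 kB parity block serially as well brings it to about 64 ms.

```c
#include <assert.h>

/* Hypothetical helper: milliseconds to transfer `kb` kilobytes at
 * `mb_per_s` MB/s.  With 1 MB = 1000 kB, kB divided by MB/s is ms.
 * Seek and rotational delay are ignored. */
double xfer_ms(double kb, double mb_per_s)
{
    assert(mb_per_s > 0.0);
    return kb / mb_per_s;
}

/* Hypothetical helper: elapsed time to write `nblocks` blocks of `blk_kb`
 * kB, each on its own disk.  In parallel the disks overlap, so the time is
 * one block's transfer; serially the transfers add up. */
double stripe_write_ms(int nblocks, double blk_kb, double disk_mb_s,
                       int parallel)
{
    double one = xfer_ms(blk_kb, disk_mb_s);
    return parallel ? one : nblocks * one;
}
```

For example, stripe_write_ms(5, 64.0, 5.0, 1) gives the parallel case and stripe_write_ms(4, 64.0, 5.0, 0) the serial data-only case.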

Of course, this comparison isn't the real issue.  If I can transfer at
20 MB/s to the RAID controller, the penalty disappears.  But the
potential is there, and I'm comparing worst case: fast disks, fast
controller, slow processor.

> Software RAID is a data integrity issue, not a performance one,
> and I think making the performance argument for whatever reason
> (protection domain crossing, interleaved I/O, SMP scalability,
> etc.) is a strawman at best.

I'm not sure that I understand what you're saying here.  Obviously
offloading the checksum calculation (or anything else, for that
matter) to an external box will offload the CPU.  And I can't see any
particular difference in data integrity between the two approaches.

Greg
--
See complete headers for address and phone numbers
finger grog@lemis.com for PGP public key
