From owner-freebsd-hackers Tue Apr 4 15:57:45 1995
Return-Path: hackers-owner
Received: (from majordom@localhost) by freefall.cdrom.com (8.6.10/8.6.6)
          id PAA25412 for hackers-outgoing; Tue, 4 Apr 1995 15:57:45 -0700
Received: from hda.com (hda.com [199.232.40.182]) by freefall.cdrom.com
          (8.6.10/8.6.6) with ESMTP id PAA25406 for ; Tue, 4 Apr 1995 15:57:41 -0700
Received: (dufault@localhost) by hda.com (8.6.9/8.3) id SAA01449;
          Tue, 4 Apr 1995 18:57:12 -0400
From: Peter Dufault
Message-Id: <199504042257.SAA01449@hda.com>
Subject: Re: large filesystems/multiple disks [RAID]
To: rgrimes@gndrsh.aac.dev.com (Rodney W. Grimes)
Date: Tue, 4 Apr 1995 18:57:12 -0400 (EDT)
Cc: hackers@FreeBSD.org
In-Reply-To: <199504042136.OAA08422@gndrsh.aac.dev.com> from "Rodney W. Grimes" at Apr 4, 95 02:36:04 pm
X-Mailer: ELM [version 2.4 PL24]
Content-Type: text
Content-Length: 3569
Sender: hackers-owner@FreeBSD.org
Precedence: bulk

Rodney W. Grimes writes:
> > > > > Rodney W. Grimes writes:
> > > > > 
> > > > > RAID does have the negative effect of having to write 20% more data,
> > > > > thus cutting effective bandwidth by 20%. It is actually worse than
> > > > > this in that all writes must write to at least 2 drives no matter how
> > > > > small they are. This removes some of the benefits of striping.
> > > > 
> > > > And that is why some RAID systems use (battery backed up please ;-) RAM
> > > > caches. This works quite nicely.
> > > 
> > > And you find these caches will fill up at some point in a sustained
> > > write test and you end up right back at the 20% performance loss I
> > > was talking about.
> > > 
> > > Pure striping of drives always outperforms RAID; you always pay some
> > > price for reliability, and it is usually performance or $$$.
> > 
> > I'm not sure what you mean here. You don't always need to suffer the
> > performance loss if you're willing to suffer the data density loss.
> 
> The problem with RAID is that to have the reliability of surviving any
> 1 drive going bad, you must write data to at least 2 drives for all
> write operations.
> 
> This means that unless you greatly increase the density of your storage
> by going to mirrors you are going to lose performance.
> 
> > With a fast channel to the array and dedicated hardware driving the
> > disks and calculating the parity you should be able to get close
> > to N times the throughput while losing only 1/(N+2) of the potential
> > storage, where N is something like 8 and I'm assuming a parity drive
> > and a hot standby.
> 
> You'll never get N times the throughput because you always have to
> write to 2 drives to keep the parity data up, thus your bandwidth
> increase is more like (N/2). I agree that the time loss for parity
> calculations is near zero. To achieve an N-factor performance increase
> you must go to N * 10 drives using RAID :-(, a very large cost hit.
> 
> > You're paying again but not in throughput, unless you are comparing
> > this with a 10-way stripe.
> 
> A 5-wide stripe will have better performance (N=5) than a 5-drive RAID
> system (N=5/2=2.5).

I've worked with Maximum Strategy's VME disk array and HIPPI disk
arrays.  The VME system had this sort of setup:

                       |- buffer board--ESDI channel-ESDI drive
System--VME---Adapter--+- buffer board--ESDI channel-ESDI drive
                       |...
                       |- buffer board--ESDI channel-ESDI drive

The data comes in over the VME at about 18MB/s, gets split by the
Adapter onto a dedicated bus to the buffer boards (with parity
constructed on the fly), and is then sent out the ESDI controllers
to the drives.  The "sector size" coming in over the VME was 8 times
the drive sector size and was split across the 8 drives plus parity.
The intention was to get N times the performance for an N-way stripe.
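To make the on-the-fly parity concrete, here is a minimal C sketch of
that 8-plus-parity split.  The names and the 512-byte sector size are
illustrative assumptions, not Max Strat's actual hardware interface:

#include <string.h>

#define NDRIVES      8          /* data drives in the stripe */
#define DRIVE_SECTOR 512        /* per-drive sector size, illustrative */

/*
 * Scatter one host "sector" (8 drive sectors worth) across the 8 data
 * drives, building the ninth (parity) sector on the fly with XOR as
 * the data streams by -- the same idea as the Adapter above.
 * out[0..7] get one drive sector each; out[8] gets the parity.
 */
void
split_with_parity(const unsigned char host[NDRIVES * DRIVE_SECTOR],
                  unsigned char out[NDRIVES + 1][DRIVE_SECTOR])
{
        int d, i;

        memset(out[NDRIVES], 0, DRIVE_SECTOR);
        for (d = 0; d < NDRIVES; d++) {
                memcpy(out[d], host + d * DRIVE_SECTOR, DRIVE_SECTOR);
                for (i = 0; i < DRIVE_SECTOR; i++)
                        out[NDRIVES][i] ^= out[d][i];
        }
}

Losing any one of the nine drive sectors is recoverable by XOR-ing the
surviving eight back together, which is what the parity drive buys you.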
The "sector size" coming in the VME was 8 times the drive sector size and split across the 8 drives plus parity. The intention was to get N times the performance for an N way stripe. The HIPPI disk array was a four way stripe of these VME disk arrays. If I remember correctly we got 65MB/s to disk with the HIPPI disk array. Max Strat claimed they could do better than that by connecting two HIPPI arrays back to back for throughput testing. I assume most disk arrays use a similar approach to address throughput. The channel coming in has to be fast, and then everything has to stay out of the way. (The secret to good I/O: Stay out of the way). Does this address your throughput concerns? -- Peter Dufault Real Time Machine Control and Simulation HD Associates, Inc. Voice: 508 433 6936 dufault@hda.com Fax: 508 433 5267