Date: Wed, 4 Mar 1998 23:07:08 +0100 (MET)
From: Wilko Bulte <wilko@yedi.iaf.nl>
To: shimon@simon-shapiro.org
Cc: grog@lemis.com, hackers@FreeBSD.ORG, blkirk@float.eli.net, jdn@acp.qiv.com, tlambert@primenet.com, sbabkin@dcn.att.com
Subject: Re: SCSI Bus redundancy...
Message-ID: <199803042207.XAA04235@yedi.iaf.nl>
In-Reply-To: <XFMail.980304123456.shimon@simon-shapiro.org> from Simon Shapiro at "Mar 4, 98 12:34:56 pm"
As Simon Shapiro wrote...
>
> On 04-Mar-98 Wilko Bulte wrote:
>
> ...
>
> >> Where does that leave kernel RAID? I like controller level RAID
> >> because:
> >>
> >> a. Much more flexible in packaging; I can use off-the-shelf disks in
> >> off-the-shelf cases (if I choose to).
> >
> > Assuming *good* drives, with *good* firmware. This is, as you know, not
> > as obvious as it sounds ;-)
>
> Of course not. But moving the logic into the kernel will not solve it. I
Believe me, I like RAID boxes, not kernel raid.
> have always had good success with dedicated controllers. The CMD box and a
> DPT controller typically work very well. The only disappointment (at the
> time) was Mylex. But that has probably changed since.
>
> >> b. In the case of a DPT, you get better performance and better
> >> reliability, as I have three busses to spread the I/O across, and
> >> three
> >> busses to take fatal failures on.
> >
> > Yep. Apart from that customer that had a 3-channel Mylex but used only
> > one to attach drives to. Wanted to save on the hot-plug case for the
> > drives. Well, never mind... You can guess what happened. 3 channels is
> > the bare minimum IMO.
>
> The numbers are simple, and can be easily derived from the SCSI specs
> (Justin can correct me where I am off base here), but a SCSI bus is good
> for 400-600 TPS, a drive is good for about that many, and about 4-6MB/Sec,
> the BUS is not good for much more. If you play with latencies, you arrive
> at 4-6 drives per bus. PCI-memory is good for 120MB/Sec on a sunny day, on
For good performance our rule of thumb is 4-5 disks/bus. Matches yours quite
nicely.
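The rule of thumb above can be reproduced with a quick back-of-the-envelope
calculation. The bus bandwidth and derating factor below are my own
illustrative assumptions (roughly era-appropriate figures), not numbers from
the SCSI spec:

```python
# Back-of-the-envelope sizing of disks per SCSI bus.
# bus_mb_s and the 0.6 derating factor are assumptions for
# illustration, chosen to roughly match the figures in this thread.

def drives_per_bus(bus_mb_s, drive_mb_s, derating=0.6):
    """Usable drives per bus: raw bandwidth ratio, scaled by a
    derating factor for arbitration/latency overhead."""
    return int(bus_mb_s * derating / drive_mb_s)

# An Ultra SCSI bus at ~40 MB/s, drives sustaining ~4-6 MB/s each:
for drive in (4, 5, 6):
    print(drive, "MB/s drives ->", drives_per_bus(40, drive), "per bus")
```

With those assumed numbers the answer lands in the 4-6 drives/bus range,
matching both rules of thumb.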
3 buses minimum is more based on the rule that you don't want more than
one disk of each set on a channel. As Murphy has it, it is always the bus
with >1 disk that somehow gets killed. A RAID5 of 3 disks is pretty minimal
(OK, it might be 36 GB net capacity now).
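The "at most one disk of a set per channel" rule amounts to a simple
round-robin placement. A minimal sketch (set names and sizes are made up for
illustration):

```python
# Sketch: place the members of each RAID set round-robin across
# channels so no two disks of one set share a bus -- the rule
# discussed above. Set names/sizes are illustrative assumptions.

def place(sets, channels):
    """Map each (set_id, member) pair to a channel number; members
    of a set land on distinct channels when size <= channels."""
    layout = {}
    for set_id, size in sets.items():
        if size > channels:
            raise ValueError("set %s has more members than channels" % set_id)
        for member in range(size):
            layout[(set_id, member)] = member % channels
    return layout

# Two 3-disk RAID5 sets on a 3-channel controller:
layout = place({"raid5_a": 3, "raid5_b": 3}, channels=3)
for disk, chan in sorted(layout.items()):
    print(disk, "-> channel", chan)
```

A single bus failure then takes out at most one member of each set, which a
RAID5 survives.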
> ...
>
> > ? I don't quite follow you I think. We *still* do RAID to avoid service
> > disruption.
>
> Yes. But service will be disrupted by O/S and application crashes many
> more times. When disk packs were manually loaded, etc., a RAID actually
> contributed significantly to uptime. Today we do it to reduce the damage
> WHEN there is a failure, not so much to prevent the failure.
Hm. Well, it is a matter of terminology I suppose. In my view 'failure'
also includes a service disruption. But that's a different angle.
> This is where my work in HA comes in. It provides a ``CPU RAID'' at the
> service level. A traditional FT system does it at the instruction level.
> FreeBSD is not a good candidate for that. I also think that instruction
> level redundancy is excessive for most applications FreeBSD is fit for.
> But having the service continually available can be a boon to its
> popularity in certain circles.
>
> I think we need to look in this direction as NT is starting to offer some
> such functionality, and we compete with NT. Let Linux compete with
> Win9{5,8}. There is overlap between NT and W95. There is (even
> more) overlap between Linux and FreeBSD, but the ``market'' differentiation
> is there nonetheless.
OK, I understand you intend to compete with NT/Wolfpack (OK, MS Cluster
Server they call it now I think).
What do you call it? 'ChuckPack' ? ;-)
> >> I think the focus changed from operational feature to insurance policy.
>
> > Like going bankrupt, or colliding in midair in the case of an aircraft
> > tracking system.
>
> Yes. These two examples are very good. They are all about recovery time.
> Computers fail. A true FT system will detect and correct a failure at the
> instruction level (almost or exactly). This is crucial for the control
> surfaces in a fly-by-wire airplane. A financial transaction can tolerate
> a lapse of a second, or seconds, in service during error
> detection/correction, as long as the logical database is still coherent.
> My HA model guarantees the second and does absolutely nothing for the
> first.
You are aiming for one-second failover times? How do you then distinguish
between a somewhat slow machine and one that is really dead?
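This ambiguity is inherent to any timeout-based heartbeat: a monitor can only
declare a peer dead after missing some number of beats, so a node that is
merely slower than that budget is indistinguishable from one that has died. A
minimal sketch (the interval and miss threshold are illustrative assumptions,
not anyone's actual design):

```python
# Sketch of the slow-vs-dead ambiguity: a heartbeat monitor declares
# a peer dead once more than `missed_allowed` heartbeat intervals
# have elapsed without a beat. Thresholds are assumed for illustration.

def judge(last_heartbeat, now, interval=1.0, missed_allowed=3):
    """Return 'dead' once the silence exceeds the miss budget,
    else 'alive'. A slow-but-live peer inside the budget passes."""
    missed = (now - last_heartbeat) / interval
    return "dead" if missed > missed_allowed else "alive"

print(judge(last_heartbeat=10.0, now=12.5))   # slow, but within budget
print(judge(last_heartbeat=10.0, now=20.0))   # silence exceeds budget
```

Tightening the budget toward one-second failover shrinks the window in which a
slow machine is wrongly presumed dead only by raising the risk of failing over
a live one.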
Wilko
_ ______________________________________________________________________
| / o / / _ Bulte email: wilko @ yedi.iaf.nl http://www.tcja.nl/~wilko
|/|/ / / /( (_) Arnhem, The Netherlands - Do, or do not. There is no 'try'
--------------- Support your local daemons: run [Free,Net,Open]BSD Unix --
To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message
