Date: Wed, 25 Feb 1998 13:21:46 -0600
From: Karl Denninger <karl@mcs.net>
To: Wilko Bulte <wilko@yedi.iaf.nl>
Cc: Jay Nelson <jdn@acp.qiv.com>, blkirk@float.eli.net, hackers@FreeBSD.ORG
Subject: Re: SCSI Bus redundancy...
Message-ID: <19980225132146.02016@mcs.net>
In-Reply-To: <199802251848.TAA01481@yedi.iaf.nl>; from Wilko Bulte on Wed, Feb 25, 1998 at 07:48:31PM +0100
References: <Pine.BSF.3.96.980224194109.1380A-100000@acp.qiv.com> <199802251848.TAA01481@yedi.iaf.nl>

This is a tricky problem to solve "correctly".

I have seen several potential solutions, and all have problems. I've
actually INSTALLED AND USED a couple of them; they cover what they are
designed to cover quite well, but aren't perfect.

Let's say you have two machines, one in "hot standby" mode, the other
active. They monitor each other over a private interconnect. Both are
"on" the disk bus (perhaps through an active/active RAID controller), but
only one is using it.

If the first fails, the second activates itself, fsck's the disks, mounts
them, changes its Ethernet MAC address to that of the failed machine and
comes online.

If the first failed due to a software problem and went down "gracefully",
unmounting the disks, the restart time is measured in seconds. If it blew
chunks then fsck has to run - and you damn well better be using a
journaled filesystem or this is going to take a LONG time (i.e., 20 minutes
to an hour if you have some large disk storage involved here).

This is one reason, by the way, that LFS being in a "working" state is
important to these kinds of efforts.

IBM has a solution that they've sold for quite some time based on AIX
(which inherently uses jfs, a journaled filesystem) which does exactly
this.

So far, so good. Now, where are the problems?

1) What if the second machine THINKS the first is dead, but it's wrong?

   This could be extremely bad. It's one of the failure scenarios that the
   cluster people don't like to talk about, because the consequence of
   being "wrong" about this could be the destruction of the disk packs
   involved.

   There ARE some solutions to this if you use a raw interface to the
   disks and each machine "checkpoints" to a specific sector on a regular
   basis. You SHOULD be able to detect, reliably, whether the other machine
   is working this way. But it's a non-trivial problem to solve, and the
   risk of being wrong is that you trash the entire working storage set on
   the disk subsystem.

2) Concurrent *filesystem* access under Unix is a real bitch. I've yet to
   see a *good* solution to this problem. I've seen lots of hacks, but no
   real solutions. I consider concurrent RAW disk slice access to be next
   to worthless, but I understand that some DBMS companies find that
   "solution" ideal for their particular application.

What I've thought about for a long time is architecting an active/active
solution to this problem. It's tricky as hell to do right, but you'd
basically have a bulletproof final installation in which you could take a
hammer to any *ONE* device of a redundant set in the final configuration
and the noticeable impact from the outside would be *zero*.

--
-- 
Karl Denninger (karl@MCS.Net)| MCSNet - Serving Chicagoland and Wisconsin
http://www.mcs.net/          | T1's from $600 monthly to FULL DS-3 Service
                             | NEW! K56Flex support on ALL modems
Voice: [+1 312 803-MCS1 x219]| EXCLUSIVE NEW FEATURE ON ALL PERSONAL ACCOUNTS
Fax:   [+1 312 803-4929]     | *SPAMBLOCK* Technology now included at no cost

On Wed, Feb 25, 1998 at 07:48:31PM +0100, Wilko Bulte wrote:
> As Jay Nelson wrote...
> > On Tue, 24 Feb 1998, Ben Kirkpatrick, ELI wrote:
> >
> > >   I've been wondering about the scsi redundancy problems that come up now
> > >and then (read: I've been chewing on paint chips again).  What parts are
> > >failing?  In my experience, only disks have failed once installed;
> > >controllers have only failed during poor installations and very rare at
> > >that.
> > >   But what I was really wondering, is this about have two SCSI cards on
> > >one scsi bus.  On one of my old adaptec's it _looks_ like I can change the
> > >controller from ID7 to anything else.  With a controller at say 6 and 7,
> > >would there be a way in software for both controllers to access the disks?
> > >Or even for the standby controller to just scan the bus now and then?
> > >   Okey, I'm going off the deep-end, back to my white-out (old-formula).
> > >
> > >--Ben Kirkpatrick
>
> > This is normally done with differential controllers between two
> > different machines -- and, yes, it works.  I don't think it's possible
>
> See Digital Unix TruClusters, they indeed only want differential for
> the shared SCSI buses.
>
> > with single ended controllers.  Concurrent file access from two
> > different machines is a _lot_ more troublesome because of the locking
> > problems.  I don't know of any standard Unices that support this out of
> > the box.  It usually takes two special daemons that run on both
> > machines willing to communicate with each other.
>
> Digital Unix TruClusters do DRD (distributed raw device) now. Things
> like Oracle Parallel Server love this. A cluster filesystem is another
> kettle of fish of course. But not impossible, see OpenVMS.
>
> > If you want both controllers on the same machine for high
> > availability, you'll need to write some software to monitor status and
> > take the appropriate actions if there is a failure.  Otherwise, I don't
>
> See www.veritas.com for a number of whitepapers on High Availabilty.
> Veritas calls their product FirstWatch.
>
> Wilko
> _     ______________________________________________________________________
>  |   / o / /  _  Bulte email: wilko @ yedi.iaf.nl http://www.tcja.nl/~wilko
>  |/|/ / / /( (_) Arnhem, The Netherlands - Do, or do not. There is no 'try'
> ---------------  Support your local daemons: run  [Free,Net,Open]BSD Unix  --
>
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-hackers" in the body of the message

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message
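
The sector-checkpoint idea from point 1 of the message can be sketched
roughly as follows. This is only an illustration, not anything described
in the thread beyond "each checkpoints to a specific sector": the record
layout (node id, sequence counter, CRC), the one-sector-per-node slot
scheme, and the function names write_checkpoint, read_checkpoint and
peer_is_alive are all assumptions made up for the example. A plain file
stands in for the shared raw device, and the sketch ignores caching,
bus reservations and everything else a real cluster would need. It needs
a Unix-like system for os.pread and os.pwrite.

```python
import os
import struct
import time
import zlib

SECTOR_SIZE = 512
# Hypothetical record: node id (u32), sequence counter (u64), CRC32 (u32).
RECORD = struct.Struct("<IQI")


def write_checkpoint(path, slot, node_id, seq):
    """Write this node's checkpoint record into its reserved sector."""
    body = struct.pack("<IQ", node_id, seq)
    record = body + struct.pack("<I", zlib.crc32(body))
    fd = os.open(path, os.O_WRONLY)
    try:
        # Pad to a full sector so one write covers exactly one slot.
        os.pwrite(fd, record.ljust(SECTOR_SIZE, b"\0"), slot * SECTOR_SIZE)
        os.fsync(fd)
    finally:
        os.close(fd)


def read_checkpoint(path, slot):
    """Return (node_id, seq) from a slot, or None if the record is invalid."""
    fd = os.open(path, os.O_RDONLY)
    try:
        raw = os.pread(fd, SECTOR_SIZE, slot * SECTOR_SIZE)
    finally:
        os.close(fd)
    if len(raw) < RECORD.size:
        return None
    node_id, seq, crc = RECORD.unpack(raw[:RECORD.size])
    if zlib.crc32(raw[:12]) != crc:  # first 12 bytes are node id + counter
        return None
    return node_id, seq


def peer_is_alive(path, peer_slot, samples=3, interval=1.0, sleep=time.sleep):
    """Sample the peer's slot; report alive if a valid record changes."""
    first = read_checkpoint(path, peer_slot)
    for _ in range(samples):
        sleep(interval)
        current = read_checkpoint(path, peer_slot)
        if current is not None and current != first:
            return True
    # No change seen in the sampling window: the peer MAY be dead.
    return False
```

A standby would call peer_is_alive only after the private interconnect
has already gone quiet, and would take over the disks only if both checks
agree. Even then a False result is evidence rather than proof, since a
stalled but living peer looks the same as a dead one, which is exactly the
"thinks it's dead, but it's wrong" case the message warns about.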