From owner-freebsd-questions  Tue May 25  4:56:39 1999
Delivered-To: freebsd-questions@freebsd.org
Received: from www.inx.de (www.inx.de [195.21.255.251])
	by hub.freebsd.org (Postfix) with ESMTP id 4F51514D8E
	for <freebsd-questions@freebsd.org>; Tue, 25 May 1999 04:56:35 -0700 (PDT)
	(envelope-from jnickelsen@acm.org)
Received: from n33-71.berlin.snafu.de ([195.21.33.71] helo=goting.jn.berlin.snafu.de)
	by www.inx.de with esmtp (Exim 2.12 #2)
	id 10mFow-00015r-00; Tue, 25 May 1999 13:56:35 +0200
Received: from ockholm.jn.berlin.snafu.de (ockholm.jn.berlin.snafu.de [10.0.0.3])
	by goting.jn.berlin.snafu.de (Postfix) with ESMTP
	id BA26613D; Tue, 25 May 1999 12:29:20 +0200 (CEST)
Date: Tue, 25 May 1999 12:29:29 +0200
From: Juergen Nickelsen <jnickelsen@acm.org>
To: Alex Heiphetz <heiphetz@cvzoom.net>
Cc: freebsd-questions@FreeBSD.ORG
Subject: Re: 100% dependability/failsafe/security/hardware
Message-ID: <388916.3136624169@ockholm.jn.berlin.snafu.de>
In-Reply-To: <3.0.6.32.19990524185242.009583e0@cvzoom.net>
Originator-Info: login-id=nickel; server=goting.jn.berlin.snafu.de
X-Mailer: Mulberry (MacOS) [1.4.2.1, s/n U-301240]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Sender: owner-freebsd-questions@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

--On Mon, 24 May 1999 18:52 -0400 Alex Heiphetz <heiphetz@cvzoom.net>
wrote:

> 3. How to provide 100% failsafe system?

*All* hardware redundant: CPUs, RAM, secondary storage, data paths,
power supplies, fans, UPSs, etc.; proactive hardware monitoring
facilities (including CPU results, see below) with hot failover and
automatic notification of field service in case of problems; the
ability to replace all parts (except the case, perhaps, but probably
including bus backplanes) while the system is running; and an operating
system that manages all that.

There are a handful of vendors making such machines with 100% guaranteed
reliability, and these machines do *not* come cheap. I once saw a
Stratus Continuum system at the German weather service (DWD), which was
(and still is) the regional telecommunications hub in central Europe
for the Global Telecommunications System of the World Meteorological
Organization. You could pull out a CPU module and plug it back in while
the machine was running, but the guy who demonstrated it dared to pull
out only a fan module. Still, the machine noticed and the modem dialled
to notify tech support.

Once, the guy told us, there was a hard disk in the mail, and nobody
knew why, because it hadn't been ordered (and the DWD is a *big*
organization). Finally they found out that the machine itself had
ordered the disk, because it had noticed an increase in soft errors on
one of its disks.
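
I don't know how the Stratus firmware actually decides this, but the
idea of acting on a rising soft-error rate can be sketched roughly as
follows. The function name, the window size, and the factor are all
made up for illustration; a real system would use whatever counters
and thresholds the vendor defines.

```python
from typing import Sequence


def should_order_replacement(daily_soft_errors: Sequence[int],
                             window: int = 3,
                             factor: float = 2.0) -> bool:
    """Hypothetical rule: flag a disk when the mean soft-error count of
    the last `window` days exceeds `factor` times the earlier mean."""
    if len(daily_soft_errors) <= window:
        return False  # not enough history to establish a baseline
    baseline = daily_soft_errors[:-window]
    recent = daily_soft_errors[-window:]
    baseline_mean = sum(baseline) / len(baseline)
    recent_mean = sum(recent) / len(recent)
    # Floor the baseline at one error per day so that a very quiet disk
    # with a single stray error does not trigger an order.
    return recent_mean > factor * max(baseline_mean, 1.0)
```

The point is only that the trigger is a trend in recoverable errors,
not a hard failure, so the spare arrives before the disk dies.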

The CPUs of this machine come in packs of four (two pairs). One of the
pairs is active, the other in hot standby, all four running the same
instructions synchronously. All signals of the two CPUs of a pair are
compared by hardware comparators, and if there is a difference, the
pair is taken out of service; if it was the active one, the other
takes over, without, literally, missing a beat.
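
To make the pair-and-compare scheme concrete, here is a small software
model of it. This is only a sketch under my own assumptions: the real
thing is done by hardware comparators on the CPU signals, and the names
Pair, step, and run are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Pair:
    """Two CPUs executing the same work in lockstep (hypothetical model)."""
    name: str
    cpu_a: Callable[[int], int]
    cpu_b: Callable[[int], int]
    in_service: bool = True

    def step(self, x: int) -> Optional[int]:
        """Run both CPUs on x; return the result, or None if they differ."""
        a, b = self.cpu_a(x), self.cpu_b(x)
        if a != b:
            # Comparator mismatch: take the whole pair out of service.
            self.in_service = False
            return None
        return a


def run(pairs: List[Pair], inputs: List[int]) -> List[int]:
    """All in-service pairs execute every input; the output is taken from
    the first pair that is still in service (the active one)."""
    outputs = []
    for x in inputs:
        results = [(p, p.step(x)) for p in pairs if p.in_service]
        healthy = [r for p, r in results if p.in_service]
        if not healthy:
            raise RuntimeError("no healthy CPU pair left")
        outputs.append(healthy[0])
    return outputs


def good(x: int) -> int:
    return x * 2


def faulty_at_3(x: int) -> int:
    # Simulated hardware fault: wrong answer for one particular input.
    return x * 2 + 1 if x == 3 else x * 2


active = Pair("active", good, faulty_at_3)
standby = Pair("standby", good, good)
outputs = run([active, standby], [1, 2, 3, 4])
```

Because the standby pair has been computing the same results all
along, the output for the input that trips the comparator is still
correct, and the caller never sees the failure.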


Well, this is what you need for 100% reliability. Be prepared to pay a
lot for it. And, sorry to say, you won't run FreeBSD on such a
system.

Greetings, Juergen.

