From owner-freebsd-hackers  Wed Mar  4 12:58:33 1998
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id MAA24978
          for freebsd-hackers-outgoing; Wed, 4 Mar 1998 12:58:33 -0800 (PST)
          (envelope-from owner-freebsd-hackers@FreeBSD.ORG)
Received: from sendero.simon-shapiro.org (sendero-fxp0.Simon-Shapiro.ORG [206.190.148.34])
          by hub.freebsd.org (8.8.8/8.8.8) with SMTP id MAA24939
          for <hackers@FreeBSD.ORG>; Wed, 4 Mar 1998 12:58:24 -0800 (PST)
          (envelope-from shimon@sendero-fxp0.simon-shapiro.org)
Received: (qmail 10343 invoked by uid 1000); 4 Mar 1998 20:58:32 -0000
Message-ID: <XFMail.980304125832.shimon@simon-shapiro.org>
X-Mailer: XFMail 1.3-alpha-021598 [p0] on FreeBSD
X-Priority: 3 (Normal)
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
In-Reply-To: <199803041843.TAA01389@yedi.iaf.nl>
Date: Wed, 04 Mar 1998 12:58:32 -0800 (PST)
Reply-To: shimon@simon-shapiro.org
Organization: The Simon Shapiro Foundation
From: Simon Shapiro <shimon@simon-shapiro.org>
To: Wilko Bulte <wilko@yedi.iaf.nl>
Subject: Re: SCSI Bus redundancy...
Cc: julian@whistle.com, hackers@FreeBSD.ORG
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


On 04-Mar-98 Wilko Bulte wrote:

...

>   Anxiously awaiting. I just missed an opportunity today to obtain a
>   Mylex DAC960 3-channel RAID card. Bah.

Last I touched these, they were where DPT was 5 years prior, only buggier.
I was at Intel at the time, working on a ``big'' benchmark, and could get
zilch support.  I fared a lot better calling the DPT hotline anonymously and
saying ``I have this 1991 vintage card a friend gave me, and it does...''

Part of a product is its producer and support.  Maybe Mylex is much better
at it today.

 ...

>   A couple of years ago while working at Philips Info Systems we had a
>   SysV2 derivative that could do powerfail/restart (as we called it). 
>   It used some battery backed up RAM, and it was not a PC (M68K cpu).
>   Having never worked on that kernel I don't know how they did it.
>   But it worked pretty well.

The details fail me and we may be talking about two different things:
A device driver monitors the power-fail line (typically, on VME it is an
NMI).  The driver's interrupt service routine pushes the stack into memory,
sets a bit and halts.

When you boot, you FIRST look at that bit.  If it is ON, you do NOT run
memory test :-), you simply pop the stack and CONT (or whatever).
That driver leaked into the SVR4 source tree.  I used it in another project
on a 486 port, but we did not use a BIOS.  (Yup, we built a PC that could
not boot DOS, only Unix.)
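A rough user-space simulation of that save-then-resume sequence, in C.
Everything here is an assumption for illustration: `struct nvram_area`,
`PF_MAGIC`, `powerfail_isr()` and `boot_check_resume()` are hypothetical
names, and a static struct stands in for battery-backed RAM.  A real handler
would run from the NMI vector, save the actual register file and finish with
a halt instruction.  What the sketch does show is the ordering: save the
state first, set the flag last, and at boot test the flag before running
anything (such as a memory test) that would overwrite the saved image.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical magic value marking a valid saved image. */
#define PF_MAGIC 0x50574646u

/* Stand-in for the real register file. */
struct cpu_context {
    uint32_t pc;
    uint32_t sp;
    uint32_t regs[8];
};

/* Hypothetical layout of the battery-backed RAM region. */
struct nvram_area {
    uint32_t           pf_flag;  /* written last, so a torn save is never trusted */
    struct cpu_context saved;
};

/* Simulated battery-backed RAM; on real hardware this survives power loss. */
static struct nvram_area nvram;

/* What the power-fail interrupt handler would do: save state, set the bit. */
void powerfail_isr(const struct cpu_context *live)
{
    memcpy(&nvram.saved, live, sizeof nvram.saved);
    nvram.pf_flag = PF_MAGIC;   /* real code needs a write barrier before this */
    /* real hardware would now execute a halt instruction */
}

/* Called FIRST at boot.  Returns 1 and fills *out when resuming after a
   power failure, 0 for a normal cold boot (memory test and all). */
int boot_check_resume(struct cpu_context *out)
{
    assert(out != NULL);
    if (nvram.pf_flag != PF_MAGIC)
        return 0;
    memcpy(out, &nvram.saved, sizeof *out);
    nvram.pf_flag = 0;          /* one-shot: the next boot is a cold one */
    return 1;
}
```

Clearing the flag on resume keeps a later power-up from replaying a stale
image.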

>> Memory SNAP:  If you write it into a DPT controller, and the controller
>> has enough cache to hold it, it is pretty fast.  I can sustain about 2us
>> per transaction overhead and about 120MB/Sec.  This gives us about a
>> second or two.  The new DPT's can retain the cache until power returns.
>> Even a small UPS (with power alarms) will last long enough.
> 
>   But how do you checkpoint things? So, where did the processor leave
>   off?

The DPT gets transactions from the host.  It processes them in an
autonomous manner.  If the entire transaction is OK, an ACK is sent to the
host.  If not, not.  If Power-Fail is detected, the DPT simply halts until
it sees a reset from the host.  Once the reset arrives, it checks the
disks.  If they are all there, it can choose to flush the caches.
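The controller-side behaviour can be modelled as a small state machine.
This is a hypothetical C model, not DPT firmware: `struct controller`,
`ctl_submit()`, `ctl_power_fail()` and `ctl_reset()` are invented names, and
a counter stands in for the contents of the write-back cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum ctl_state { CTL_RUNNING, CTL_HALTED };

struct controller {
    enum ctl_state state;
    int            dirty_blocks;   /* write-back cache not yet on disk */
};

/* Host submits a WRITE.  The controller ACKs (returns true) only if the
   whole transaction succeeded; a halted controller ACKs nothing. */
bool ctl_submit(struct controller *c, bool transaction_ok)
{
    assert(c != NULL);
    if (c->state == CTL_HALTED || !transaction_ok)
        return false;
    c->dirty_blocks++;             /* held in cache, flushed later */
    return true;
}

/* Power-fail detected: stop and wait for a reset from the host. */
void ctl_power_fail(struct controller *c)
{
    c->state = CTL_HALTED;
}

/* Host reset: check the disks and flush the cache only if all are present.
   Returns the number of blocks flushed (0 means the cache is retained). */
int ctl_reset(struct controller *c, bool all_disks_present)
{
    int flushed = 0;
    if (all_disks_present) {
        flushed = c->dirty_blocks;
        c->dirty_blocks = 0;
    }
    c->state = CTL_RUNNING;
    return flushed;
}
```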

On the host, once you detect a power-fail, you write all that you want to
the DPT.  The DPT takes the WRITE requests and ACKs them (it acts as a
write-back cache, its normal modus operandi).  The only fly in the ointment
is: what if there is more main memory than cache on the DPT (which is
normally the case)?  What we do here is a callback to an emergency shutdown
routine that calls sync() in the kernel, and then calls boot().  It assumes
the UPS can sustain the system this long, but that is very doable.  1GB
worth of buffers will take (at 6 MB/sec, a slow RAID-5) a little under three
minutes to flush.  Most systems are much faster than that.
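The sizing behind this paragraph is simple division.  The sketch below
assumes binary units (1 GB = 1024 MB) and invents `plan_power_fail()` and its
parameters to show the three possible outcomes; the controller cache size and
UPS reserve of any real system would have to be measured.  Under those
assumptions, 1 GB at 6 MB/s works out to about 171 seconds (about 167 with
decimal units).

```c
#include <assert.h>
#include <stdbool.h>

/* Seconds needed to flush `bytes` at `rate` bytes per second, rounded up. */
unsigned long long flush_seconds(unsigned long long bytes, unsigned long long rate)
{
    assert(rate > 0);
    return (bytes + rate - 1) / rate;
}

/* Hypothetical outcomes of the power-fail decision described above. */
enum pf_plan { PF_CACHE_ONLY, PF_SYNC_THEN_BOOT, PF_INSUFFICIENT_RESERVE };

enum pf_plan plan_power_fail(unsigned long long dirty_bytes,
                             unsigned long long ctl_cache_bytes,
                             unsigned long long flush_rate,
                             unsigned long long ups_reserve_s)
{
    if (dirty_bytes <= ctl_cache_bytes)
        return PF_CACHE_ONLY;            /* controller cache absorbs it all */
    if (flush_seconds(dirty_bytes, flush_rate) <= ups_reserve_s)
        return PF_SYNC_THEN_BOOT;        /* sync(), then boot() */
    return PF_INSUFFICIENT_RESERVE;      /* UPS cannot cover the flush */
}
```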

So, the answer is: there is exactly one checkpoint, and it is a one-shot.
Once we detect power failure, we assume we have reserve power to flush
everything and shut down.
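A one-shot checkpoint needs a guard so that a second power-fail report (say,
a UPS alarm arriving after the power-fail interrupt) does not start the
flush again.  A minimal sketch using C11 `atomic_flag`; `on_power_fail()`
and `flush_and_halt()` are hypothetical, and a real kernel would use its own
atomic primitives rather than `<stdatomic.h>`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Set by the first caller; never cleared, so the path runs exactly once. */
static atomic_flag shutdown_started = ATOMIC_FLAG_INIT;
static int flushes;   /* how many times the flush ran (for illustration) */

/* Hypothetical stand-in for "flush everything and shut down". */
static void flush_and_halt(void)
{
    assert(flushes == 0 && "emergency flush must run only once");
    flushes++;
}

/* Returns true only for the caller that actually performed the shutdown. */
bool on_power_fail(void)
{
    if (atomic_flag_test_and_set(&shutdown_started))
        return false;   /* the single checkpoint was already taken */
    flush_and_halt();
    return true;
}
```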

This does not protect you from disk bay power failures, but these are
almost always on N+1 power systems and hooked up to separate UPSs.

Having the kernel actually checkpoint itself with any finer resolution or
intelligence would require changing too many things.  I am trying to make
the system monitoring drivers do this in a general-purpose, hardware
independent manner.  How successful that will be, I do not know yet.
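One common way to make such drivers hardware independent is a table of
function pointers that every backend fills in, so generic code never touches
the hardware directly.  This only illustrates that general pattern; the
design being worked on is not specified here, and `struct mon_ops`,
`mon_scan()` and the UPS backend below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum mon_event { MON_OK, MON_POWER_FAIL, MON_FAN_FAIL };

/* Operations every monitoring backend (RAID controller, UPS, VME power-fail
   line, ...) provides; generic code calls only these. */
struct mon_ops {
    const char    *name;
    enum mon_event (*poll)(void *softc);   /* report current condition */
};

struct mon_dev {
    const struct mon_ops *ops;
    void                 *softc;           /* backend-private state */
};

/* Generic code: poll every registered monitor, return the first alarm. */
enum mon_event mon_scan(const struct mon_dev *devs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        assert(devs[i].ops != NULL && devs[i].ops->poll != NULL);
        enum mon_event ev = devs[i].ops->poll(devs[i].softc);
        if (ev != MON_OK)
            return ev;
    }
    return MON_OK;
}

/* Example backend: a UPS whose state is a single "on battery" int. */
static enum mon_event ups_poll(void *softc)
{
    return *(int *)softc ? MON_POWER_FAIL : MON_OK;
}

static const struct mon_ops ups_ops = { "ups", ups_poll };
```

Adding a new kind of sensor then means writing one `poll` function and one
ops table, with no change to `mon_scan()`.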


----------


Sincerely Yours, 

Simon Shapiro
Shimon@Simon-Shapiro.ORG                      Voice:   503.799.2313
