Date:      Tue, 14 Feb 2012 15:01:45 -0800
From:      Julian Elischer <julian@freebsd.org>
To:        Rayson Ho <raysonlogin@gmail.com>
Cc:        Maninya M <maninya@gmail.com>, freebsd-hackers@freebsd.org
Subject:   Re: OS support for fault tolerance
Message-ID:  <4F3AE7D9.8020204@freebsd.org>
In-Reply-To: <CAHwLALOe1Zq86_AdO=D9pEEmOi_kT%2BrORMTXR-xEvhLX0Pt5gw@mail.gmail.com>
References:  <CAC46K3mc=V=oBOQnvEp9iMTyNXKD1Ki_%2BD0Akm8PM7rdJwDF8g@mail.gmail.com>	<4F3A9266.9050905@freebsd.org> <CAHwLALOe1Zq86_AdO=D9pEEmOi_kT%2BrORMTXR-xEvhLX0Pt5gw@mail.gmail.com>

On 2/14/12 9:27 AM, Rayson Ho wrote:
> On Tue, Feb 14, 2012 at 11:57 AM, Julian Elischer <julian@freebsd.org> wrote:
>> but I'm interested in any answers people may have
> The way other OSes handle this is by detecting any abnormal number of
> faults (sometimes it's not the fault of the hardware, e.g. when a
> particle from outer space hits a core and flips a bit), and then
> disabling the core(s).
>
> Solaris and mainframe (z/OS) handle it this way, but you should google
> and find more info since I don't remember all the details.
>
> Also, see this presentation: "Getting to know the Solaris Fault
> Management Architecture (FMA)":
> http://www.prefetch.net/presentations/SolarisFaultManagement_Presentation.pdf
True, but you can't guarantee that a CPU is going to fail in a way
that you can detect like that. What if the clock just stops? I believe
that even those systems that support CPU deactivation on error only
catch some percentage of the problems, and that sometimes it was more
a case of "bring up the system without CPU X after it all crashed in
flames".
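
As a rough illustration of the two cases above, here is a minimal sketch
(not how Solaris FMA or z/OS actually do it). It assumes a monitor that
runs somewhere other than the core being watched. The class names, the
thresholds, and the heartbeat mechanism are all hypothetical and only meant
to show why fault counting alone misses a core that simply goes silent.

```python
from dataclasses import dataclass, field


@dataclass
class CoreState:
    fault_times: list = field(default_factory=list)  # recent reported faults
    last_heartbeat: float = 0.0                       # last time the core checked in
    online: bool = True


class CoreMonitor:
    """Hypothetical monitor; must run off the core it is watching."""

    def __init__(self, ncores, max_faults=3, window=60.0, heartbeat_timeout=5.0):
        self.cores = {cid: CoreState() for cid in range(ncores)}
        self.max_faults = max_faults
        self.window = window
        self.heartbeat_timeout = heartbeat_timeout

    def report_fault(self, core, now):
        # Case 1: the core is alive enough to report its own errors.
        st = self.cores[core]
        st.fault_times = [t for t in st.fault_times if now - t < self.window]
        st.fault_times.append(now)
        if len(st.fault_times) > self.max_faults:
            st.online = False  # abnormal fault rate: take the core offline

    def heartbeat(self, core, now):
        self.cores[core].last_heartbeat = now

    def check(self, now):
        # Case 2: the core reports nothing at all (e.g. its clock stopped),
        # so only an outside observer can notice the missing heartbeat.
        newly_offline = []
        for cid, st in self.cores.items():
            if st.online and now - st.last_heartbeat > self.heartbeat_timeout:
                st.online = False
                newly_offline.append(cid)
        return newly_offline
```

Even in this toy version, the second case is only caught after the timeout
expires, and whatever was running on that core at the time is already lost.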

Tandem and other systems in the old days used to be able to cope with
dying CPUs pretty well, but they had support from top to bottom and the
software was written with 'clustering' in mind.





> Rayson
>
> =================================
> Open Grid Scheduler / Grid Engine
> http://gridscheduler.sourceforge.net/
>
> Scalable Grid Engine Support Program
> http://www.scalablelogic.com/
>
>>
>>> _______________________________________________
>>> freebsd-hackers@freebsd.org mailing list
>>> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
>>> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"
>>>




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4F3AE7D9.8020204>