Date:      Tue, 14 Feb 2012 21:42:37 -0800
From:      Julian Elischer <julian@freebsd.org>
To:        Jan Mikkelsen <janm-freebsd-hackers@transactionware.com>
Cc:        Maninya M <maninya@gmail.com>, freebsd-hackers@freebsd.org
Subject:   Re: OS support for fault tolerance
Message-ID:  <4F3B45CD.7030904@freebsd.org>
In-Reply-To: <D2890B34-AA3E-4495-8B9F-066153BFD0CF@transactionware.com>
References:  <CAC46K3mc=V=oBOQnvEp9iMTyNXKD1Ki_%2BD0Akm8PM7rdJwDF8g@mail.gmail.com> <4F3A9266.9050905@freebsd.org> <D2890B34-AA3E-4495-8B9F-066153BFD0CF@transactionware.com>

On 2/14/12 3:51 PM, Jan Mikkelsen wrote:
>
> Coming back to the multicore issue:
>
> The problem when a core fails is that it has affected more than its own state. It will be holding locks on shared resources and may have corrupted shared memory or asked a device to do the wrong thing. By the time you detect a fault in a core, it is too late. Checkpointing to main memory means that you need to be able to roll back to a checkpoint and replay the operations you know about. That involves more than CPU core state; it includes process, file, and device state.
>
I think that's more or less what I was saying, but with more concrete
examples.
And yes, I remember the Tandem boxes from computer shows in Perth and
Sydney, but never saw one in the field.
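
To make the checkpoint / roll back / replay idea quoted above a bit more
concrete, here is a minimal sketch in Python. It is only a toy: the
CheckpointedStore class and its method names are hypothetical, invented
for illustration, and it models nothing but an in-memory dictionary. It
assumes every legitimate change goes through apply() and is therefore
logged.

```python
import copy


class CheckpointedStore:
    """Toy key/value state with checkpoint, rollback, and replay (hypothetical)."""

    def __init__(self):
        self.state = {}        # live state that a faulty core might corrupt
        self.log = []          # operations applied since the last checkpoint
        self._snapshot = {}    # copy of the state taken at the last checkpoint

    def apply(self, key, value):
        # Record the operation before applying it so it can be replayed later.
        self.log.append((key, value))
        self.state[key] = value

    def checkpoint(self):
        # Save a deep copy of the current state and start a fresh log.
        self._snapshot = copy.deepcopy(self.state)
        self.log = []

    def rollback_and_replay(self):
        # Discard the (possibly corrupted) live state, restore the snapshot,
        # then replay only the operations that were logged.
        self.state = copy.deepcopy(self._snapshot)
        for key, value in self.log:
            self.state[key] = value


store = CheckpointedStore()
store.apply("a", 1)
store.checkpoint()
store.apply("b", 2)
store.state["a"] = 999         # simulated corruption that bypasses the log
store.rollback_and_replay()
print(store.state)             # {'a': 1, 'b': 2}
```

Only what the snapshot and the log capture is recovered. Locks the failed
core was holding, or commands it already sent to a device, are outside
this model, which is exactly the difficulty raised in the quoted text.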



