From owner-freebsd-hackers@FreeBSD.ORG Wed Feb 15 05:41:11 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4DE99106564A for ; Wed, 15 Feb 2012 05:41:11 +0000 (UTC) (envelope-from julian@freebsd.org) Received: from vps1.elischer.org (vps1.elischer.org [204.109.63.16]) by mx1.freebsd.org (Postfix) with ESMTP id 1B27F8FC0A for ; Wed, 15 Feb 2012 05:41:10 +0000 (UTC) Received: from julian-mac.elischer.org (c-67-180-24-15.hsd1.ca.comcast.net [67.180.24.15]) (authenticated bits=0) by vps1.elischer.org (8.14.4/8.14.4) with ESMTP id q1F5f7YT001256 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Tue, 14 Feb 2012 21:41:10 -0800 (PST) (envelope-from julian@freebsd.org) Message-ID: <4F3B45CD.7030904@freebsd.org> Date: Tue, 14 Feb 2012 21:42:37 -0800 From: Julian Elischer User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10.4; en-US; rv:1.9.2.26) Gecko/20120129 Thunderbird/3.1.18 MIME-Version: 1.0 To: Jan Mikkelsen References: <4F3A9266.9050905@freebsd.org> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Maninya M , freebsd-hackers@freebsd.org Subject: Re: OS support for fault tolerance X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 15 Feb 2012 05:41:11 -0000 On 2/14/12 3:51 PM, Jan Mikkelsen wrote: > > Coming back to the multicore issue: > > The problem when a core fails is that it has affected more than its own state. It will be holding locks on shared resources and may have corrupted shared memory or asked a device to do the wrong thing. By the time you detect a fault in a core, it is too late. Checkpointing to main memory means that you need to be able to roll back to a checkpoint, and replay operations you know about. That involves more that CPU core state, that includes process file and device state. > I think that/s more or less what I was saying but with more concrete examples. and yes I rememebr the tandem boxes from computer shows in Perth and Sydney, but never saw one in the field.