From: Rayson Ho
To: Julian Elischer
Cc: Maninya M, freebsd-hackers@freebsd.org
Date: Tue, 14 Feb 2012 12:27:15 -0500
Subject: Re: OS support for fault tolerance
In-Reply-To: <4F3A9266.9050905@freebsd.org>

On Tue, Feb 14, 2012 at 11:57 AM, Julian Elischer wrote:
> but I'm interested in any answers people may have

The way other OSes handle this is by detecting an abnormal number of
faults on a core and then disabling the affected core(s). They look for
an abnormal number because a fault is not always the hardware's fault,
e.g. when a particle from outer space hits a core and flips a bit.

Solaris & mainframe (z/OS) handle it this way, but you should google
for more info, since I don't remember all the details.

Also, see this presentation: "Getting to know the Solaris Fault
Management Architecture (FMA)":

http://www.prefetch.net/presentations/SolarisFaultManagement_Presentation.pdf

Rayson

=================================
Open Grid Scheduler / Grid Engine
http://gridscheduler.sourceforge.net/

Scalable Grid Engine Support Program
http://www.scalablelogic.com/
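
To make the "abnormal number of faults" idea above concrete, here is a minimal sketch of a threshold policy. It is only an illustration under stated assumptions, not how Solaris FMA or z/OS actually implement it: the class name `CoreFaultMonitor`, the threshold `max_faults`, and the window `window_seconds` are all hypothetical, and "taking a core offline" is modeled as adding its id to a set rather than calling any real OS interface.

```python
from collections import defaultdict, deque


class CoreFaultMonitor:
    """Toy policy: retire a core once it logs too many faults in a time window.

    All names and default values here are hypothetical, chosen for illustration.
    """

    def __init__(self, max_faults=3, window_seconds=3600.0):
        self.max_faults = max_faults            # faults tolerated per window
        self.window_seconds = window_seconds    # length of the sliding window
        self.fault_times = defaultdict(deque)   # core id -> recent fault times
        self.offline = set()                    # cores this policy has retired

    def report_fault(self, core, now):
        """Record one fault on `core` at time `now` (in seconds).

        Returns True if this report caused the core to be taken offline.
        """
        if core in self.offline:
            return False
        times = self.fault_times[core]
        times.append(now)
        # Drop faults that have aged out of the window, so an isolated
        # transient bit flip never accumulates toward the threshold.
        while times and now - times[0] > self.window_seconds:
            times.popleft()
        if len(times) > self.max_faults:
            self.offline.add(core)
            return True
        return False


if __name__ == "__main__":
    monitor = CoreFaultMonitor(max_faults=2, window_seconds=10.0)
    for t in (0.0, 100.0, 101.0, 102.0):
        retired = monitor.report_fault(core=0, now=t)
        print(t, "offline" if retired else "still online")
```

The sliding window is the design choice that separates a one-off transient fault (which ages out) from a core that keeps faulting (which crosses the threshold and is retired).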