From: Rayson Ho
To: freebsd-hackers@freebsd.org
Date: Tue, 14 Feb 2012 12:49:41 -0500
Subject: Re: OS support for fault tolerance (re-send)

(The email below did not show up on the online archive - resending...)

---------- Forwarded message ----------
From: Rayson Ho
Date: Tue, Feb 14, 2012 at 12:27 PM
Subject: Re: OS support for fault tolerance

On Tue, Feb 14, 2012 at 11:57 AM, Julian Elischer wrote:
> but I'm interested in any answers people may have

The way other OSes handle this is by detecting an abnormal number of
faults on a core (sometimes a fault is not the hardware's fault, e.g.
when a particle from outer space hits a core and flips a bit), and then
disabling the affected core(s).

Solaris and mainframe (z/OS) handle it this way, but you should google
for more info since I don't remember all the details.

Also, see this presentation:

"Getting to know the Solaris Fault Management Architecture (FMA)":
http://www.prefetch.net/presentations/SolarisFaultManagement_Presentation.pdf

Rayson

=================================
Open Grid Scheduler / Grid Engine
http://gridscheduler.sourceforge.net/

Scalable Grid Engine Support Program
http://www.scalablelogic.com/
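
P.S. To make the "abnormal number of faults, then disable the core" idea
concrete, here is a rough sketch in Python. It is only an illustration,
not how Solaris FMA or z/OS actually implement it: the FaultTracker class,
the threshold of 3 faults and the 60-second window are all made-up
assumptions. The point is that one isolated fault (such as a random bit
flip) is tolerated, while a burst on the same core gets that core marked
offline.

```python
from collections import defaultdict, deque


class FaultTracker:
    """Toy per-core fault counter with a sliding time window (hypothetical)."""

    def __init__(self, threshold=3, window=60.0):
        self.threshold = threshold          # faults tolerated inside one window (assumed value)
        self.window = window                # window length in seconds (assumed value)
        self.events = defaultdict(deque)    # core id -> timestamps of recent faults
        self.offline = set()                # cores we have decided to retire

    def report(self, core, now):
        """Record a fault on `core` at time `now`; return True if the core is offline."""
        if core in self.offline:
            return True
        recent = self.events[core]
        recent.append(now)
        # Forget faults that have fallen out of the window.
        while recent and now - recent[0] > self.window:
            recent.popleft()
        # More faults than the threshold inside one window counts as abnormal.
        if len(recent) > self.threshold:
            self.offline.add(core)
        return core in self.offline


tracker = FaultTracker(threshold=3, window=60.0)
# One isolated fault on core 0 does not retire it.
isolated = tracker.report(core=0, now=0.0)
# Four faults within a few seconds on core 1 exceed the threshold.
burst = [tracker.report(core=1, now=t) for t in (10.0, 11.0, 12.0, 13.0)]
```

A real kernel would of course act on the decision (migrate threads away,
stop scheduling on that core) rather than just add the core to a set, and
would pick thresholds per fault type.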