From: Andre Guibert de Bruet
To: Jeff Jirsa
cc: Oliver Brandmueller
cc: current@freebsd.org
Date: Fri, 2 Jan 2004 15:14:08 -0500 (EST)
Subject: RE: Hot Swapping CPUs?
Message-ID: <20040102143551.H9356@alpha.siliconlandmark.com>
List-Id: Discussions about the use of FreeBSD-current

On Fri, 2 Jan 2004, Jeff Jirsa wrote:

> [ I can't send to the list, since this location lacks RDNS, but feel
> free to send followups to the list if you feel they're of use ]
>
> > Find me a x86 motherboard (with specs, preferably) that supports cpu
> > failure-monitoring and hot-swapping and I'll volunteer time to hack up
> > some code for you.
> > (We have a need for this functionality in
> > our x86-farm at work, so I'd get to do it on the clock. :) )
>
> Most (all?) of the IBM eSeries servers have 'Predictive Failure
> Analysis'... It claims to support real-time failure prediction, but I'm
> relatively sure it's not even close to hot-swappable at the CPU level
> (PCI-X cards hot-swap fine, though). You'll just have to figure out how
> to tap into the ISMP the same way IBM Director Agent does to find out
> when a CPU fails, and then a sysctl to disable that CPU would indeed be
> a nice touch.

Powering down a CPU and removing it from the available AP list on first
sign of a problem would be a very nice start. It would prevent a hard
lockup and let the system run until qualified support staff can arrive on
site with a replacement part. Hot-swapping a CPU (or CPU board) as done on
Sun Enterprise servers would be really nice but not crucial.

As I see it, the problem that we're trying to address is the downtime
between 3AM, when you've realized that a CPU on your production online
system has failed, and 7AM, when the system vendor's 4hr response team shows
up. Powering down a system for a proc replacement causes a 5 minute
downtime window which will still let you maintain 99.99904% availability
(based on a 365.25 day year).

> Specs? May be available, IBM loves cuddling up to the Linux community.

I'll check IBM's ftp site for details.

Dell's OpenManage Client lets one have access to the health information of
a system. This includes voltages and speeds of processors, fans, memory,
etc. I'll look into the availability of specs from Dell for this.

If anyone has any docs on related material from HP or whomever that don't
require me to sign an NDA to have access to them, now would be a good time
to share them. :)

Regards,

> Andre Guibert de Bruet | Enterprise Software Consultant >
> Silicon Landmark, LLC. | http://siliconlandmark.com/    >
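
For anyone who wants to check the availability figure quoted above, here is a minimal sketch of the arithmetic. It assumes the only downtime in the year is the single 5 minute power-down and uses the same 365.25 day year. The function name and constant are made up for illustration. The result comes out at roughly 99.99905% when rounded, which is 99.99904% when truncated to five places as in the message.

```python
# Illustrative only: names here are hypothetical, not from any tool.
MINUTES_PER_DAY = 24 * 60


def availability_percent(downtime_minutes, days_per_year=365.25):
    """Percentage of the year the system is up, given total yearly downtime."""
    minutes_per_year = days_per_year * MINUTES_PER_DAY
    return 100.0 * (1.0 - downtime_minutes / minutes_per_year)


if __name__ == "__main__":
    # One 5 minute power-down for a processor replacement.
    print(f"5 min down:  {availability_percent(5):.5f}%")
    # For comparison: the system staying down for the whole 4 hour
    # vendor response window (3AM to 7AM) instead.
    print(f"4 hours down: {availability_percent(4 * 60):.5f}%")
```

The second line is only there for contrast: it shows how much more the 3AM to 7AM wait costs if the box is hard-locked the whole time, compared with a planned 5 minute swap.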