From owner-freebsd-current@FreeBSD.ORG Thu Apr 8 15:22:10 2004
Date: Fri, 9 Apr 2004 08:22:00 +1000
From: Peter Jeremy
To: ticso@cicely.de
cc: current@freebsd.org
Subject: Re: panic on one cpu leaves others running...
Message-ID: <20040408222200.GD6458@server.vk2pj.dyndns.org>
In-Reply-To: <20040408142742.GD5279@cicely12.cicely.de>
References: <20040408091030.GA6458@server.vk2pj.dyndns.org> <40751A74.50504@freebsd.org> <20040408114441.GB6458@server.vk2pj.dyndns.org> <20040408142742.GD5279@cicely12.cicely.de>

On Thu, Apr 08, 2004 at 04:27:43PM +0200, Bernd Walter wrote:
>On Thu, Apr 08, 2004 at 09:44:41PM +1000, Peter Jeremy wrote:
>> > A panic usually means that
>> >something unrecoverable happened, and that continuing on is not safe.
>>
>> I realise that.  Hence actually being able to continue after a panic
>> would be extremely difficult to do safely.  (Probably not possible in
>> general, though it might be in some special cases).
>
>If it's save to continue then there's no need to panic at all.
>Just stoping the faulting parts would be enough in that case.

Except FreeBSD (and most Unices) don't do this in general.  I was
thinking of hardware failures - if a CPU fails and it wasn't holding
any locks then it would seem feasible to just abort the thread/process
that was using the CPU and limp along on the remaining CPU(s).

Likewise an unrecoverable memory error in a clean page should (in most
cases) be able to be recovered by marking that page unusable and
loading another copy of the data into another page.  (Obviously this
is problematic if the page in question is part of the kernel VM
subsystem or the device driver for the relevant backing store).  Even
a dirty page may be recoverable by aborting the affected process or
treating it similarly to an I/O error on a filesystem.

The marketing spin from at least one vendor suggests that their
high-end systems can manage this sort of fault recovery.  I'm not sure
whether this is an area that FreeBSD should aspire to - I suspect that
the effort needed to implement and test this would not be justified by
the small size of the additional potential market.

Peter
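
The recovery cases described in the message above can be summarised as a small decision table. The Python sketch below is illustrative only and does not correspond to any FreeBSD code: the names `Action`, `Page`, `on_memory_error` and `on_cpu_failure` are hypothetical, and two outcomes are assumptions rather than statements from the message (a fault in a kernel-critical page is treated as a panic, and so is a failed CPU that was holding locks).

```python
from dataclasses import dataclass
from enum import Enum, auto


class Action(Enum):
    RELOAD_PAGE = auto()   # retire the bad frame, load a fresh copy into another page
    KILL_PROCESS = auto()  # abort the affected thread/process and keep running
    PANIC = auto()         # no safe way to continue (assumed fallback)


@dataclass
class Page:
    dirty: bool            # modified since it was loaded from backing store
    kernel_critical: bool  # part of the VM subsystem or the backing-store driver


def on_memory_error(page: Page) -> Action:
    """Pick a recovery action for an unrecoverable memory error in `page`."""
    if page.kernel_critical:
        # The message only calls this case "problematic"; panicking is an assumption.
        return Action.PANIC
    if not page.dirty:
        # Clean page: mark it unusable and reload the data elsewhere.
        return Action.RELOAD_PAGE
    # Dirty page: the only copy is lost, so abort the owner
    # (or report it much like a filesystem I/O error).
    return Action.KILL_PROCESS


def on_cpu_failure(held_locks: int) -> Action:
    """Pick a recovery action when a CPU fails while running a thread."""
    if held_locks == 0:
        # No locks held: abort the thread and limp along on the remaining CPUs.
        return Action.KILL_PROCESS
    # Locks held: protected state may be inconsistent; panicking is an assumption.
    return Action.PANIC


if __name__ == "__main__":
    print(on_memory_error(Page(dirty=False, kernel_critical=False)).name)
    print(on_cpu_failure(held_locks=0).name)
```

The sketch separates the decision from the mechanism on purpose: the hard part the message alludes to (retiring frames, unwinding a thread on a dead CPU, and testing all of it) is exactly what is not modelled here.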