From: Robert Watson
Date: Mon, 28 May 2007 09:54:57 +0100 (BST)
To: Julian Elischer
Cc: freebsd-net@freebsd.org, Nikos Vassiliadis
Subject: Re: debuging a hung kernel

On Mon, 28 May 2007, Julian Elischer wrote:

> Nikos Vassiliadis wrote:
>> On Monday 28 May 2007 10:57, Julian Elischer wrote:
>>> Nikos Vassiliadis wrote:
>>>> On Tuesday 22 May 2007 10:06, I wrote:
>>>>> Hello everybody,
>>>>>
>>>>> I just managed to lock my box and I want to report it
>>> define "lock"?
>>>
>>> Does it still respond to on the keyboard?
>>
>> No, but I was trying to break to the debugger with
>> myself. I assume that it is
>> equivalent to the combination you wrote, or not?
>>
>>> (Assuming you have the debugger in your kernel?).
>>
>> Yes, I have included my kernel configuration, see below.
>>
>>> Does it still ping?
>>
>> no, ARP does not work as well.
>
> nasty.. do you have IPMI? sometimes that allows you to generate an NMI that
> could theoretically be made to drop to the debugger.
>
> I've not had success with that but I have heard others have.

An increasing number of server motherboards have an NMI button on the board,
possibly exposed outside the case, but generally not.

I've not tested it in over a year, but a few years ago I added an MP_WATCHDOG
kernel option that causes one of the CPUs in an SMP system to become a
dedicated watchdog CPU, checking to see if the OS is alive enough to process
timer ticks. If a counter isn't updated, it generates an NMI to the debugger
from the watchdog CPU. The idea here is that, as the number of CPUs
increases, the cost of dedicating a CPU for debugging stuff gets lower.

However, there have been quite a few scheduler changes in the last few years,
and it's possible that the watchdog no longer properly excludes other work
from being scheduled, and that further work is required. In particular, I
believe it relies on 4BSD's "pull" scheduling model and a lack of per-CPU
workers, so the mechanism may require some rethinking.

Robert N M Watson
Computer Laboratory
University of Cambridge
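
For illustration only, the watchdog idea described above (a dedicated CPU checks whether a tick-driven counter is still advancing and raises an NMI if it is not) can be modelled roughly as below. This is a user-space toy in Python, not the MP_WATCHDOG code: the function name run_watchdog, the ticks_alive input, and the check_interval parameter are all hypothetical and chosen just for the sketch.

```python
def run_watchdog(ticks_alive, check_interval=3):
    """Toy model of a dedicated watchdog CPU (hypothetical, not kernel code).

    ticks_alive: one boolean per time step; True means the OS managed to
    process a timer tick during that step.
    Returns the step at which the watchdog would raise an NMI, or None if
    the counter kept advancing for the whole run.
    """
    counter = 0     # bumped by the timer-tick path while the OS is alive
    last_seen = 0   # counter value the watchdog saw at its previous check

    for step, alive in enumerate(ticks_alive, start=1):
        if alive:
            counter += 1
        # The watchdog only looks at the counter every check_interval steps.
        if step % check_interval == 0:
            if counter == last_seen:
                return step      # no progress since last check: fire the NMI
            last_seen = counter
    return None


if __name__ == "__main__":
    healthy = [True] * 9
    hung = [True] * 4 + [False] * 8   # ticks stop after step 4
    print(run_watchdog(healthy))      # watchdog never fires
    print(run_watchdog(hung))         # fires at a later check
```

The model leaves out the hard part mentioned above, which is keeping the scheduler from placing other work on the watchdog CPU; it only shows the counter comparison.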