From: Nikos Vassiliadis
To: Robert Watson
Cc: freebsd-net@freebsd.org, freebsd-stable@freebsd.org, Julian Elischer
Date: Mon, 28 May 2007 13:26:15 +0300
Subject: Re: debuging a hung kernel
Message-Id: <200705281326.17517.nvass@teledomenet.gr>

Redirecting from @net to @stable. Please remove @net from future mails.

On Monday 28 May 2007 11:54, Robert Watson wrote:
> On Mon, 28 May 2007, Julian Elischer wrote:
> > Nikos Vassiliadis wrote:
> >> On Monday 28 May 2007 10:57, Julian Elischer wrote:
> >>> Nikos Vassiliadis wrote:
> >>>> On Tuesday 22 May 2007 10:06, I wrote:
> >>>>> Hello everybody,
> >>>>>
> >>>>> I just managed to lock my box and I want to report it
> >>>
> >>> define "lock"?
> >>>
> >>> Does it still respond to on the keyboard?
> >>
> >> No, but I was trying to break to the debugger with
> >> myself. I assume that it is
> >> equivalent to the combination you wrote, or not?
> >>
> >>> (Assuming you have the debugger in your kernel?).
> >>
> >> Yes, I have included my kernel configuration, see below.
> >>
> >>> Does it still ping?
> >>
> >> no, ARP does not work either.
> >
> > nasty.. do you have IPMI? sometimes that allows you to generate an NMI
> > that could theoretically be made to drop to the debugger.

I have a Dell PowerEdge 750 sitting at work, which I think has IPMI.
I'll be able to try a few things next week, since I will be off work
for this week.

> >
> > I've not had success with that but I have heard others have.
>
> An increasing number of server motherboards have an NMI button on the
> motherboard, possibly exposed outside the case, but generally not.
>
> I've not tested it in over a year, but a few years ago I added an
> MP_WATCHDOG kernel option that causes one of the CPUs in an SMP system
> to become a dedicated watchdog CPU, checking to see if the OS is alive
> enough to process timer ticks. If a counter isn't updated, it
> generates an NMI to the debugger from the watchdog CPU. The idea here
> is that, as the number of CPUs increases, the cost of dedicating a CPU
> for debugging stuff gets lower. However, there have been quite a few
> scheduler changes in the last few years, and it's possible that the
> watchdog no longer properly excludes other work from being scheduled,
> and that further work is required. In particular, I believe it relies
> on 4BSD's "pull" scheduling model and a lack of per-CPU workers, so the
> mechanism may require some rethinking.

Unfortunately, I do not have an SMP system available. Is there a
mechanism which I can use to schedule a break to the debugger after
n seconds or events?
I am looking into whether ichwd(4) can help, though that needs
investigation since I have not used watchdog facilities before.

Thanks Julian & Robert.
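
The counter-based check Robert describes for MP_WATCHDOG (a dedicated checker watches a counter that the timer ticks should keep advancing, and fires when it stalls) can be sketched roughly as below. This is only a user-space simulation of the idea, not the kernel code: the names `HeartbeatWatchdog`, `first_firing_check` and `max_missed_checks` are made up for illustration, and "firing" here just returns True where the real option would raise an NMI.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatWatchdog:
    """Fires when a heartbeat counter stops advancing for too many checks."""

    max_missed_checks: int  # hypothetical threshold, not a real kernel tunable
    last_seen: int = 0      # counter value observed at the previous check
    missed: int = 0         # consecutive checks that saw no progress

    def check(self, heartbeat: int) -> bool:
        """Return True when the watchdog should fire (stand-in for an NMI)."""
        if heartbeat != self.last_seen:
            # The counter moved since the last check, so the system is alive.
            self.last_seen = heartbeat
            self.missed = 0
            return False
        # No progress since the last check: count it and compare to the limit.
        self.missed += 1
        return self.missed >= self.max_missed_checks


def first_firing_check(samples, max_missed_checks):
    """Return the index of the check at which the watchdog fires, or None."""
    dog = HeartbeatWatchdog(max_missed_checks)
    for index, heartbeat in enumerate(samples):
        if dog.check(heartbeat):
            return index
    return None
```

For example, `first_firing_check([1, 2, 3, 3, 3, 3], 3)` feeds in a counter that advances three times and then stalls; the watchdog fires on the third check that sees no change. The same "no progress for n checks" rule is one way to think about a break to the debugger after n seconds.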