From: Robert Watson
Date: Mon, 11 Oct 2004 11:10:39 -0400 (EDT)
To: Volker
cc: freebsd-current@freebsd.org
Subject: Re: beta6/7 machine freeze
In-Reply-To: <416A97E1.70603@vwsoft.com>

On Mon, 11 Oct 2004, Volker wrote:

> Since using that test setup the server machine didn't had a lockup. So
> my guess might be probably right to have a problem somewhere at i4b or
> user-ppp in releng_5.
>
> How does one debug a dead machine? Is there any way to get a backtrace
> or call the debugger while the machine has been frozen?

There's a pretty useful chapter in the handbook on getting set up to debug.
For debugging hangs, the first thing I'll generally do is get the box set up with a serial console and see if I can get into the debugger by sending a serial break (the break on the actual console might also work, but is sometimes less reliable when there's a hang). You will want to compile at least options KDB, DDB, and BREAK_TO_DEBUGGER into the kernel. On the real console, Ctrl-Alt-Escape is the break sequence; on the serial console, it's up to your client: for tip and cu, it's ~#, I believe.

Assuming you can get into the debugger, the first thing you'll want to do is use "show pcpu" and "trace" to tell you a little about the active thread. You might "continue", then drop in again a few seconds later and see if the trace looks the same or similar.

If you can't get into the debugger using a break, the typical responses are: either you really want to get into the debugger, so you arrange an NMI or watchdog; or you give up on getting into the debugger and try removing things one at a time to see if removing one causes the hang to go away. I have some test hardware with a conveniently placed NMI button on the back: you press the button and it generates a non-maskable interrupt, which almost always works. On other hardware, you can play nasty hardware tricks. If you have SMP, you might try the MP_WATCHDOG kernel option and sysctls (under-documented).

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Principal Research Scientist, McAfee Research
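
P.S. For reference, the kernel options named above might be written roughly like this in a kernel configuration file. This is only a sketch: the comments are my own short descriptions, the MP_WATCHDOG line is relevant only on SMP kernels, and you should check the NOTES files for your branch to confirm which options are available on your platform.

```
# Sketch of a kernel config fragment; verify option names against NOTES.
options 	KDB			# kernel debugger framework
options 	DDB			# interactive in-kernel debugger
options 	BREAK_TO_DEBUGGER	# console break drops into the debugger
options 	MP_WATCHDOG		# SMP only; see the related sysctls
```

After rebuilding and booting that kernel, a break on the console should give you a "db>" prompt, where "show pcpu", "trace", and "continue" can be entered as described above.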