From: John Baldwin
To: The Hermit Hacker
Cc: freebsd-current@FreeBSD.org
Date: Tue, 28 Nov 2000 12:43:10 -0800 (PST)
Subject: RE: -current kernel hangs machine solid ...

On 28-Nov-00 The Hermit Hacker wrote:
>
> Just tried to build a kernel based on sources from today, to enable
> BREAK_TO_DEBUGGER so that I can try and get in and see where its hanging
> ... the compile hung the machine solid.  Even hitting the
> 'numlock'/'capslock' on my keyboard generated no results ...

It is spinning with interrupts disabled, probably due to holding a spinlock
for far too long.  Debugging this is not all that fun. :-P  If you can rig up
an NMI switch, you can use that to drop into ddb and then use 'x' to see who
owns various mutexes (sched_lock and callout_mtx being the primary spin
mutexes of concern).

If you compile your kernel with WITNESS and MUTEX_DEBUG, then you can use 'x'
to look at the sched_lock and callout_mtx mutex structures, find the pointer
to the mtx_debug structure, and examine that to find the mtxd_file and
mtxd_line members.  Then you can look at those (x/s to look at the filename
as a string) to find the filename and line number when the mutex was last
acquired.  Grr, except that this is broken for spin mutexes.
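The pointer chase described above (mutex, then its mtx_debug structure, then mtxd_file and mtxd_line) can be modelled in user space to make the steps concrete. This is only an illustrative sketch: the structure layouts, the field name `debug`, and the sample values are assumptions made up for the example; only the member names mtxd_file and mtxd_line come from the message, and the real kernel structures contain more fields.

```python
import ctypes


class MtxDebug(ctypes.Structure):
    # Only the two member names are taken from the message; their types
    # and order here are assumptions for illustration.
    _fields_ = [
        ("mtxd_file", ctypes.c_char_p),  # what 'x/s' would show as a string
        ("mtxd_line", ctypes.c_int),     # what 'x' would show as a number
    ]


class Mtx(ctypes.Structure):
    # Hypothetical one-field stand-in for the kernel mutex structure.
    _fields_ = [("debug", ctypes.POINTER(MtxDebug))]


def last_acquired(mtx):
    """Follow mutex -> debug structure -> (file, line), as done by hand in ddb."""
    if not mtx.debug:          # NULL pointer: no debug information attached
        return None
    dbg = mtx.debug.contents   # dereference the pointer
    return dbg.mtxd_file.decode(), dbg.mtxd_line


# Made-up sample data standing in for a held mutex.
sample_debug = MtxDebug(b"example.c", 42)
sample_mtx = Mtx(ctypes.pointer(sample_debug))
print(last_acquired(sample_mtx))
```

In ddb the same walk is done manually with 'x' on each address in turn; the sketch only shows which value leads to which.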

If you are patient, you can try rigging up a serial console and compiling KTR
into your kernel like so:

    options KTR
    options KTR_EXTEND
    options KTR_COMPILE=(KTR_LOCK|KTR_PROC|KTR_INTR)

Then when the machine has booted, log in via ssh or a tty other than the
serial console and type the following:

    # sysctl -w debug.ktr_mask=0x1208
    # sysctl -w debug.ktr_verbose=2
    # while (1) do
    > make -j 16 buildworld
    > end

Unfortunately, there is a chance the machine will die before it hangs due to
exceeding the stack space.  In that case, you can _try_ bumping UPAGES, but
that didn't help on my test machines. :-/  However, if your machine doesn't
blow up and die, then when it hangs, the KTR output dumped to the serial
console (which you should probably log to a file via script or somesuch) will
show which mutex was acquired, and where it was acquired, that is causing the
hang.

-- 

John Baldwin -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/
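As a rough check on the 0x1208 value passed to debug.ktr_mask above, the sketch below ORs together the three trace classes named in the KTR_COMPILE line. The numeric bit values are an assumption (taken to match sys/ktr.h of that era); verify them against the header in your own source tree before relying on them. The helper name `ktr_mask` is made up for the example.

```python
# Assumed KTR class bits; confirm against sys/ktr.h in your own tree.
KTR_LOCK = 0x0008   # mutex/lock events
KTR_INTR = 0x0200   # interrupt events
KTR_PROC = 0x1000   # process/scheduling events


def ktr_mask(*classes):
    """OR the given trace-class bits into a single mask value."""
    mask = 0
    for bit in classes:
        mask |= bit
    return mask


mask = ktr_mask(KTR_LOCK, KTR_PROC, KTR_INTR)
print(hex(mask))
```

Under those assumed values the three classes combine to the same mask used in the sysctl command, which suggests the run-time mask simply enables everything that was compiled in.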