Date: Fri, 17 Nov 2000 10:30:02 -0800 (PST)
From: John Baldwin
To: Sheldon Hearn
Cc: current@FreeBSD.org, Valentin Chopov, Boris Popov, Soren Schmidt, "Steven E. Ames", Alfred Perlstein
Subject: Re: CURRENT is freezing again ...
In-Reply-To: <2967.974461715@axl.fw.uunet.co.za>

On 17-Nov-00 Sheldon Hearn wrote:
>
> On Thu, 16 Nov 2000 10:42:51 PST, Alfred Perlstein wrote:
>
>> I would try a new kernel, and perhaps some collaboration with John
>> to debug these problems rather than just complaining about the
>> situation.  I see at least two experienced developers in the CC
>> list, there's no reason for these poor bug reports.
>
> The problem with a hard lock-up out of which you can't escape into the
> debugger is that it makes meaningful bug reports impossible.  My non-SMP
> workstation has exhibited apparently arbitrary lock-ups since the advent
> of SMPng.

When I get a hard lock like this, I usually try to see if I can reproduce it
in single-user mode.  If I can, then I compile KTR into my kernel with the
following options:

    KTR
    KTR_EXTEND
    KTR_COMPILE="0x3fffffff"
    KTR_MASK="(KTR_INTR|KTR_PROC)"

Then I boot into single-user mode (so I don't dirty filesystems), mount any
needed filesystems read-only if possible, and run the following command:

# sysctl -w debug.ktr_verbose=1 ; command_that_makes_my_machine_go_boom

And then I stare at the tracing output on the screen to see what the machine
was doing when it hung, i.e., whether it is still getting interrupts, what
process it died in, etc.

> From my understanding, John's WITNESS code allows us to break into the
> debugger from within interrupt context.  If the lock-ups are happening
> in there, then this may help us provide better bug reports.

Err, not quite.  It's BSD/OS's WITNESS code, and what the WITNESS code does is
perform extra checks on mutex enters and exits to ensure that we aren't
handling mutexes in such a way that a deadlock is possible.  Thus, it verifies
that you don't grab mutexes out of order, or that you don't grab sleep mutexes
with interrupts disabled, etc.

-- 

John Baldwin -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/
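
Editorial illustration (not part of the original message): the "don't grab
mutexes out of order" check mentioned above can be pictured with a small
user-space toy.  This is only a sketch under stated assumptions, not the
BSD/OS or FreeBSD WITNESS code.  All names (toy_witness_acquire, MAX_LOCKS,
and so on) are hypothetical, locks are reduced to small integer ids, only one
thread is simulated, and only directly observed orderings are compared,
whereas a real checker would presumably also follow orderings transitively.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_LOCKS 8   /* hypothetical limit for this toy example */

/* seen_before[a][b] is true once lock a was held while lock b was acquired. */
static bool seen_before[MAX_LOCKS][MAX_LOCKS];
/* held[i] is true while the single simulated thread holds lock i. */
static bool held[MAX_LOCKS];

/* Forget all recorded orderings and all held locks. */
void toy_witness_reset(void)
{
    memset(seen_before, 0, sizeof(seen_before));
    memset(held, 0, sizeof(held));
}

/*
 * Note that lock `id` is being acquired.
 * Returns false if this reverses an order observed earlier, true otherwise.
 */
bool toy_witness_acquire(int id)
{
    bool ok = true;

    assert(id >= 0 && id < MAX_LOCKS);
    for (int h = 0; h < MAX_LOCKS; h++) {
        if (!held[h] || h == id)
            continue;
        if (seen_before[id][h])
            ok = false;                 /* earlier: id, then h; now: h, then id */
        else
            seen_before[h][id] = true;  /* remember: h is taken before id */
    }
    held[id] = true;
    return ok;
}

/* Note that lock `id` has been released. */
void toy_witness_release(int id)
{
    assert(id >= 0 && id < MAX_LOCKS);
    held[id] = false;
}
```

In this toy, acquiring lock 0 and then lock 1 records the order 0 before 1.
After both are released, acquiring lock 1 and then lock 0 makes the second
call return false, which is the kind of reversal that could deadlock if two
threads took the locks in opposite orders at the same time.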