Date:      Fri, 17 Nov 2000 10:30:02 -0800 (PST)
From:      John Baldwin <jhb@FreeBSD.org>
To:        Sheldon Hearn <sheldonh@uunet.co.za>
Cc:        current@FreeBSD.org, Valentin Chopov <valentin@valcho.net>, Boris Popov <bp@butya.kz>, Soren Schmidt <sos@freebsd.dk>, "Steven E. Ames" <steve@virtual-voodoo.com>, Alfred Perlstein <bright@wintelcom.net>
Subject:   Re: CURRENT is freezing again ...
Message-ID:  <XFMail.001117103002.jhb@FreeBSD.org>
In-Reply-To: <2967.974461715@axl.fw.uunet.co.za>

On 17-Nov-00 Sheldon Hearn wrote:
> 
> 
> On Thu, 16 Nov 2000 10:42:51 PST, Alfred Perlstein wrote:
> 
>> I would try a new kernel, and perhaps some collaboration with John
>> to debug these problems rather than just complaining about the
>> situation.  I see at least two experienced developers in the CC
>> list, there's no reason for these poor bug reports.
> 
> The problem with a hard lock-up out of which you can't escape into the
> debugger is that it makes meaningful bug reports impossible.  My non-SMP
> workstation has exhibited apparently arbitrary lock-ups since the advent
> of SMPng.

When I get a hard lock like this, I usually first see whether I can reproduce it
in single user mode.  If I can, I compile KTR into my kernel with the following
options:  KTR, KTR_EXTEND, KTR_COMPILE="0x3fffffff",
KTR_MASK="(KTR_INTR|KTR_PROC)".  Then I boot into single user mode (so I don't
dirty filesystems), mount any needed filesystems read-only if possible, and run
the following command:

# sysctl -w debug.ktr_verbose=1 ; command_that_makes_my_machine_go_boom

Then I watch the tracing output on the screen to see what the machine was doing
when it hung, i.e., whether it is still getting interrupts, which process it
died in, etc.
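
For anyone who wants to try this, the KTR options named earlier would look
roughly like the following in a kernel config file.  This is only a sketch that
restates the options and values from this message in the usual "options" line
form; check the values against your own source tree before relying on them.

```
# Kernel tracing options as listed in this message (values copied verbatim)
options 	KTR
options 	KTR_EXTEND
options 	KTR_COMPILE="0x3fffffff"
options 	KTR_MASK="(KTR_INTR|KTR_PROC)"
```

After rebuilding and booting the new kernel into single user mode, the sysctl
command above turns on the verbose trace output.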

> From my understanding, John's WITNESS code allows us to break into the
> debugger from within interrupt context.  If the lock-ups are happening
> in there, then this may help us provide better bug reports.

Err, not quite.  It's BSD/OS's WITNESS code, and what it does is perform extra
checks on mutex enters and exits to ensure that we aren't handling mutexes in
such a way that a deadlock is possible.  Thus, it verifies that you don't grab
mutexes out of order, that you don't grab sleep mutexes with interrupts
disabled, etc.
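
To give a feel for what an "out of order" check means, here is a small
userland sketch in C.  It is NOT the WITNESS code and shares nothing with it;
the names (checker_acquire, checker_release, checker_reset, MAX_LOCKS) are
made up for illustration, locks are just small integers, and it only catches
direct reversals between two locks in a single thread.  The idea is simply to
remember which lock was held when another was acquired, and complain if the
opposite order shows up later.

```c
#include <assert.h>
#include <string.h>

#define MAX_LOCKS 8	/* hypothetical limit for this sketch */

/* order[a][b] != 0 means lock a was seen held while lock b was acquired. */
static int order[MAX_LOCKS][MAX_LOCKS];
static int held[MAX_LOCKS];	/* locks currently held, in acquisition order */
static int nheld;

/* Forget all recorded orderings and held locks. */
static void
checker_reset(void)
{
	memset(order, 0, sizeof(order));
	nheld = 0;
}

/*
 * Record acquisition of `lock`.  Returns 0 if the acquisition agrees with
 * every ordering seen so far, or -1 if it reverses a recorded ordering.
 */
static int
checker_acquire(int lock)
{
	int i, bad = 0;

	assert(lock >= 0 && lock < MAX_LOCKS);
	assert(nheld < MAX_LOCKS);

	for (i = 0; i < nheld; i++) {
		if (held[i] == lock)
			continue;	/* recursion is not checked here */
		if (order[lock][held[i]])
			bad = 1;	/* earlier we saw lock taken first */
		else
			order[held[i]][lock] = 1;
	}
	held[nheld++] = lock;
	return (bad ? -1 : 0);
}

/* Record release of `lock` by removing it from the held list. */
static void
checker_release(int lock)
{
	int i;

	for (i = 0; i < nheld; i++) {
		if (held[i] == lock) {
			memmove(&held[i], &held[i + 1],
			    (nheld - i - 1) * sizeof(held[0]));
			nheld--;
			return;
		}
	}
}
```

With this, taking lock 0 and then lock 1 records the order 0 -> 1; if some
later code path takes lock 1 and then lock 0, checker_acquire() returns -1 for
the second acquisition, which is the kind of potential deadlock such a check is
meant to flag before the machine actually hangs.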

-- 

John Baldwin <jhb@FreeBSD.org> -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/

