From owner-freebsd-current  Tue Jul 16 18:05:07 2002
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.FreeBSD.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 1E3FC37B400
	for <current@FreeBSD.ORG>; Tue, 16 Jul 2002 18:05:04 -0700 (PDT)
Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16])
	by mx1.FreeBSD.org (Postfix) with ESMTP id D994E43E31
	for <current@FreeBSD.ORG>; Tue, 16 Jul 2002 18:05:02 -0700 (PDT)
	(envelope-from bde@zeta.org.au)
Received: from bde.zeta.org.au (bde.zeta.org.au [203.2.228.102])
	by mailman.zeta.org.au (8.9.3/8.8.7) with ESMTP id LAA23212;
	Wed, 17 Jul 2002 11:04:43 +1000
Date: Wed, 17 Jul 2002 11:08:24 +1000 (EST)
From: Bruce Evans <bde@zeta.org.au>
X-X-Sender: bde@gamplex.bde.org
To: Andrew Gallatin <gallatin@cs.duke.edu>
Cc: Andrew Kolchoogin <andrew@snark.rinet.ru>, <current@FreeBSD.ORG>
Subject: Re: VOP_GETATTR panic on Alpha
In-Reply-To: <15668.23528.719956.574605@grasshopper.cs.duke.edu>
Message-ID: <20020717103919.D3087-100000@gamplex.bde.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-current.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-current>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-current>
X-Loop: FreeBSD.ORG

On Tue, 16 Jul 2002, Andrew Gallatin wrote:

> Andrew Kolchoogin writes:
>  > Why "panic" from debugger on i386 gives core dump and reboots the system
>  > and "panic" from debugger on Alpha does not?
>
> Because, as BDE says, that crashdumps work at all is mostly accidental.

Er, I meant that it is mostly accidental that the syncs in panic() work.
Panic dumps should not be affected, since they should involve little
more than the driver's dump routine, which should not depend on interrupts
or context switching working.  Dump routines must use polling only, and
must run with some sort of lock that prevents context switching.  splhigh()
is used for this in RELENG_4.  sched_lock should probably be used in
-current, but there seems to be only a splhigh(), which is a no-op there.
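
To make the polling requirement concrete, here is a minimal user-space
sketch in C of what a polled dump loop looks like.  Everything in it
(struct sim_disk, sim_disk_busy(), sim_dump(), the SIM_* constants) is
hypothetical and invented for illustration; it is not the interface of
wddump or of any real FreeBSD driver.  The only point is that completion
is detected by re-reading device status in a bounded loop, never by
sleeping or by waiting for an interrupt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical simulated disk; none of these names exist in FreeBSD. */
#define SIM_BLOCK_SIZE	512
#define SIM_DISK_BLOCKS	64
#define SIM_POLL_LIMIT	1000	/* give up after this many status polls */

struct sim_disk {
	uint8_t	media[SIM_DISK_BLOCKS][SIM_BLOCK_SIZE];
	int	latency;		/* polls a write stays busy for */
	int	busy_polls_left;	/* simulated "busy" status bit */
};

/* Read the simulated status register: nonzero while a write is in flight. */
static int
sim_disk_busy(struct sim_disk *d)
{
	if (d->busy_polls_left > 0) {
		d->busy_polls_left--;
		return (1);
	}
	return (0);
}

/* Start writing one block; the device then reports busy for a while. */
static void
sim_disk_start_write(struct sim_disk *d, size_t blkno, const uint8_t *buf)
{
	memcpy(d->media[blkno], buf, SIM_BLOCK_SIZE);
	d->busy_polls_left = d->latency;
}

/*
 * Polled dump: write nblocks blocks of mem to the start of the disk.
 * Returns 0 on success, -1 on a range error or a poll timeout.
 * Nothing in here sleeps or depends on an interrupt being delivered.
 */
static int
sim_dump(struct sim_disk *d, const uint8_t *mem, size_t nblocks)
{
	size_t blk;
	int spins;

	assert(d != NULL && mem != NULL);
	if (nblocks > SIM_DISK_BLOCKS)
		return (-1);
	for (blk = 0; blk < nblocks; blk++) {
		sim_disk_start_write(d, blk, mem + blk * SIM_BLOCK_SIZE);
		/* Busy-wait on status with a bound, instead of tsleep(). */
		for (spins = 0; sim_disk_busy(d); spins++) {
			if (spins >= SIM_POLL_LIMIT)
				return (-1);
		}
	}
	return (0);
}
```

A real dump routine would additionally hold whatever lock prevents
context switching for the whole loop; the sketch leaves that out because
it has no scheduler to lock against.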

This could also be just a driver problem.  I know the old wddump routine
worked right but am not sure about any of the current ones.  Maybe dumps
are broken on the alpha only due to driver problems.  Note that the
splhigh() didn't actually lock out interrupts in RELENG_4 for drivers
broken enough to call tsleep().  The [un]safepri hack in tsleep() may
permit broken dump routines that call tsleep() to "work".  This hack
has been lost in -current except for rotted comments which still say that
it is done.
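
As a rough illustration of why the hack could make a broken dump routine
appear to work, here is a toy user-space model in C.  It assumes a
simplified priority scheme in which level SIM_NLEVELS blocks every
interrupt and level 0 blocks none.  sim_cpl, sim_splx() and the two
sim_sleep_*() functions are made-up names; only the
splx(safepri); splx(origpri); sequence is taken from the hack described
above.

```c
#include <assert.h>

#define SIM_NLEVELS	8

/* Toy model: interrupts at levels below sim_cpl are blocked. */
static int sim_cpl;
static int sim_pending[SIM_NLEVELS];	/* pending interrupts per level */
static int sim_handled;			/* interrupts delivered so far */

/* Set the priority level and deliver everything that is now unblocked. */
static void
sim_splx(int level)
{
	int l;

	assert(level >= 0 && level <= SIM_NLEVELS);
	sim_cpl = level;
	for (l = SIM_NLEVELS - 1; l >= sim_cpl; l--) {
		sim_handled += sim_pending[l];
		sim_pending[l] = 0;
	}
}

/* Early-return path with the hack: briefly open an interrupt window. */
static int
sim_sleep_with_hack(int safepri, int origpri)
{
	sim_splx(safepri);	/* pending interrupts get their chance here */
	sim_splx(origpri);	/* then the caller's level is restored */
	return (0);
}

/* Early-return path without the hack: no window, nothing is delivered. */
static int
sim_sleep_without_hack(void)
{
	return (0);
}
```

With an interrupt pending while the caller sits at the highest level,
sim_sleep_with_hack(0, SIM_NLEVELS) delivers it and
sim_sleep_without_hack() leaves it pending, which is the difference a
dump routine that sleeps while waiting for its completion interrupt
would notice.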

> On alpha, a random kernel thread is waking up, and is unable to go
> back to sleep because of the panicstr hack in msleep():
>
>         mtx_lock_spin(&sched_lock);
>         if (cold || panicstr) {
>                 /*
>                  * After a panic, or during autoconfiguration,
>                  * just give interrupts a chance, then just return;
                          ^^^^^^^^^^^^^^^^^^^^^^^^

This is the rotted comment.  No chance is given here.

>                  * don't run any other procs or panic below,
>                  * in case this is the idle process and already asleep.
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Looks like more bitrot.  We've since learned that the idle process can
never call this function.

>                  */
>                 if (mtx != NULL && priority & PDROP)
>                         mtx_unlock(mtx);
>                 mtx_unlock_spin(&sched_lock);

The safepri hack (splx(safepri); splx(origpri);) was here instead of these
mtx operations.

>                 return (0);
>         }
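
The effect of that early return on a thread that wakes up after a panic
can be modelled in a few lines of user-space C.  This is only a sketch
under stated assumptions: sim_panicstr, sim_cold,
sim_msleep_would_block() and sim_worker_wait() are hypothetical stand-ins,
and the model has no scheduler, so "going to sleep" is represented by
simply leaving the loop.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the kernel's panicstr and cold; hypothetical names. */
static const char *sim_panicstr;	/* NULL until a simulated panic */
static int sim_cold;			/* nonzero during autoconfiguration */

/*
 * Model of the check quoted above.  Returns 1 if the caller would really
 * have been put to sleep, 0 if the call returned immediately.
 */
static int
sim_msleep_would_block(void)
{
	if (sim_cold || sim_panicstr != NULL)
		return (0);
	return (1);
}

/*
 * A typical "sleep until there is work" loop.  Returns how many times the
 * sleep call came straight back without blocking, capped at max_iters.
 */
static unsigned
sim_worker_wait(const int *work_ready, unsigned max_iters)
{
	unsigned wasted = 0;

	assert(work_ready != NULL);
	while (!*work_ready && wasted < max_iters) {
		if (sim_msleep_would_block())
			break;	/* a real thread would be descheduled here */
		wasted++;	/* returned at once: the loop just spins */
	}
	return (wasted);
}
```

Before the simulated panic the loop leaves at the first sleep; once
sim_panicstr is set, every sleep attempt returns immediately and the
thread spins until its iteration cap.
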
>
> We need to somehow let only interrupt threads and the panic'ed process
> run after a panic.  I have no idea how to do this in a clean,
> low-impact way.

I don't want to do this since I think there is no clean way to do it.
But crash dumps must work without using interrupt threads, etc.  I
think the "right" way to do the sync is to always do a crash dump and
have fsck_*fs recover buffers from it rather than let the panicking
kernel possibly create further damage.  But changing fsck_*fs to do
this would be a lot of work.

Bruce

