From owner-freebsd-current  Tue Jul 16 18:39:50 2002
Message-ID: <XFMail.20020716213951.jhb@FreeBSD.org>
X-Mailer: XFMail 1.5.2 on FreeBSD
X-Priority: 3 (Normal)
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
In-Reply-To: <20020717103919.D3087-100000@gamplex.bde.org>
Date: Tue, 16 Jul 2002 21:39:51 -0400 (EDT)
From: John Baldwin <jhb@FreeBSD.org>
To: Bruce Evans <bde@zeta.org.au>
Subject: Re: VOP_GETATTR panic on Alpha
Cc: current@FreeBSD.ORG, Andrew Kolchoogin <andrew@snark.rinet.ru>,
	Andrew Gallatin <gallatin@cs.duke.edu>


On 17-Jul-2002 Bruce Evans wrote:
> This could also be just a driver problem.  I know the old wddump routine
> worked right but am not sure about any of the current ones.  Maybe dumps
> are broken on the alpha only due to driver problems.  Note that the
> splhigh() didn't actually lock out interrupts in RELENG_4 for drivers
> broken enough to call tsleep().  The [un]safepri hack in tsleep() may
> permit broken dump routines that call tsleep() to "work".  This hack
> has been lost in -current except for rotted comments which still say that
> it is done.

Agreed.  If drivers depend on interrupts to work for dumps, that is a Bug (tm).

>> On alpha, a random kernel thread is waking up, and is unable to go
>> back to sleep because of the panicstr hack in msleep:
>>
>>         mtx_lock_spin(&sched_lock);
>>         if (cold || panicstr) {
>>                 /*
>>                  * After a panic, or during autoconfiguration,
>>                  * just give interrupts a chance, then just return;
>                           ^^^^^^^^^^^^^^^^^^^^^^^^
> 
> This is the rotted comment.  No chance is given here.

Well, when you unlock sched_lock you give ithreads a chance to run.  (This
is only true in a fully preemptive kernel though.)

>>                  * don't run any other procs or panic below,
>>                  * in case this is the idle process and already asleep.
>                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> Looks like more bitrot.  We've learned that the idle process can't call here.

Yes.

>>                  */
>>                 if (mtx != NULL && priority & PDROP)
>>                         mtx_unlock(mtx);
>>                 mtx_unlock_spin(&sched_lock);
> 
> The safepri hack (splx(safepri); splx(origpri);) was here instead of these
> mtx operations.

To truly emulate this, we should probably always release the 'mtx' mutex
and then reacquire it if PDROP isn't specified.
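
As a rough illustration of that idea, here is a userland toy model of the
early-return path only.  It is a sketch, not kernel code: struct toy_mtx,
toy_lock(), toy_unlock(), toy_msleep_panic_path() and TOY_PDROP are all
made-up stand-ins for this example, and sched_lock handling is left out
entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a kernel mutex; NOT the real struct mtx. */
struct toy_mtx {
	bool held;	/* is the mutex currently owned? */
	int releases;	/* how many times it has been released */
};

/* Illustrative flag value for this model only. */
#define TOY_PDROP 0x100

static void
toy_lock(struct toy_mtx *m)
{
	assert(!m->held);
	m->held = true;
}

static void
toy_unlock(struct toy_mtx *m)
{
	assert(m->held);
	m->held = false;
	m->releases++;
}

/*
 * Models only the branch taken when (cold || panicstr) is true:
 * always drop the interlock so anything waiting on it gets a window,
 * then take it back unless the caller asked for it to stay dropped.
 */
static int
toy_msleep_panic_path(struct toy_mtx *mtx, int priority)
{
	if (mtx != NULL) {
		toy_unlock(mtx);
		if ((priority & TOY_PDROP) == 0)
			toy_lock(mtx);
	}
	return (0);
}
```

Called without TOY_PDROP, the caller gets the mutex back held but it has
been released once in between; called with TOY_PDROP, it stays released,
which matches what the quoted code already does in that case.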

>>                 return (0);
>>         }
>>
>> We need to somehow let only interrupt threads and the panic'ed process
>> run after a panic.  I have no idea how to do this in a clean,
>> low-impact way.
> 
> I don't want to do this since I think there is no clean way to do it.
> But crash dumps must work without using interrupt threads, etc.  I
> think the "right" way to do the sync is to always do a crash dump and
> have fsck_*fs recover buffers from it rather than let the panicking
> kernel possibly create further damage.  But changing fsck_*fs to do
> this would be a lot of work.

I agree that this would be the best solution for the long term if we can
have it.

-- 

John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/
