From owner-freebsd-current  Mon Mar 19 12:36: 4 2001
Message-ID: <XFMail.010319123455.jhb@FreeBSD.org>
In-Reply-To: <xzpg0g9x36w.fsf@flood.ping.uio.no>
Date: Mon, 19 Mar 2001 12:34:55 -0800 (PST)
From: John Baldwin <jhb@FreeBSD.org>
To: Dag-Erling Smorgrav <des@ofug.org>
Subject: RE: Here's another one for you...
Cc: current@FreeBSD.org


On 19-Mar-01 Dag-Erling Smorgrav wrote:
> SMP box with a bleeding-edge -CURRENT kernel, patched to avoid the
> i586_bzero() problem:
> 
> panic: mutex_enter: recursion on non-recursive mutex process lock @
> ../../i386/i386/trap.c:854
> cpuid = 1; lapic.id = 01000000
> Debugger("panic")

That panic is a later symptom of the real problem.  The process lock was
already held when we took the page fault, so the PHOLD in the fault path
recursed on it before the fault itself was ever handled.
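A userland analogy, not kernel code, may make this failure mode concrete.  The
sketch below assumes POSIX threads, and relock_reports_edeadlk() is a
hypothetical name.  An error-checking pthread mutex refuses a second lock by
the thread that already owns it and returns EDEADLK, which is roughly the
condition that witness turned into the panic above.

```c
#define _XOPEN_SOURCE 700   /* expose PTHREAD_MUTEX_ERRORCHECK / settype */
#include <errno.h>
#include <pthread.h>

/*
 * Userland analogy only (not kernel code, hypothetical name).
 * Returns 1 if relocking an already-held, non-recursive mutex from the
 * same thread is rejected with EDEADLK instead of silently recursing.
 */
int relock_reports_edeadlk(void)
{
    pthread_mutex_t proc_lock;   /* stands in for the process lock */
    pthread_mutexattr_t attr;
    int error;

    pthread_mutexattr_init(&attr);
    /* ERRORCHECK: a relock by the owner fails rather than deadlocking. */
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&proc_lock, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&proc_lock);          /* like PROC_LOCK(p) in sendsig() */
    error = pthread_mutex_lock(&proc_lock);  /* like the PHOLD in the fault path */

    pthread_mutex_unlock(&proc_lock);        /* drop the one lock actually held */
    pthread_mutex_destroy(&proc_lock);
    return error == EDEADLK;
}
```

On a POSIX system this should return 1; with a PTHREAD_MUTEX_NORMAL mutex the
second lock would simply deadlock instead of reporting the recursion.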
 
> CPU1 stopping CPUs: 0x00000001... stopped.
> Stopped at      Debugger+0x45:  pushl   %ebx
> db> show mutex
>         "panic" (0xc030b1e0) locked at ../../kern/kern_shutdown.c:544
>         "process lock" (0xd3f15000) locked at ../../i386/i386/machdep.c:625

machdep.c:625 is in sendsig():

        p = curproc;
        PROC_LOCK(p);
        psp = p->p_sigacts;
        if (SIGISMEMBER(psp->ps_osigset, sig)) {
                ...

>         "Giant" (0xc0309ac0) locked at ../../i386/i386/trap.c:1169
> db> trace
> Debugger(c027d5e1) at Debugger+0x45
> panic(c027c420,c027a154,c02997d0,356,d3f14ee0) at panic+0x144
> witness_enter(d3f15000,0,c02997d0,356) at witness_enter+0x355
> trap_pfault(d7345d4c,0,0) at trap_pfault+0x143
> trap(18,10,10,d7345fa8,0) at trap+0x978
> calltrap() at calltrap+0x5
> --- trap 0xc, eip = 0, esp = 0xd7345d8c, ebp = 0xd7345ed8 ---
> (null)(805c3e0,e,d7345f10,0,4) at 0
> postsig(e) at postsig+0x40b

Hmmm.  An eip of 0 is bad.  This could be just another instance of the bzero
bug showing up in a different place.  You probably want to change the code that
actually sets the bzero function pointer to i586_bzero (and likewise for any
other ops that use floating point).  That code lives in i386/isa/npx.c.  It
seems we use the FP registers for copyin/copyout and bcopy as well.  As a
temporary patch, I would just change line 458 of npx.c to read
'#ifdef I586_CPU_XXX'; then you no longer need to patch pmap_zero_page().
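To illustrate why renaming the guard works, here is a hypothetical sketch of
the dispatch pattern.  It is not the actual npx.c code, and bzero_fn,
init_bzero() and using_generic_bzero() are made-up names.  Because nothing
defines I586_CPU_XXX, the preprocessor drops the assignment, and the function
pointer keeps pointing at the non-FP routine.

```c
#include <stddef.h>
#include <string.h>

/* Plain integer-register implementation (hypothetical name). */
static void generic_bzero(void *buf, size_t len)
{
    memset(buf, 0, len);
}

/* Stand-in for the FPU-based routine; here it just calls memset. */
void i586_bzero(void *buf, size_t len)
{
    memset(buf, 0, len);
}

/* Dispatch pointer that callers use instead of a direct call. */
void (*bzero_fn)(void *, size_t) = generic_bzero;

/* Boot-time selection, loosely modeled on the i586 check described above. */
void init_bzero(void)
{
#ifdef I586_CPU_XXX     /* was: #ifdef I586_CPU; nothing defines the _XXX name */
    bzero_fn = i586_bzero;
#endif
}

/* Returns 1 while the FP-based routine has not been selected. */
int using_generic_bzero(void)
{
    return bzero_fn == generic_bzero;
}
```

Compiling the block out, rather than deleting it, keeps the temporary patch
trivial to revert once the FP problem is fixed.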

-- 

John Baldwin <jhb@FreeBSD.org> -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/
