From owner-freebsd-alpha  Tue Mar 27 14:18:01 2001
Delivered-To: freebsd-alpha@freebsd.org
Received: from netau1.alcanet.com.au (ntp.alcanet.com.au [203.62.196.27])
	by hub.freebsd.org (Postfix) with ESMTP id EF87737B718
	for <freebsd-alpha@FreeBSD.ORG>; Tue, 27 Mar 2001 14:17:56 -0800 (PST)
	(envelope-from jeremyp@gsmx07.alcatel.com.au)
Received: from mfg1.cim.alcatel.com.au (mfg1.cim.alcatel.com.au [139.188.23.1])
	by netau1.alcanet.com.au (8.9.3 (PHNE_22672)/8.9.3) with ESMTP id IAA12698
	for <freebsd-alpha@FreeBSD.ORG>; Wed, 28 Mar 2001 08:17:55 +1000 (EST)
Received: from gsmx07.alcatel.com.au by cim.alcatel.com.au
 (PMDF V5.2-32 #37645) with ESMTP id <01K1Q20FN5AOS4LUGY@cim.alcatel.com.au>
 for freebsd-alpha@FreeBSD.ORG; Wed, 28 Mar 2001 08:17:53 +1100
Received: (from jeremyp@localhost)	by gsmx07.alcatel.com.au (8.11.1/8.11.1)
 id f2RMHpR47317	for freebsd-alpha@FreeBSD.ORG; Wed,
 28 Mar 2001 08:17:51 +1000 (EST envelope-from jeremyp)
Content-return: prohibited
Date: Wed, 28 Mar 2001 08:16:31 +1000
From: Peter Jeremy <peter.jeremy@alcatel.com.au>
Subject: Re: dump(8)
In-reply-to: <15039.62193.909968.858131@grasshopper.cs.duke.edu>; from
 gallatin@cs.duke.edu on Mon, Mar 26, 2001 at 08:54:57PM -0500
To: Andrew Gallatin <gallatin@cs.duke.edu>
Cc: John Baldwin <jhb@FreeBSD.ORG>, obrien@FreeBSD.ORG
Message-id: <20010328081631.V26138@gsmx07.alcatel.com.au>
MIME-version: 1.0
Content-type: text/plain; charset=us-ascii
Content-disposition: inline
User-Agent: Mutt/1.2.5i
References: <15039.59182.857539.804159@grasshopper.cs.duke.edu>
 <XFMail.010326172439.jhb@FreeBSD.org>
 <15039.62193.909968.858131@grasshopper.cs.duke.edu>
Sender: owner-freebsd-alpha@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On 2001-Mar-26 20:54:57 -0500, Andrew Gallatin <gallatin@cs.duke.edu> wrote:
>
>John Baldwin writes:
> > > I haven't been able to figure it out yet..
> > 
> > Try turning preemption off (i.e. remove it from the kernel config).  On the x86
> > side we've seen that non-preemption safe code can blow up in very bad ways. :(
>
>Heh.  I've never been brave enough to turn it on in the first place.  ;-)

Same here.

>0xfffffe0006823360 is in witness_exit(). And I think the RA is
>probably in the sigcode. So my guess about signals seems to be right on
>track, as does Peter Jeremy's guess about t12 getting clobbered:

My initial investigations (without WITNESS) showed that the offending
code had just executed "_mtx_exit(..., &Giant)".  I might add some
code near exception.s:Lnohae to trap to ddb if t12 points into the
kernel rather than into userland; there might still be enough
evidence left to find where Giant was being released.
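A rough C sketch of the test such a trap would make, assuming the usual
Alpha layout in which the kernel segments (K0SEG at 0xfffffc0000000000,
K1SEG at 0xfffffe0000000000) sit at the top of the 64-bit address space
and user addresses are low.  The threshold and the function name are
hypothetical, chosen for illustration; the real check would live in
assembly next to Lnohae and would use the constants from the machine
headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical threshold: the base of K0SEG on Alpha.  Anything at or
 * above it is a kernel address; user addresses are far below it.
 */
#define HYPOTHETICAL_KERNEL_BASE 0xfffffc0000000000ULL

/* Returns true if a t12 value about to be handed to userland is a kernel address. */
static bool
t12_points_into_kernel(uint64_t t12)
{
	return t12 >= HYPOTHETICAL_KERNEL_BASE;
}
```

With this check, the witness_exit() address quoted above
(0xfffffe0006823360) would trigger the trap, while an ordinary user
text address would not.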

>It's looking more and more like a stack smash somewhere scribbling over
>p->p_md.md_tf, but I'm damned if I know where.

Merely setting FRAME_FLAGS_SYSCALL in
p->p_md.md_tf->tf_regs[FRAME_FLAGS] would be enough to cause this,
but I can't see how that could happen either.  If it is stack
smashing, adding a chunk of dead space below the trapframe should
make the corruption go away (i.e. decrement sp by, say, 1024 bytes in
each XentFOO just before CALL(FOO)).  This would also mean enlarging
the kernel stack.
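A toy C model of the dead-space idea.  The stack layout, the 256-byte
scratch area and the fill byte are all hypothetical.  Filling the dead
space with a known pattern is an addition beyond what is proposed
above, but it would show directly whether something had written into
it, rather than only whether the corruption went away.  The model
treats the stack as a struct whose lowest addresses come first; an
upward overrun of a local buffer in the scratch area reaches the dead
space before it can reach the trapframe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEAD_SPACE_BYTES 1024	/* the "(say) 1024" suggested above */
#define GUARD_PATTERN    0xa5	/* hypothetical fill byte */

/* Stand-in for the real trapframe: just a few saved registers. */
struct toy_trapframe {
	uint64_t tf_regs[8];
};

/* Lowest address first: C scratch space, dead space, then the trapframe. */
struct toy_kstack {
	unsigned char scratch[256];
	unsigned char dead[DEAD_SPACE_BYTES];
	struct toy_trapframe tf;
};

/* Fill the dead space with the guard pattern on exception entry. */
static void
guard_init(struct toy_kstack *ks)
{
	memset(ks->dead, GUARD_PATTERN, sizeof(ks->dead));
}

/* Returns true if nothing has written into the dead space since guard_init(). */
static bool
guard_intact(const struct toy_kstack *ks)
{
	for (size_t i = 0; i < sizeof(ks->dead); i++)
		if (ks->dead[i] != GUARD_PATTERN)
			return false;
	return true;
}
```

Checking guard_intact() on the return path (or from ddb) would separate
the two hypotheses: a damaged guard points at a stack smash, while an
intact guard with a corrupted trapframe points elsewhere.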

The failure isn't consistent, so it smells to me like an interrupt
window, but I can't see anything wrong in the exception entry code
either.

How about the following scenario: when returning from a normal
syscall (after XentSys1), FRAME_FLAGS indicates that only a short
register restore is necessary.  The code then calls ast(), which finds
a pending signal and sets up a return via the userland signal
trampoline, saving the relevant address in
p->p_md.md_tf->tf_regs[FRAME_T12] and clearing
p->p_md.md_tf->tf_regs[FRAME_FLAGS].  When the code returns to
exception.s, it is already committed to a short restore, so t12 is
left pointing to the Giant unlock at the end of ast().  I haven't
looked through this in detail, so I may have missed the code that
prevents this scenario.
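A toy C model of the suspected race, to make the ordering explicit.
The register indices, flag value and addresses are hypothetical (they
are not the real <machine/frame.h> layout), and the "recheck" variant
is only one possible fix, not something taken from the tree.  The
model assumes that after ast() returns, the live t12 register still
holds the kernel address of the last procedure called through it (the
Giant unlock), and that a short restore does not reload t12 from the
trapframe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical trapframe slots and flag; not the real FreeBSD values. */
enum { TOY_FRAME_T12, TOY_FRAME_PC, TOY_FRAME_FLAGS, TOY_FRAME_NREGS };
#define TOY_FLAGS_SYSCALL 0x1

#define KERNEL_T12 0xfffffe0000001000ULL	/* stale: the Giant unlock in ast() */
#define SIGTRAMP   0x0000000120002000ULL	/* hypothetical user trampoline */

struct toy_trapframe { uint64_t tf_regs[TOY_FRAME_NREGS]; };

/* Register values userland sees after the return to user mode. */
struct toy_cpu { uint64_t t12; uint64_t pc; };

/* Stand-in for ast() posting a signal: return via the trampoline. */
static void
toy_ast_post_signal(struct toy_trapframe *tf)
{
	tf->tf_regs[TOY_FRAME_PC] = SIGTRAMP;
	tf->tf_regs[TOY_FRAME_T12] = SIGTRAMP;
	tf->tf_regs[TOY_FRAME_FLAGS] = 0;	/* a full restore is now required */
}

/*
 * Syscall return path.  With recheck == false the restore type is
 * decided before ast() runs (the suspected bug); with recheck == true
 * FRAME_FLAGS is re-read after ast().
 */
static struct toy_cpu
toy_syscall_return(struct toy_trapframe *tf, bool signal_pending, bool recheck)
{
	/* Live t12 as left by the last call made inside ast(). */
	struct toy_cpu cpu = { .t12 = KERNEL_T12, .pc = 0 };
	bool short_restore = (tf->tf_regs[TOY_FRAME_FLAGS] & TOY_FLAGS_SYSCALL) != 0;

	if (signal_pending)
		toy_ast_post_signal(tf);
	if (recheck)
		short_restore = (tf->tf_regs[TOY_FRAME_FLAGS] & TOY_FLAGS_SYSCALL) != 0;

	cpu.pc = tf->tf_regs[TOY_FRAME_PC];
	if (!short_restore)
		cpu.t12 = tf->tf_regs[TOY_FRAME_T12];	/* full restore reloads t12 */
	return cpu;
}
```

In the first ordering, userland enters the trampoline with t12 still
holding the kernel address, which matches the symptom of t12 pointing
into witness_exit() with the RA in the sigcode.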

Peter
