From owner-freebsd-alpha  Sat Mar 10 18:39:49 2001
Delivered-To: freebsd-alpha@freebsd.org
Content-return: prohibited
Date: Sun, 11 Mar 2001 13:39:39 +1100
From: Peter Jeremy <peter.jeremy@alcatel.com.au>
Subject: Re: ppp core-dumping in kernel space?
In-reply-to: <20010219074428.E70642@gsmx07.alcatel.com.au>; from
 peter.jeremy@alcatel.com.au on Mon, Feb 19, 2001 at 07:44:28AM +1100
To: freebsd-alpha@FreeBSD.ORG
Mail-Followup-To: freebsd-alpha@FreeBSD.ORG
Message-id: <20010311133939.A26976@gsmx07.alcatel.com.au>
MIME-version: 1.0
Content-type: text/plain; charset=us-ascii
Content-disposition: inline
User-Agent: Mutt/1.2.5i
References: <20010219074428.E70642@gsmx07.alcatel.com.au>
Sender: owner-freebsd-alpha@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On 2001-Feb-19 07:44:28 +1100, Peter Jeremy <peter.jeremy@alcatel.com.au> wrote:
>I'm running -current from 8th February on a Multia and ppp is regularly
>core-dumping (segmentation violation).  The core dump contents seem
>consistent across the couple of different cores that I've studied.

I've done some more studying.  The problem is inside the sigreturn
trampoline (hence the ra on the stack).  The trampoline does a
"jsr ra,(t12)" to invoke the user signal handler.  My problem is
that somehow t12 holds the address of the kernel witness_exit()
function, rather than that of the user signal handler.  (And before I
had WITNESS enabled, t12 pointed at _mtx_exit().)  A kernel address is
obviously invalid in usermode, so a further SIGSEGV is delivered.  I
notice that comments in both locore.s and machdep.c imply that the
handler is in a3, rather than t12, but the actual code in machdep.c
has used t12 ever since dfr imported it in June 1998.

In both core files I've checked, the kernel was in the process of
delivering a SIGALRM.

Since gcc seems to always compile function calls[1] as
	ldq	t12,...(gp)
	jsr	ra,(t12)
t12 holds the address of whichever function was called last, which
suggests that there is a window between when the user registers are
restored and the actual retsys during which a kernel function call can
clobber t12 (at least).  I've occasionally seen odd behaviour in other
processes which seems consistent with this.

Has anyone else seen this?  Is this likely to have been fixed by jhb's
recent locking fixes?

[1] Can anyone explain the rationale behind this?  bsr should be
    preferable since the CPU can begin the instruction prefetch
    much earlier.

Peter
