From: Peter Jeremy
Date: Sun, 11 Mar 2001 13:39:39 +1100
To: freebsd-alpha@FreeBSD.ORG
Subject: Re: ppp core-dumping in kernel space?
Message-id: <20010311133939.A26976@gsmx07.alcatel.com.au>
In-reply-to: <20010219074428.E70642@gsmx07.alcatel.com.au>
References: <20010219074428.E70642@gsmx07.alcatel.com.au>

On 2001-Feb-19 07:44:28 +1100, Peter Jeremy wrote:
>I'm running -current from 8th February on a Multia and ppp is regularly
>core-dumping (segmentation violation).  The core dump contents seem
>consistent from the couple of different cores that I've studied.

I've done some more studying.  The problem is inside the sigreturn
trampoline (hence the ra on the stack).  The trampoline does a
"jsr ra,(t12)" to invoke the user signal handler.
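As a rough illustration of why the value in t12 matters, here is a toy C
model of that indirect call.  It is only a sketch: toy_frame,
toy_trampoline, toy_deliver and stray_function are made-up names, not
anything from locore.s or machdep.c, and the real trampoline is Alpha
assembly rather than C.

```c
/* Toy model only: a hypothetical stand-in for the saved user registers,
 * reduced to the two that matter here.  Not the real trapframe. */
typedef void (*handler_fn)(int);

struct toy_frame {
    handler_fn t12;   /* address the trampoline will jump through */
    int        a0;    /* first argument: the signal number */
};

int handler_hits;     /* times the intended handler ran */
int stray_hits;       /* times some other function ran instead */

static void user_handler(int sig) { (void)sig; handler_hits++; }

/* Stands in for whatever unrelated address ends up in t12.  The toy
 * just counts the call so the wrong target is visible. */
static void stray_function(int sig) { (void)sig; stray_hits++; }

/* Toy equivalent of "jsr ra,(t12)": call whatever t12 holds. */
static void toy_trampoline(const struct toy_frame *f)
{
    f->t12(f->a0);
}

/* Build the frame with the handler in t12, optionally overwrite t12
 * before the trampoline runs, then make the indirect call. */
void toy_deliver(int sig, int clobber)
{
    struct toy_frame f = { user_handler, sig };

    if (clobber)
        f.t12 = stray_function;   /* models t12 being overwritten */
    toy_trampoline(&f);
}
```

Calling toy_deliver(sig, 0) runs user_handler, while toy_deliver(sig, 1)
runs stray_function instead: the trampoline has no way to tell that t12
no longer points at the handler.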
My problem is that somehow t12 holds the address of the kernel
witness_exit() function, rather than the user signal handler.  (And
before I had WITNESS enabled, t12 was _mtx_exit().)  A kernel address
is obviously invalid in usermode, so a further SIGSEGV is delivered.

I notice that code in both locore.s and machdep.c includes comments
implying that the handler is in a3, rather than t12, but the actual
code in machdep.c has used t12 ever since dfr imported it in June 1998.

In both core files I've checked, the kernel was in the process of
delivering a SIGALRM.  Since gcc seems to always compile function
calls[1] as
	ldq	t12,...(gp)
	jsr	ra,(t12)
this suggests that there is a window between when the user registers
are restored and the actual retsys during which t12 (at least) can get
clobbered.

I've occasionally seen odd behaviour in other processes which seems
consistent with this.

Has anyone else seen this?  Is this likely to have been fixed by jhb's
recent locking fixes?

[1] Can anyone explain the rationale behind this?  bsr should be
    preferable since the CPU can begin the instruction prefetch much
    earlier.

Peter