Date: Wed, 28 Mar 2001 08:16:31 +1000
From: Peter Jeremy
Subject: Re: dump(8)
To: Andrew Gallatin
Cc: John Baldwin, obrien@FreeBSD.ORG
In-reply-to: <15039.62193.909968.858131@grasshopper.cs.duke.edu>
Message-id: <20010328081631.V26138@gsmx07.alcatel.com.au>

On 2001-Mar-26 20:54:57 -0500, Andrew Gallatin wrote:
>
>John Baldwin writes:
> > > I haven't been able to figure it out yet..
> >
> > Try turning preemption off (i.e. remove it from the kernel config).  On the x86
> > side we've seen that non-preemption safe code can blow up in very bad ways. :(
>
>Heh.  I've never been brave enough to turn it on in the first place. ;-)

Same here.

>0xfffffe0006823360 is in witness_exit().  And I think the RA is
>probably in the sigcode.  So my guess about signals seem to be right on
>track, as does Peter Jeremy's guess about t12 getting clobbered:

My initial investigations (without WITNESS) showed that the offending
code had just executed "_mtx_exit(..., &Giant)".  I might see if I can
add some code near exception.s:Lnohae to trap to ddb if t12 points
into the kernel rather than userland - there might still be enough
evidence left to find where Giant was being released.

>Its looking more and more like a stack smash somewhere scribbling over
>p->p_md.md_tf, but I'm damned if I know where.

It would be enough to set FRAME_FLAGS_SYSCALL in
p->p_md.md_tf->tf_regs[FRAME_FLAGS] - but I can't see how that could
occur either.

If it is stack smashing, adding a chunk of dead space below the
trapframe should remove the corruption (ie decrement sp by (say) 1024
in each XentFOO just before CALL(FOO)).  This would mean boosting the
size of the kernel stack as well.

It's not consistent, so it smells to me like an interrupt window - but
I can't see anything wrong in the exception entry code either.

How about the following scenario: Returning from a normal syscall,
after XentSys1, FRAME_FLAGS indicates only a short register restore is
necessary.  The code then calls ast() which finds a pending signal and
sets up a return via the userland signal trampoline - saving the
relevant address in p->p_md.md_tf->tf_regs[FRAME_T12] and clearing
p->p_md.md_tf->tf_regs[FRAME_FLAGS].  When the code returns back into
exception.s, it's committed to a short restore, leaving t12 pointing
to the Giant unlock at the end of ast().

I haven't looked through this in detail so I may have missed the code
which prevents this scenario.

Peter
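
The short-restore scenario described above can be sketched as a toy
model in plain C.  This is only an illustration under stated
assumptions, not the actual exception.s or ast() code: the index
values, HANDLER_ADDR, the struct layouts, and every toy_* function are
hypothetical, and KERNEL_ADDR simply reuses the kernel address quoted
earlier in the thread as a stand-in.  The model takes the restore
decision before the AST runs, so a signal set up by the AST never
reaches the live t12 unless the flags are checked again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical indices and values; only the names mirror the mail. */
enum { FRAME_T12, FRAME_FLAGS, FRAME_SIZE };
#define FRAME_FLAGS_SYSCALL UINT64_C(0x1)
#define KERNEL_ADDR  UINT64_C(0xfffffe0006823360) /* stand-in kernel address */
#define HANDLER_ADDR UINT64_C(0x0000000120001000) /* made-up userland address */

struct trapframe { uint64_t tf_regs[FRAME_SIZE]; }; /* saved user registers */
struct cpu { uint64_t t12; };                        /* live register */

/*
 * Toy ast(): if a signal is pending, store the address userland needs in
 * the saved t12 and clear the flags so that a full restore is requested.
 * Kernel calls made here leave a kernel address in the live t12.
 */
static void
toy_ast(struct trapframe *tf, struct cpu *cpu, bool signal_pending)
{
	assert(tf != NULL && cpu != NULL);
	if (signal_pending) {
		tf->tf_regs[FRAME_T12] = HANDLER_ADDR;
		tf->tf_regs[FRAME_FLAGS] = 0;
	}
	cpu->t12 = KERNEL_ADDR;	/* e.g. the last call made inside ast() */
}

/*
 * Toy syscall return path.  The restore decision is taken before the AST
 * runs; recheck_flags models re-reading FRAME_FLAGS afterwards.
 * Returns the t12 value that userland would see.
 */
static uint64_t
toy_syscall_return(struct trapframe *tf, struct cpu *cpu,
    bool signal_pending, bool recheck_flags)
{
	bool short_restore =
	    (tf->tf_regs[FRAME_FLAGS] & FRAME_FLAGS_SYSCALL) != 0;

	toy_ast(tf, cpu, signal_pending);

	if (recheck_flags)
		short_restore =
		    (tf->tf_regs[FRAME_FLAGS] & FRAME_FLAGS_SYSCALL) != 0;

	if (!short_restore)
		cpu->t12 = tf->tf_regs[FRAME_T12];	/* full restore reloads t12 */
	return cpu->t12;
}

/* Build a fresh syscall frame and run one return through the model. */
uint64_t
toy_run(bool signal_pending, bool recheck_flags)
{
	struct trapframe tf = {
		.tf_regs = { [FRAME_T12] = 0, [FRAME_FLAGS] = FRAME_FLAGS_SYSCALL }
	};
	struct cpu cpu = { .t12 = 0 };

	return toy_syscall_return(&tf, &cpu, signal_pending, recheck_flags);
}
```

In this model toy_run(true, false) returns the kernel address, which is
the suspected failure, while toy_run(true, true) returns the handler
address.  Whether the real return path re-reads the flags after ast()
is exactly the open question in the mail.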