Date: Wed, 28 Mar 2001 08:16:31 +1000
From: Peter Jeremy <peter.jeremy@alcatel.com.au>
To: Andrew Gallatin <gallatin@cs.duke.edu>
Cc: John Baldwin <jhb@FreeBSD.ORG>, obrien@FreeBSD.ORG
Subject: Re: dump(8)
Message-ID: <20010328081631.V26138@gsmx07.alcatel.com.au>
In-Reply-To: <15039.62193.909968.858131@grasshopper.cs.duke.edu>; from gallatin@cs.duke.edu on Mon, Mar 26, 2001 at 08:54:57PM -0500
References: <15039.59182.857539.804159@grasshopper.cs.duke.edu> <XFMail.010326172439.jhb@FreeBSD.org> <15039.62193.909968.858131@grasshopper.cs.duke.edu>

On 2001-Mar-26 20:54:57 -0500, Andrew Gallatin <gallatin@cs.duke.edu> wrote:
>
>John Baldwin writes:
> > > I haven't been able to figure it out yet..
> >
> > Try turning preemption off (i.e. remove it from the kernel config).  On the x86
> > side we've seen that non-preemption safe code can blow up in very bad ways. :(
>
>Heh.  I've never been brave enough to turn it on in the first place. ;-)

Same here.

>0xfffffe0006823360 is in witness_exit().  And I think the RA is
>probably in the sigcode.  So my guess about signals seem to be right on
>track, as does Peter Jeremy's guess about t12 getting clobbered:

My initial investigations (without WITNESS) showed that the offending
code had just executed "_mtx_exit(..., &Giant)".  I might see if I
can add some code near exception.s:Lnohae to trap to ddb if t12 points
into the kernel rather than userland - there might still be enough
evidence left to find where Giant was being released.

>Its looking more and more like a stack smash somewhere scribbling over
>p->p_md.md_tf, but I'm damned if I know where.

It would be enough to set FRAME_FLAGS_SYSCALL in
p->p_md.md_tf->tf_regs[FRAME_FLAGS] - but I can't see how that could
occur either.

If it is stack smashing, adding a chunk of dead space below the
trapframe should remove the corruption (ie decrement sp by (say)
1024 in each XentFOO just before CALL(FOO)).  This would mean
boosting the size of the kernel stack as well.

It's not consistent, so it smells to me like an interrupt window - but
I can't see anything wrong in the exception entry code either.

How about the following scenario:  Returning from a normal syscall,
after XentSys1, FRAME_FLAGS indicates only a short register restore is
necessary.  The code then calls ast() which finds a pending signal and
sets up a return via the userland signal trampoline - saving the
relevant address in p->p_md.md_tf->tf_regs[FRAME_T12] and clearing
p->p_md.md_tf->tf_regs[FRAME_FLAGS].
When the code returns back into exception.s, it's committed to a
short restore, leaving t12 pointing to the Giant unlock at the end of
ast().

I haven't looked through this in detail so I may have missed the code
which prevents this scenario.

Peter

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-alpha" in the body of the message
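
The scenario above can be sketched as a small userland model.  This is
only an illustration of the suspected ordering problem, not the real
exception.s or ast() code: the struct layouts, the addresses, the
KERNEL_BASE value and the names model_ast(), model_syscall_return() and
points_into_kernel() are all assumptions made up for this sketch.  The
"recheck_flags" path is likewise a hypothetical way of showing that the
stale decision is what matters; it is not a proposed fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for this sketch; not the real Alpha definitions. */
enum { FRAME_T12, FRAME_FLAGS, FRAME_SIZE };
#define FRAME_FLAGS_SYSCALL 0x1ULL
#define KERNEL_BASE  0xfffffc0000000000ULL /* assumed start of kernel addresses */
#define USER_T12     0x0000000120001000ULL /* made-up userland value of t12 */
#define SIGCODE_ADDR 0x0000000011ff8000ULL /* made-up signal trampoline address */
#define KERNEL_T12   0xfffffc0000345678ULL /* made-up value left by the unlock */

struct trapframe { uint64_t tf_regs[FRAME_SIZE]; };
struct cpu { uint64_t t12; };              /* the live register, not the saved copy */

/* The check suggested for ddb: does t12 point into the kernel? */
bool points_into_kernel(uint64_t addr)
{
    return addr >= KERNEL_BASE;
}

/* Model of ast(): on a pending signal, redirect the saved frame to the
 * trampoline and clear FRAME_FLAGS.  The Giant release at the end is
 * modelled as leaving a kernel address in the live t12 register. */
void model_ast(struct trapframe *tf, struct cpu *cpu, bool signal_pending)
{
    assert(tf != NULL && cpu != NULL);
    if (signal_pending) {
        tf->tf_regs[FRAME_T12] = SIGCODE_ADDR;
        tf->tf_regs[FRAME_FLAGS] = 0;
    }
    cpu->t12 = KERNEL_T12;
}

/* Model of the syscall return path.  Returns the t12 value that
 * userland would see. */
uint64_t model_syscall_return(bool signal_pending, bool recheck_flags)
{
    struct trapframe tf;
    struct cpu cpu;
    bool short_restore;

    tf.tf_regs[FRAME_T12] = USER_T12;
    tf.tf_regs[FRAME_FLAGS] = FRAME_FLAGS_SYSCALL;
    cpu.t12 = USER_T12;

    /* The restore type is decided before ast() runs. */
    short_restore = (tf.tf_regs[FRAME_FLAGS] & FRAME_FLAGS_SYSCALL) != 0;

    model_ast(&tf, &cpu, signal_pending);

    /* Hypothetical: look at the flags again after ast() has run. */
    if (recheck_flags)
        short_restore = (tf.tf_regs[FRAME_FLAGS] & FRAME_FLAGS_SYSCALL) != 0;

    /* Only a full restore reloads t12 from the saved frame. */
    if (!short_restore)
        cpu.t12 = tf.tf_regs[FRAME_T12];

    return cpu.t12;
}
```

In this model, model_syscall_return(true, false) hands back a t12 that
points into the kernel, which is the symptom described above, while
model_syscall_return(true, true) hands back the trampoline address.
Whether the real return path can actually make its decision this early
is exactly the open question.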