Date: Thu, 1 May 2008 13:35:06 -0400
From: John Baldwin <jhb@freebsd.org>
To: Juergen Lock <nox@jelal.kn-bremen.de>
Cc: freebsd-emulation@freebsd.org, freebsd-amd64@freebsd.org
Subject: Re: seems I finally found what upset kqemu on amd64 SMP... shared gdt! (please test patch :)
Message-ID: <200805011335.06415.jhb@freebsd.org>
In-Reply-To: <20080501155304.GB2940@saturn.kn-bremen.de>
References: <20080429222458.GA20855@saturn.kn-bremen.de> <200805011011.06951.jhb@freebsd.org> <20080501155304.GB2940@saturn.kn-bremen.de>

On Thursday 01 May 2008 11:53:04 am Juergen Lock wrote:
> On Thu, May 01, 2008 at 10:11:06AM -0400, John Baldwin wrote:
> > On Thursday 01 May 2008 06:19:51 am Juergen Lock wrote:
> > > On Wed, Apr 30, 2008 at 12:24:58AM +0200, Juergen Lock wrote:
> > > > Yeah, the amd64 kernel reuses the same gdt to setup all cpus, causing
> > > > kqemu to end up restoring the interrupt stackpointer (after running
> > > > guest code using its own cpu state) from the tss of the last cpu,
> > > > regardless which cpu it happened to run on. And that then causes the
> > > > last cpu's (usually) idle thread's stack to get smashed and the host
> > > > doing multiple panics... (Which also explains why pinning qemu onto cpu
> > > > 1 worked on a 2-way host.)
> > >
> > > Hmm maybe the following is a little more clear: kqemu sets up its own
> > > cpu state and has to save and restore the original state because of that,
> > > so among other things it does an str insn (store task register), and later
> > > an ltr insn (load task register) using the value it got from the first
> > > str insn. That ltr insn loads the selector for the tss which is stored
> > > in the gdt, and that entry in the gdt is different for each cpu, but since
> > > a single gdt was reused to setup the cpus at boot (in init_secondary() in
> > > /sys/amd64/amd64/mp_machdep.c), it still points to the tss for the last
> > > cpu, instead of to the right one for the cpu the ltr insn gets executed on.
> > > That is what the kqemu_tss_workaround() in the patch `fixes'...
> >
> > Perhaps kqemu shouldn't be doing str/ltr on amd64 instead? The things i386
> > uses a separate tss for in the kernel (separate stack for double faults) is
> > handled differently on amd64 (on amd64 we make the double fault handler use
> > one of the IST stacks).
>
> Well, kqemu uses its own gdt, tss and everything while running guest code
> in its monitor, so it kinda has to do the str/ltr.s to setup its stuff, run
> guest code, and then restore the original state of things. (And `restore
> original state of things' is what failed here.)
>
> Oh and also the tss does seem to be used for the interrupt stack on
> amd64 too, at least thats the one that ended up wrong and caused the panics
> I saw...

The single TSS holds the IST pointers. On i386 we use a separate TSS for
double faults, but on amd64 a double fault uses the same TSS but uses the
IST pointers from that same TSS. The TSS also holds the ring stack pointer
for when syscalls, interrupts, and traps from userland cross from ring 3 to
ring 0 which is probably why you got a panic.

Because of the fact that amd64 in normal operation never changes the task
register (and that the gdt isn't used quite the same either, all the per-cpu
stuff is via FSBASE and GSBASE) I don't expect the kernel to change to use a
per-cpu gdt or the like. I think you will need to use the current approach
of patching kqemu to fixup the tss/gdt when reloading the task register.
You might want to make it a regular part of the code rather than a
workaround as a result.

-- 
John Baldwin
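
The fixup suggested above (rewrite the tss entry in the shared gdt so it
points at the current cpu's tss, then reload the task register) can be
modelled in user space. What follows is only a minimal sketch under stated
assumptions, not the kqemu patch and not kernel code: every name in it
(sys_desc, model_tss, model_ltr, tss_fixup_for_cpu, NCPU, the TYPE_*
constants) is hypothetical, the tss is reduced to a single field, and "ltr"
is simulated by a function. The descriptor layout follows the 16-byte
long-mode system descriptor format, and the model also assumes the
architectural rule that ltr faults on a descriptor already marked busy,
which is why the fixup rewrites the type as "available".

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NCPU            2    /* hypothetical cpu count for this model */
#define TYPE_TSS_AVAIL  0x9  /* available 64-bit TSS */
#define TYPE_TSS_BUSY   0xb  /* busy 64-bit TSS */

/* 16-byte long-mode system segment descriptor, kept as raw bytes. */
struct sys_desc {
    uint8_t b[16];
};

/* Stand-in for a per-cpu tss; only the ring 0 stack pointer is modelled. */
struct model_tss {
    uint64_t rsp0;
};

struct model_tss tss[NCPU];          /* one tss per cpu */
struct sys_desc shared_gdt_tss_slot; /* the single gdt entry all cpus share */

/* Encode base, limit and type into the descriptor (P=1, DPL=0). */
void desc_set(struct sys_desc *d, uint64_t base, uint32_t limit, uint8_t type)
{
    int i;

    memset(d, 0, sizeof(*d));
    d->b[0] = (uint8_t)(limit & 0xff);
    d->b[1] = (uint8_t)((limit >> 8) & 0xff);
    d->b[2] = (uint8_t)(base & 0xff);
    d->b[3] = (uint8_t)((base >> 8) & 0xff);
    d->b[4] = (uint8_t)((base >> 16) & 0xff);
    d->b[5] = (uint8_t)(0x80 | (type & 0x0f));
    d->b[6] = (uint8_t)((limit >> 16) & 0x0f);
    d->b[7] = (uint8_t)((base >> 24) & 0xff);
    for (i = 0; i < 4; i++)
        d->b[8 + i] = (uint8_t)((base >> (32 + 8 * i)) & 0xff);
}

/* Decode the 64-bit base address back out of the descriptor. */
uint64_t desc_base(const struct sys_desc *d)
{
    uint64_t base;
    int i;

    base = (uint64_t)d->b[2] | ((uint64_t)d->b[3] << 8) |
        ((uint64_t)d->b[4] << 16) | ((uint64_t)d->b[7] << 24);
    for (i = 0; i < 4; i++)
        base |= (uint64_t)d->b[8 + i] << (32 + 8 * i);
    return base;
}

/*
 * Simulated ltr: refuses a busy descriptor (returns -1), otherwise marks
 * it busy and reports the tss base the task register would now use.
 */
int model_ltr(struct sys_desc *d, uint64_t *tr_base)
{
    if ((d->b[5] & 0x0f) != TYPE_TSS_AVAIL)
        return -1;
    d->b[5] = (uint8_t)((d->b[5] & 0xf0) | TYPE_TSS_BUSY);
    *tr_base = desc_base(d);
    return 0;
}

/* The fixup: point the shared slot at this cpu's tss, marked available. */
void tss_fixup_for_cpu(struct sys_desc *slot, int cpu)
{
    assert(cpu >= 0 && cpu < NCPU);
    desc_set(slot, (uint64_t)(uintptr_t)&tss[cpu],
        (uint32_t)(sizeof(tss[cpu]) - 1), TYPE_TSS_AVAIL);
}
```

In this model, calling tss_fixup_for_cpu() for the cpu the code is running
on immediately before model_ltr() makes the reload pick up that cpu's tss
rather than whatever the boot-time setup left in the shared slot. Real code
would additionally have to make sure it cannot migrate to another cpu
between the fixup and the ltr.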