Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 12 May 2006 23:10:21 +0200 (CEST)
From:      Juergen Lock <nox@jelal.kn-bremen.de>
To:        bde@zeta.org.au
Cc:        freebsd-emulation@freebsd.org
Subject:   Re: QEMU 0.8.1 and -kernel-kqemu: stalls with "npxdna: fpcurthread == curthread"
Message-ID:  <200605122110.k4CLALRc074542@saturn.kn-bremen.de>
In-Reply-To: <20060512101754.K65309@delplex.bde.org>
References:  <1147403789.1034.9.camel@localhost.eu.mscsoftware.com>

next in thread | previous in thread | raw e-mail | index | archive | help
In article <20060512101754.K65309@delplex.bde.org> you write:
>On Fri, 12 May 2006, Georg-W. Koltermann wrote:
>
>> I installed QEMU 0.8.1 and kqemu-kmod kqemu-kmod-1.3.0.p7 today on
>> 6.1-PRERELEASE from 6 March.  I tried running it with the
>> "-kernel-kqemu" option for increased performance.
>>
>> It did boot up my WinXP guest rather quickly, but then it soon got into
>> some kind of extreme busy state.  The load went up, the X display was
>> not refreshed any more, just the mouse still moved.  I could escape to
>> the console and found lots of
>>
>>        May 11 13:04:44 hunter kernel: npxdna: fpcurthread == curthread
>43 times
>>        ...
>>
>> messages.  I then had to kill qemu.
>>
>> It does run ok without the "-kernel-kqemu" option.  Any idea?
>
>1. This error should cause a panic instead of a printf.  An invariant has
>    been violated.  The panic was broken in rev.1.131 of npx.c.
>2. This error has been implemented before.  It was in the amd64
>    linux32_sysvec.c until rev.1.9 of that.  There it was caused by dubious
>    setting of CR0_TS.  The fix is dubious too:
>
>% Index: linux32_sysvec.c
>% ===================================================================
>% RCS file: /home/ncvs/src/sys/amd64/linux32/linux32_sysvec.c,v
>% retrieving revision 1.8
>% retrieving revision 1.9
>% diff -u -2 -r1.8 -r1.9
>% --- linux32_sysvec.c	29 Jul 2005 19:40:38 -0000	1.8
>% +++ linux32_sysvec.c	22 Sep 2005 15:46:21 -0000	1.9
>% @@ -807,4 +807,5 @@
>%  	regs->tf_rbx = ps_strings;
>%  	load_cr0(rcr0() | CR0_MP | CR0_TS);
>
>I don't know why exec_linux_setregs() hacks on cr0 directly.  The
>native setregs function (exec_setregs()) just needs to set CR0_TS, and
>it calls fpstate_drop() to do this, as it should since only fpstate_drop()
>knows exactly what needs to be done.  I think CR0_MP is already set
>here (it's set at boot time and should remain set).
>
>% +	fpstate_drop(td);
>
>The correct fix seems to be to add this and remove the direct hacking
>on cr0.  fpstate_drop() sets CR0_TS again, so setting it earlier has
>no effects except to waste time and implement a race.  It gives a
>window in which the invariant (CR0_TS set if and only if fpucurthread
>== NULL) is violated.  CR0_TS should be set together with the other
>things that fpstate_drop() does, with interrupts disabled like
>fpstate_drop() or at least in a critical section (*).  I think the
>race is harmless.  Suppose the above is interrupted in the race window.
>Then, since the kernel doesn't really use the FPU, and lazy context
>switching is not implemented, and only fpudna() checks the invariant,
>interrupt handling will at most just save the FPU state, clearing
>CR0_TS again in the process, at which point the invariant is restored.
>
>npxdna/fpudna() itself has a race checking the invariant.  The invariant
>is checked before disabling interrupts, so it may change underneath us.
>Then various bad things happen, but at worst they result in a panic not
>occuring or printing a wrong message.  My fix for this just disables
>interrupts while checking.  An interrupt gate could be used to avoid the
>(only some (?) of the harmless (?) race window between the trap occuring
>and reaching npxdna()/fpudna().
>
>A comment says that exec_linux_setregs() is copied from from ia32_setregs().
>The latter has the same dubious code as the above.  Somehow it was never
>missing the fpstate_drop().
>
>% 
>%  	/* Return via doreti so that we can change to a different %cs */
>
>This bogus comment is also in ia32_setregs().  It is missing in the
>native setregs().  C code cannot return via doreti and shouldn't know
>that the syscall will return via doreti; it's not just %cs that will
>be changed by doreti -- doreti will load all the registers that we've
>just set up.
>
>(*) The npx/fpu code hard-disables Interrupts mainly for historical
>reasons -- it was necessary or at least simplest for the old i386 code to
>hard-disable interrupts so as to avoid complications in the
>non-exception16 case).
>
>Maybe other emulators get this wrong similarly.

So you think kqemu is doing something wrong?  The problem is _k_qemu is
closed source and afaik the author doesnt use freebsd, the inner
workings of it are in a binary blob that gets linked into a kld and it
runs guest code (including kernel code with -kernel-kqemu) in kernel
space on the host cpu.  You can see the freebsd-specific parts in
/usr/ports/emulators/kqemu-kmod/work/kqemu-1.3.0pre7/kqemu-freebsd.c
(after doing make in the port's dir) - could this be patched there?

 Btw, kqemu on amd64 also causes lots of
	fpudna in kernel mode!
messages even when not using -kernel-kqemu (so that kqemu only runs
guest userland code in kernel space.)



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200605122110.k4CLALRc074542>