Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 12 May 2006 11:26:17 +1000 (EST)
From:      Bruce Evans <bde@zeta.org.au>
To:        "Georg-W. Koltermann" <gwk-fbsd@mnet-mail.de>
Cc:        freebsd-emulation@freebsd.org, peter@freebsd.org
Subject:   Re: QEMU 0.8.1 and -kernel-kqemu: stalls with "npxdna: fpcurthread == curthread"
Message-ID:  <20060512101754.K65309@delplex.bde.org>
In-Reply-To: <1147403789.1034.9.camel@localhost.eu.mscsoftware.com>
References:  <1147403789.1034.9.camel@localhost.eu.mscsoftware.com>

next in thread | previous in thread | raw e-mail | index | archive | help
On Fri, 12 May 2006, Georg-W. Koltermann wrote:

> I installed QEMU 0.8.1 and kqemu-kmod kqemu-kmod-1.3.0.p7 today on
> 6.1-PRERELEASE from 6 March.  I tried running it with the
> "-kernel-kqemu" option for increased performance.
>
> It did boot up my WinXP guest rather quickly, but then it soon got into
> some kind of extreme busy state.  The load went up, the X display was
> not refreshed any more, just the mouse still moved.  I could escape to
> the console and found lots of
>
>        May 11 13:04:44 hunter kernel: npxdna: fpcurthread == curthread 43 times
>        ...
>
> messages.  I then had to kill qemu.
>
> It does run ok without the "-kernel-kqemu" option.  Any idea?

1. This error should cause a panic instead of a printf.  An invariant has
    been violated.  The panic was broken in rev.1.131 of npx.c.
2. This error has been implemented before.  It was in the amd64
    linux32_sysvec.c until rev.1.9 of that.  There it was caused by dubious
    setting of CR0_TS.  The fix is dubious too:

% Index: linux32_sysvec.c
% ===================================================================
% RCS file: /home/ncvs/src/sys/amd64/linux32/linux32_sysvec.c,v
% retrieving revision 1.8
% retrieving revision 1.9
% diff -u -2 -r1.8 -r1.9
% --- linux32_sysvec.c	29 Jul 2005 19:40:38 -0000	1.8
% +++ linux32_sysvec.c	22 Sep 2005 15:46:21 -0000	1.9
% @@ -807,4 +807,5 @@
%  	regs->tf_rbx = ps_strings;
%  	load_cr0(rcr0() | CR0_MP | CR0_TS);

I don't know why exec_linux_setregs() hacks on cr0 directly.  The
native setregs function (exec_setregs()) just needs to set CR0_TS, and
it calls fpstate_drop() to do this, as it should since only fpstate_drop()
knows exactly what needs to be done.  I think CR0_MP is already set
here (it's set at boot time and should remain set).

% +	fpstate_drop(td);

The correct fix seems to be to add this and remove the direct hacking
on cr0.  fpstate_drop() sets CR0_TS again, so setting it earlier has
no effects except to waste time and implement a race.  It gives a
window in which the invariant (CR0_TS set if and only if fpucurthread
== NULL) is violated.  CR0_TS should be set together with the other
things that fpstate_drop() does, with interrupts disabled like
fpstate_drop() or at least in a critical section (*).  I think the
race is harmless.  Suppose the above is interrupted in the race window.
Then, since the kernel doesn't really use the FPU, and lazy context
switching is not implemented, and only fpudna() checks the invariant,
interrupt handling will at most just save the FPU state, clearing
CR0_TS again in the process, at which point the invariant is restored.

npxdna/fpudna() itself has a race checking the invariant.  The invariant
is checked before disabling interrupts, so it may change underneath us.
Then various bad things happen, but at worst they result in a panic not
occuring or printing a wrong message.  My fix for this just disables
interrupts while checking.  An interrupt gate could be used to avoid the
(only some (?) of the harmless (?) race window between the trap occuring
and reaching npxdna()/fpudna().

A comment says that exec_linux_setregs() is copied from from ia32_setregs().
The latter has the same dubious code as the above.  Somehow it was never
missing the fpstate_drop().

% 
%  	/* Return via doreti so that we can change to a different %cs */

This bogus comment is also in ia32_setregs().  It is missing in the
native setregs().  C code cannot return via doreti and shouldn't know
that the syscall will return via doreti; it's not just %cs that will
be changed by doreti -- doreti will load all the registers that we've
just set up.

(*) The npx/fpu code hard-disables Interrupts mainly for historical
reasons -- it was necessary or at least simplest for the old i386 code to
hard-disable interrupts so as to avoid complications in the
non-exception16 case).

Maybe other emulators get this wrong similarly.

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20060512101754.K65309>