Date: Mon, 20 Dec 1999 07:34:06 +1100
From: Peter Jeremy <peter.jeremy@alcatel.com.au>
To: Martin Cracauer <cracauer@cons.org>
Cc: arch@freebsd.org
Subject: Re: Concrete plans for ucontext/mcontext changes around 4.0
Message-ID: <99Dec20.072529est.40328@border.alcanet.com.au>
In-Reply-To: <19991213091915.D13197@cons.org>; from cracauer@cons.org on Mon, Dec 13, 1999 at 07:19:16PM +1100
References: <19991212172602.A10611@cons.org> <19991213091915.D13197@cons.org>
On 1999-Dec-13 19:19:16 +1100, Martin Cracauer <cracauer@cons.org> wrote:
>Forgot about lazy FPU context switching (should have finished reading
>my mailbox). FPU context is not always there.
>
>The Linux people claim that lazy FPU switching is not worth the effort
>anymore on modern machines. I didn't see any proof or numbers. Anyone
>of you?
Currently, the i386 code half-implements lazy FPU switching[1]. Based on
some experimenting over the weekend, I don't believe it is worthwhile
implementing full lazy FPU switching, but our semi-lazy switching is a
definite win.
I patched npx.c (patches at end) and extracted the following
statistics:

     ctxt      DNA       FP
    swtch    traps    swtch   workload
  1754982   281557    59753   build world and a few CVS operations [2]
    79044    18811    10341   gnuplot and xv in parallel [3]
      800      138      130   parallel FP-intensive progs [4]

In the above, `ctxt swtch' is the number of context switches counted
via vm.stats.sys.v_swtch. `DNA traps' is the number of device not
available traps registered and `FP swtch' is the number of DNA traps
where the FP context loaded is different to that saved on the
preceding context switch.
Moving to full lazy FPU switching would save (DNA traps - FP swtch)
fsave/frstor pairs and the same number of traps[5]. Whilst the real
savings incurred can't be directly derived from the above figures,
external knowledge of the real time taken for the above, together
with the estimated cost of a DNA trap + fsave, suggests a saving of
much less than 0.1% - which is getting towards the unmeasurable
level. The best case would be a single, low priority FP-intensive
process combined with lots of I/O bound integer-only processes
(eg setiathome as an idle task) - which I don't have figures for,
but expect the overheads (for the FP process only) would be <1%.
The above figures do suggest that moving from the semi-lazy approach
to one where the FPU context was saved/restored on each context switch
would be wasteful - FP is not used about 80% of the time and
fsave/frstor are expensive instructions.
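
As a rough cross-check of the figures above, the arithmetic can be written out in C. This is only a sketch: the struct and function names are made up for illustration, and reading (ctxt swtch - DNA traps) / ctxt swtch as the share of scheduling slices that never touch the FPU is an interpretation of the table, not something that was measured separately.

```c
#include <stdio.h>

/* One row of the statistics table quoted above. */
struct fpu_sample {
    const char *workload;
    long ctxt_swtch;    /* context switches (vm.stats.sys.v_swtch) */
    long dna_traps;     /* device-not-available traps taken */
    long fp_swtch;      /* DNA traps that loaded a different FP context */
};

/* The three measured rows, copied from the table. */
const struct fpu_sample samples[] = {
    { "buildworld + CVS",       1754982, 281557, 59753 },
    { "gnuplot + xv",             79044,  18811, 10341 },
    { "parallel FP-intensive",      800,    138,   130 },
};

/* fsave/frstor pairs (and traps) that full lazy switching would avoid. */
long
avoidable_pairs(const struct fpu_sample *s)
{
    return (s->dna_traps - s->fp_swtch);
}

/* Percentage of context switches that were not followed by a DNA trap. */
double
fp_idle_percent(const struct fpu_sample *s)
{
    return (100.0 * (double)(s->ctxt_swtch - s->dna_traps) /
        (double)s->ctxt_swtch);
}

/* Print one summary line per workload. */
void
print_summary(void)
{
    size_t i;

    for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
        printf("%-24s avoidable=%ld fp-idle=%.1f%%\n",
            samples[i].workload, avoidable_pairs(&samples[i]),
            fp_idle_percent(&samples[i]));
}
```

Calling print_summary() from a main() reports 221804, 8470 and 8 avoidable fsave/frstor pairs for the three rows, with an FP-idle share of roughly 84%, 76% and 83%, which is in line with the "about 80%" estimate.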
Notes:
[1] Currently, on the i386, the FP (NPX) registers are saved when a
context switch occurs and the FPU has been used. The NPX is then
flagged as `not equipped', causing a Device Not Available (DNA)
trap when the next FP instruction is executed. At that point the
appropriate FPU context is restored. Full lazy switching would
postpone the register save until an FP instruction was executed by
a different process.
[2] Boot to single user, run 'make buildworld' inside script(1). The
buildworld had a few hiccups along the way which I patched around
and then re-ran 'make everything'.
[3] I ran the gnuplot demos and the xv visual schnauzer updating a
large directory of pictures in parallel. (Multi-user X11).
[4] This was four parallel copies of a circuit analysis program I
wrote. It spent most of its time solving a complex 26x26 matrix
using Gaussian elimination. (Multi-user console).
[5] The trap saving would occur if the FPU enabled bit was set
according to the contents of the FPU (ie the FPU is left as
`enabled' when a context switch occurred into the process that
last used the FPU, and `not enabled' otherwise).
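
The difference between the semi-lazy scheme in [1] and full lazy switching (with the enable-bit handling from [5]) can be sketched as a small counting model. It is a toy, not kernel code, and it rests on stated assumptions: `struct slice`, `semi_lazy_cost()`, `full_lazy_cost()` and the demo schedule are hypothetical names, every slice ends in a context switch, and the first FP use by a process is counted as a restore even though no saved state exists yet.

```c
#include <stddef.h>

/* One scheduling slice: which process ran and whether it touched the FPU. */
struct slice {
    int pid;
    int uses_fp;
};

/* Event counts for one switching policy. */
struct fpu_cost {
    long traps;     /* DNA traps taken */
    long saves;     /* fsave operations */
    long restores;  /* frstor operations */
};

/* Hypothetical schedule, for illustration only. */
const struct slice demo_schedule[] = {
    { 1, 1 }, { 2, 0 }, { 1, 1 }, { 3, 1 }, { 2, 0 }, { 3, 1 }, { 1, 1 },
};
#define DEMO_SLICES (sizeof(demo_schedule) / sizeof(demo_schedule[0]))

/*
 * Semi-lazy: every slice that uses FP takes a DNA trap and a restore on
 * first use, and a save when the process is switched out again.
 */
struct fpu_cost
semi_lazy_cost(const struct slice *s, size_t n)
{
    struct fpu_cost c = { 0, 0, 0 };
    size_t i;

    for (i = 0; i < n; i++) {
        if (!s[i].uses_fp)
            continue;
        c.traps++;
        c.restores++;
        c.saves++;
    }
    return (c);
}

/*
 * Full lazy: the FPU keeps its owner's state across context switches.
 * A trap, save and restore happen only when a different process uses FP.
 */
struct fpu_cost
full_lazy_cost(const struct slice *s, size_t n)
{
    struct fpu_cost c = { 0, 0, 0 };
    int owner = -1;     /* pid whose state is in the FPU, -1 if none */
    size_t i;

    for (i = 0; i < n; i++) {
        if (!s[i].uses_fp || s[i].pid == owner)
            continue;
        c.traps++;
        if (owner != -1)
            c.saves++;  /* spill the previous owner's registers */
        c.restores++;
        owner = s[i].pid;
    }
    return (c);
}
```

On the demo schedule the semi-lazy model counts 5 traps, 5 saves and 5 restores, while the full lazy model counts 3 traps, 2 saves and 3 restores. In this model the semi-lazy trap count plays the role of fp_dna and the full lazy trap count the role of fp_swtch in the patch below.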
Index: npx.c
===================================================================
RCS file: /home/peter/cvs/src/sys/i386/isa/npx.c,v
retrieving revision 1.78
diff -u -r1.78 npx.c
--- npx.c 1999/09/21 10:51:47 1.78
+++ npx.c 1999/12/17 09:53:02
@@ -779,6 +779,15 @@
 	}
 }
 
+static int fp_dna;		/* number of DNA traps */
+static int fp_swtch;		/* Number of real FP context switches */
+static struct proc *fpuproc;	/* Last proc to use FPU */
+
+SYSCTL_INT(_hw, OID_AUTO, fp_dna, CTLFLAG_RW, &fp_dna, 0,
+	"Number of NPX DNA traps");
+SYSCTL_INT(_hw, OID_AUTO, fp_swtch, CTLFLAG_RW, &fp_swtch, 0,
+	"Number of NPX context switches");
+
 /*
  * Implement device not available (DNA) exception
  *
@@ -797,6 +806,11 @@
 		panic("npxdna");
 	}
 	stop_emulating();
+	fp_dna++;
+	if (curproc != fpuproc) {
+		fpuproc = curproc;
+		fp_swtch++;
+	}
 	/*
 	 * Record new context early in case frstor causes an IRQ13.
 	 */
Peter
--
Peter Jeremy (VK2PJ) peter.jeremy@alcatel.com.au
Alcatel Australia Limited
41 Mandible St Phone: +61 2 9690 5019
ALEXANDRIA NSW 2015 Fax: +61 2 9690 5982
