Date:      Mon, 20 Dec 1999 07:34:06 +1100
From:      Peter Jeremy <peter.jeremy@alcatel.com.au>
To:        Martin Cracauer <cracauer@cons.org>
Cc:        arch@freebsd.org
Subject:   Re: Concrete plans for ucontext/mcontext changes around 4.0
Message-ID:  <99Dec20.072529est.40328@border.alcanet.com.au>
In-Reply-To: <19991213091915.D13197@cons.org>; from cracauer@cons.org on Mon, Dec 13, 1999 at 07:19:16PM +1100
References:  <19991212172602.A10611@cons.org> <19991213091915.D13197@cons.org>

On 1999-Dec-13 19:19:16 +1100, Martin Cracauer <cracauer@cons.org> wrote:
>Forgot about lazy FPU context switching (should have finished reading
>my mailbox). FPU context is not always there.
>
>The Linux people claim that lazy FPU switching is not worth the effort
>anymore on modern machines. I didn't see any proof or numbers. Anyone
>of you?

Currently, the i386 port only half-implements lazy FPU switching[1].
Based on some experiments over the weekend, I don't believe it is
worthwhile implementing full lazy FPU switching, but our semi-lazy
switching is a definite win.

I patched npx.c (patches at end) and extracted the following
statistics:

  ctxt     DNA    FP
 swtch    traps  swtch
1754982  281557  59753  build world and a few CVS operations [2]
  79044   18811  10341  gnuplot and xv in parallel [3]
    800     138    130  parallel FP-intensive progs [4].

In the above, `ctxt swtch' is the number of context switches counted
via vm.stats.sys.v_swtch.  `DNA traps' is the number of Device Not
Available traps registered, and `FP swtch' is the number of DNA traps
where the FP context loaded differs from the one saved on the
preceding context switch.

Moving to full lazy FPU switching would save (DNA traps - FP swtch)
fsave/frstor pairs and the same number of traps[5].  Whilst the real
savings can't be derived directly from the above figures, external
knowledge of the real time taken for the above, together with the
estimated cost of a DNA trap + fsave, suggests a saving of much less
than 0.1% - which is getting towards the unmeasurable level.  The
best case would be a single, low priority FP-intensive process
combined with lots of I/O bound integer-only processes (eg
setiathome as an idle task) - which I don't have figures for, but I
expect the overheads (for the FP process only) would be <1%.

The above figures do suggest that moving from the semi-lazy approach
to one where the FPU context is saved/restored on every context
switch would be wasteful - FP is not used about 80% of the time and
fsave/frstor are expensive instructions.
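
As a rough check of the arithmetic behind the last two paragraphs, the
sketch below recomputes both quantities from the table.  The struct and
function names are illustrative only and do not appear in npx.c, and
the "FP not used" figure assumes that each switch-in produces at most
one DNA trap, so that DNA traps / ctxt swtch approximates the fraction
of context switches followed by any FP use.

```c
#include <assert.h>

/* One row of the statistics table (names here are hypothetical). */
struct fp_sample {
	long ctxt_swtch;	/* vm.stats.sys.v_swtch */
	long dna;		/* DNA traps */
	long fp_swtch;		/* DNA traps that loaded a different context */
};

/* fsave/frstor pairs (and traps) that full lazy switching would avoid. */
static long
avoidable_pairs(const struct fp_sample *s)
{
	assert(s->dna >= s->fp_swtch);
	return (s->dna - s->fp_swtch);
}

/* Percentage of context switches not followed by a DNA trap. */
static double
fp_unused_pct(const struct fp_sample *s)
{
	assert(s->ctxt_swtch > 0);
	return (100.0 * (double)(s->ctxt_swtch - s->dna) /
	    (double)s->ctxt_swtch);
}

/* The three measured workloads, copied from the table. */
static const struct fp_sample samples[] = {
	{ 1754982, 281557, 59753 },	/* buildworld and CVS */
	{   79044,  18811, 10341 },	/* gnuplot and xv */
	{     800,    138,   130 },	/* parallel FP-intensive progs */
};
```

For the three rows this gives 221804, 8470 and 8 avoidable pairs, and
roughly 84%, 76% and 83% of context switches with no FP use, which is
where the "about 80%" comes from.  Turning the avoidable pairs into a
percentage of run time still needs the elapsed time and a per-trap
cost, neither of which is in the table.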

Notes:
[1] Currently, on the i386, the FP (NPX) registers are saved when a
    context switch occurs and the FPU had been used.  The NPX is then
    flagged as `not equipped', causing a Device Not Available (DNA)
    trap when the next FP instruction is executed.  At that point the
    appropriate FPU context is restored.  Full lazy switching would
    postpone the register save until an FP instruction was executed by
    a different process.

[2] Boot to single user, run 'make buildworld' inside script(1).  The
    buildworld had a few hiccups along the way which I patched around
    and then re-ran 'make everything'.

[3] I ran the gnuplot demos and the xv visual schnauzer updating a
    large directory of pictures in parallel.  (Multi-user X11).

[4] This was four parallel copies of a circuit analysis program I
    wrote.  It spent most of its time solving a complex 26x26 matrix
    using Gaussian elimination.  (Multi-user console).

[5] The traps would only be saved if the FPU enabled bit were set
    according to the contents of the FPU (ie the FPU is left
    `enabled' when a context switch returns to the process that last
    used the FPU, and `not enabled' otherwise).
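
To make notes [1] and [5] concrete, here is a small user-space model of
the semi-lazy scheme that counts the same three quantities as the
table.  It is a simplified sketch and not kernel code: the trace
format, struct names and replay_semi_lazy() are invented for
illustration, and it assumes every change of process disables the FPU
so that the first FP instruction afterwards takes a DNA trap.

```c
#include <assert.h>
#include <stddef.h>

/* One scheduling slice: which process ran and whether it used FP. */
struct slice {
	int pid;
	int uses_fp;
};

struct fp_counts {
	long ctxt_swtch;	/* slices that start with a change of process */
	long dna;		/* DNA traps taken (cf. fp_dna in the patch) */
	long fp_swtch;		/* traps loading a new context (cf. fp_swtch) */
};

/*
 * Replay a trace under the simplified semi-lazy model: a context
 * switch disables the FPU, and the first FP use afterwards traps.
 */
static struct fp_counts
replay_semi_lazy(const struct slice *t, size_t n)
{
	struct fp_counts c = { 0, 0, 0 };
	int fpu_owner = -1;	/* last pid to use the FPU; -1 = none yet */
	int fpu_enabled = 0;	/* FPU starts out disabled */
	size_t i;

	for (i = 0; i < n; i++) {
		if (i > 0 && t[i].pid != t[i - 1].pid) {
			c.ctxt_swtch++;
			fpu_enabled = 0;	/* next FP use will trap */
		}
		if (!t[i].uses_fp || fpu_enabled)
			continue;
		c.dna++;			/* DNA trap */
		if (t[i].pid != fpu_owner) {	/* same test as the patch */
			fpu_owner = t[i].pid;
			c.fp_swtch++;
		}
		fpu_enabled = 1;
	}
	return (c);
}
```

In this model, dna - fp_swtch is the number of traps taken by a process
whose context was already the last one in the FPU.  Those are the traps
(and fsave/frstor pairs) that full lazy switching with the enabled-bit
handling of note [5] would avoid.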

Index: npx.c
===================================================================
RCS file: /home/peter/cvs/src/sys/i386/isa/npx.c,v
retrieving revision 1.78
diff -u -r1.78 npx.c
--- npx.c	1999/09/21 10:51:47	1.78
+++ npx.c	1999/12/17 09:53:02
@@ -779,6 +779,15 @@
 	}
 }
 
+static int	fp_dna;		/* number of DNA traps */
+static int	fp_swtch;	/* Number of real FP context switches */
+static struct proc *fpuproc;	/* Last proc to use FPU */
+
+SYSCTL_INT(_hw, OID_AUTO, fp_dna, CTLFLAG_RW, &fp_dna, 0,
+	"Number of NPX DNA traps");
+SYSCTL_INT(_hw, OID_AUTO, fp_swtch, CTLFLAG_RW, &fp_swtch, 0,
+	"Number of NPX context switches");
+
 /*
  * Implement device not available (DNA) exception
  *
@@ -797,6 +806,11 @@
 		panic("npxdna");
 	}
 	stop_emulating();
+	fp_dna++;
+	if (curproc != fpuproc) {
+		fpuproc = curproc;
+		fp_swtch++;
+	}
 	/*
 	 * Record new context early in case frstor causes an IRQ13.
 	 */


Peter
-- 
Peter Jeremy (VK2PJ)                    peter.jeremy@alcatel.com.au
Alcatel Australia Limited
41 Mandible St                          Phone: +61 2 9690 5019
ALEXANDRIA  NSW  2015                   Fax:   +61 2 9690 5982



