Date: Mon, 14 Jan 2002 07:42:39 +1100
From: Peter Jeremy <peter.jeremy@alcatel.com.au>
To: Bruce Evans <bde@zeta.org.au>
Cc: Terry Lambert <tlambert2@mindspring.com>, Peter Wemm <peter@wemm.org>, Alfred Perlstein <bright@mu.org>, Kelly Yancey <kbyanc@posi.net>, Nate Williams <nate@yogotech.com>, Daniel Eischen <eischen@pcnet1.pcnet.com>, Dan Eischen <eischen@vigrid.com>, Archie Cobbs <archie@dellroad.org>, arch@FreeBSD.ORG
Subject: Re: Request for review: getcontext, setcontext, etc
Message-ID: <20020114074238.S561@gsmx07.alcatel.com.au>
In-Reply-To: <20020112205919.E5372-100000@gamplex.bde.org>; from bde@zeta.org.au on Sat, Jan 12, 2002 at 09:40:20PM +1100
References: <3C4001A3.5ECCAEB9@mindspring.com> <20020112205919.E5372-100000@gamplex.bde.org>
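On 2002-Jan-12 21:40:20 +1100, Bruce Evans <bde@zeta.org.au> wrote:
>On Sat, 12 Jan 2002, Terry Lambert wrote:
...
>> Assuming a 64 bit data path, then we are talking a minimum of
>> 3 * 512/(64/8) * (16:1) or 3k (3072) clocks to save the damn FPU
>> state off to main memory (a store in a loop is 3 clocks ignoring
>> the setup and crap, right?).  Add another 3k clocks to bring it
>> back.
>>
>> Best case, God loves us, and we spill and restore from L1
>> without an IPI or an invalidation, and without starting the
>> thread on a CPU other than the one where it was suspended, and
>> all spills are to cacheable write-through pages.  That's a 16
>> times speed increase because we get to ignore the bus speed
>> differential, or 3 * 512/(64/8) * 2 = (6k/16) = 384 clocks.
>
>This seems to be off by a bit.  Actual timing on an Athlon1600
>overclocked a little gives the following times for some critical
>parts of context switching for each iteration of instructions in
>a loop (not counting 2 cycles of loop overhead):
>
>pushal; popal:               9 cycles
>pushl %ds; popl %ds:        21 cycles
>fxsave; fxrstor:           105 cycles
>fnsave; frstor:            264 cycles

I can think of a possible reason why the measured times are so low:
the FPU knows when it has been used versus just having executed
fninit.  In the latter case, all it needs to save is "I've been
initialised".  Also, the FPU architecture includes "used" flags
associated with each register (the x87 tag word); possibly the
f*save instructions don't flush unused registers.  Do the above
numbers change when you push real data into the FP registers?

Also, how expensive is a DNA (device not available) trap?  Would it
be cheaper overall to always load the FPU context on a switch?  That
is more expensive for processes that don't use FP, but saves a DNA
trap per context switch for those that do (assuming they use FP in
that slice).

To add some further numbers: in December 1999 I made some
measurements of FP context switching by patching npx.c.  This was on
a PII-266 running the then -current.  (The original e-mail was sent
to -arch on Mon, 20 Dec 1999 07:34:06 +1100 in a thread titled
"Concrete plans for ucontext/mcontext changes around 4.0"; I don't
have the message-id available.)

       ctxt       DNA        FP
      swtch     traps     swtch
    1754982    281557     59753   build world and a few CVS operations [1]
      79044     18811     10341   gnuplot and xv in parallel [2]
        800       138       130   parallel FP-intensive progs [3]

In the above, `ctxt swtch' is the number of context switches counted
via vm.stats.sys.v_swtch, `DNA traps' is the number of device not
available traps registered, and `FP swtch' is the number of DNA traps
where the FP context loaded differs from the one saved on the
preceding context switch.

Notes:
[1] Boot to single user, run 'make buildworld' inside script(1).  The
    buildworld had a few hiccups along the way, which I patched
    around before re-running 'make everything'.
[2] I ran the gnuplot demos and the xv visual schnauzer (updating a
    large directory of pictures) in parallel.  (Multi-user X11.)
[3] Four parallel copies of a circuit analysis program I wrote, which
    spent most of its time solving a complex 26x26 matrix using
    Gaussian elimination.  (Multi-user console.)

Peter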
(The original e-mail was sent to -arch on Mon, 20 Dec 1999 07:34:06 +1100 in a thread titled "Concrete plans for ucontext/ mcontext changes around 4.0" - I don't have the message-id available). ctxt DNA FP swtch traps swtch 1754982 281557 59753 build world and a few CVS operations [1] 79044 18811 10341 gnuplot and xv in parallel [2] 800 138 130 parallel FP-intensive progs [3]. In the above, `ctxt swtch' is the number of context switches counted via vm.stats.sys.v_swtch. `DNA traps' is the number of device not available traps registered and `FP swtch' is the number of DNA traps where the FP context loaded is different to that saved on the preceeding context switch. Notes: [1] Boot to single user, run 'make buildworld' inside script(1). The buildworld had a few hiccups along the way which I patched around and then re-ran 'make everything'. [2] I ran the gnuplot demos and the xv visual schnauzer updating a large directory of pictures in parallel. (Multi-user X11). [3] This was four parallel copies of a circuit analysis program I wrote. It spent most of its time solving a complex 26x26 matrix using Gaussian elimination. (Multi-user console). Peter To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message