Date:      Tue, 25 Apr 1995 03:10:17 -0400
From:      "Charles M. Hannum" <mycroft@ai.mit.edu>
To:        terry@cs.weber.edu
Cc:        toor@jsdinc.root.com, geli.com!rcarter@implode.root.com, hackers@FreeBSD.org, jkh@violet.berkeley.edu
Subject:   Re: benchmark hell..
Message-ID:  <199504250710.DAA08689@duality.gnu.ai.mit.edu>
In-Reply-To: <9504242008.AA19390@cs.weber.edu> (terry@cs.weber.edu)


   The BSD model
   for the actual switch itself is very close to the UnixWare/Solaris model,
   but is missing delayed storage of the FPU registers on a switch.

Speak for yourself, please.  NetBSD has been doing delayed switching
for a while now.

   It should be pretty obvious that for a benchmark, when there is a single
   program doing FPU crap, that the FPU delayed switchout means no switch
   actually occurs during the running of the benchmark.

That depends on what you mean by a `switch'.  You still have to set
the `task switched' flag on a switch out, and take a fault next time
something uses the FPU.  You could recognize that the process you're
switching into was the last one using the FPU and automatically turn
off the flag when switching back in.  Currently, I don't do that,
though I may change that RSN.
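
A minimal sketch of the delayed switching described above, written as
a user-space model rather than real kernel code.  All names here
(struct cpu, cpu_switch, fpu_fault, fpu_use) are hypothetical and are
not taken from NetBSD; the `ts' field only stands in for the hardware
`task switched' flag.  The skip_fault_for_owner argument models the
optimization mentioned above of clearing the flag when the incoming
process already owns the FPU contents.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; not real kernel structures. */
struct fpu_state { double regs[8]; };

struct proc {
    struct fpu_state saved_fpu;   /* per-process FPU save area */
};

struct cpu {
    struct fpu_state fpu;         /* the live FPU registers */
    bool ts;                      /* models the `task switched' flag */
    struct proc *fpu_owner;       /* process whose state is live, or NULL */
    struct proc *curproc;         /* process currently running */
    unsigned saves, restores, faults;
};

/* Context switch: the FPU registers are not touched, only the flag. */
void cpu_switch(struct cpu *c, struct proc *next, bool skip_fault_for_owner)
{
    c->curproc = next;
    c->ts = true;
    /* Optional: the incoming process still owns the live state,
     * so no fault is needed on its next FPU use. */
    if (skip_fault_for_owner && c->fpu_owner == next)
        c->ts = false;
}

/* Models the fault taken on the first FPU use after a switch. */
void fpu_fault(struct cpu *c)
{
    assert(c->curproc != NULL);
    c->faults++;
    c->ts = false;
    if (c->fpu_owner == c->curproc)
        return;                   /* state is already live: nothing to move */
    if (c->fpu_owner != NULL) {
        c->fpu_owner->saved_fpu = c->fpu;   /* delayed save */
        c->saves++;
    }
    c->fpu = c->curproc->saved_fpu;         /* restore the new owner */
    c->restores++;
    c->fpu_owner = c->curproc;
}

/* Models one FPU instruction executed by the current process. */
void fpu_use(struct cpu *c, double v)
{
    if (c->ts)
        fpu_fault(c);
    c->fpu.regs[0] += v;
}
```

In this model a lone FPU-using process that is switched out and back
in never has its state saved or restored again, but without the
optional shortcut it still takes one fault per switch, which is the
distinction drawn above.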

   On hardware that does proper
   exception handling (like the Pentiums tested), the FPU context can be
   thrown out to the process it belongs to after being delayed over several
   context switches previous on the basis of "uses FPU" being set in the
   process or not, and a soft interrupt of the FPU as if trapping to an
   emulator to tag the first reference in each process.

You only need a `FPU was used' flag per process if you're doing
delayed initialization.  For primarily integer-only applications,
delaying the FPU initialization would be a win (albeit a *very* small
one), but it would require an extra test in the FPU context switching
code, which would be a smaller but frequent hit for FPU-intensive
applications.
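
To make the trade-off concrete, here is a small sketch of delayed
initialization.  The names (fpu_used, fpu_load_for) are hypothetical
and the structure layout is invented for illustration; the only
hardware fact assumed is that the x87 FNINIT default control word is
0x037F.  The `if' below is the extra test that every FPU state load
would pay for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout; a real save area is hardware-defined. */
struct fpu_state {
    unsigned short control_word;
    double regs[8];
};

struct proc {
    bool fpu_used;                /* the per-process `FPU was used' flag */
    struct fpu_state saved_fpu;
};

enum { FPU_DEFAULT_CW = 0x037F }; /* x87 control word after FNINIT */

/* Load FPU state for p into the live registers.
 * Returns true if the state was freshly initialized. */
bool fpu_load_for(struct proc *p, struct fpu_state *live)
{
    assert(p != NULL && live != NULL);
    if (!p->fpu_used) {           /* the extra test on every load */
        memset(live, 0, sizeof *live);
        live->control_word = FPU_DEFAULT_CW;
        p->fpu_used = true;
        return true;
    }
    *live = p->saved_fpu;         /* normal restore path */
    return false;
}
```

An integer-only process never reaches this function at all, which is
where the small win comes from; an FPU-heavy process pays for the
branch on every restore.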

(With a slight kluge, it may be possible to combine this with the
check for whether you're using the emulator or not, and eliminate the
extra bit altogether.  I'll have to try this.)

   The system call overhead in BSD is typically larger.  This is because
   of address range checking for copyin/copyout operations.  Linux has
   split this up into a separate check call and copy operations, which is
   more prone to programmer error leaving security holes than an integral
   copy/check, but they have an advantage when it comes to multiple use
   memory regions because of this (areas that are copied from several times
   or which are copied both in and out).

And it is precisely this that will screw Linux for SMP systems.  What
if someone else changes a page protection after you've already checked
it for `security'?  Answer: You don't need a programmer error to
create a security hole.
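
The window can be shown with a deterministic, single-threaded model.
Everything here is a hypothetical stand-in: check_writable plays the
role of a separate check call, raw_copy the unchecked copy, and
other_cpu_revokes_write simulates a second processor changing the
page protection at the worst possible moment.  checked_copyout is only
atomic in this model because nothing else runs concurrently; a real
kernel would need a fault or a lock to get the same guarantee.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one user page and its protection. */
struct user_page {
    bool writable;
    char data[16];
};

/* Separate check, in the style of a check-then-copy interface. */
bool check_writable(const struct user_page *pg, size_t len)
{
    return pg->writable && len <= sizeof pg->data;
}

/* Unchecked copy; trusts that a check was done earlier. */
void raw_copy(struct user_page *pg, const char *src, size_t len)
{
    assert(len <= sizeof pg->data);
    memcpy(pg->data, src, len);
}

/* Integral check and copy in one call. */
int checked_copyout(struct user_page *pg, const char *src, size_t len)
{
    if (!pg->writable || len > sizeof pg->data)
        return -1;
    memcpy(pg->data, src, len);
    return 0;
}

/* Simulates another processor revoking write permission. */
void other_cpu_revokes_write(struct user_page *pg)
{
    pg->writable = false;
}

/* Returns true if data was written to a page that was read-only
 * at the time of the write. */
bool split_path_races(struct user_page *pg, const char *src, size_t len)
{
    if (!check_writable(pg, len))
        return false;
    other_cpu_revokes_write(pg);  /* the window between check and use */
    raw_copy(pg, src, len);
    return !pg->writable;
}
```

No programmer error appears anywhere in split_path_races: the check
was done and it passed, yet the write still lands on a read-only page.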

The region of memory needs to be locked between the time the security
check is done and the last use of it.  There are a few ways to do
this:

1) Only run on single-processor machines.

2) Only allow one processor to be executing in the kernel at a time.

3) Lock all memory mapped by a process when it enters the kernel, and
unlock it when exiting the kernel (being careful about the case of the
process dying, of course).

4) Lock memory when it is tested by verify_area(), and unlock it when
exiting the kernel.  (This is the `best option' in Linux, without
changing the interfaces.)

5) Lock memory only for the duration of a copy operation.  (This is
what the BSD interfaces suggest doing.)

Given that the BSD kernel is carefully tuned to limit the number of
situations in which repeated accesses are done to the same user memory
region, and that such tuning is fairly easy to do, the last option is
definitely the most attractive.
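
As a rough illustration of option 5, the sketch below holds a lock
only for the duration of one check-and-copy.  It is a user-space
model under stated assumptions: the per-region spin lock built on C11
atomic_flag, and the names region_init, locked_copyout and
set_writable, are all hypothetical and do not describe how any BSD
kernel actually implements this.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a user memory region with its own lock. */
struct user_region {
    atomic_flag lock;
    bool writable;
    char data[32];
};

void region_init(struct user_region *r, bool writable)
{
    assert(r != NULL);
    atomic_flag_clear(&r->lock);
    r->writable = writable;
    memset(r->data, 0, sizeof r->data);
}

static void region_lock(struct user_region *r)
{
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&r->lock, memory_order_acquire))
        ;
}

static void region_unlock(struct user_region *r)
{
    atomic_flag_clear_explicit(&r->lock, memory_order_release);
}

/* Check and copy under one lock hold; the lock is dropped as soon
 * as the copy is finished. */
int locked_copyout(struct user_region *r, const char *src, size_t len)
{
    int rc = -1;
    region_lock(r);
    if (r->writable && len <= sizeof r->data) {
        memcpy(r->data, src, len);
        rc = 0;
    }
    region_unlock(r);
    return rc;
}

/* Protection changes take the same lock, so they cannot slip in
 * between the check and the copy. */
void set_writable(struct user_region *r, bool writable)
{
    region_lock(r);
    r->writable = writable;
    region_unlock(r);
}
```

The cost is one lock round trip per copy, which only hurts when the
same region is copied repeatedly; that is the case the tuning
mentioned above is meant to keep rare.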

   Part of the checking is to allow address faulting instead of precheck
   comparison -- in other words, if the processor honors write protect in
   protected mode, this becomes a NULL op.  The magic here is that you
   then only actually perform the check on i386 processors and not on
   i486/i586.  The memory mapping is adjusted so this works.

I can't speak for FreeBSD, but NetBSD has done that for nearly two
years.



