Date:      Thu, 27 Apr 1995 14:14:46 +1000
From:      Bruce Evans <bde@zeta.org.au>
To:        bde@zeta.org.au, terry@cs.weber.edu
Cc:        geli.com!rcarter@implode.root.com, hackers@FreeBSD.org, jkh@violet.berkeley.edu, toor@jsdinc.root.com
Subject:   Re: benchmark hell..
Message-ID:  <199504270414.OAA10221@godzilla.zeta.org.au>


>> overhead is about 10 usec on a 486/33.  If each of the 20 processes runs
>> for 100 msec, then the FPU context switch overhead is 10 usec every 2
>> seconds, i.e., a whole 0.0005%.  Multiply by 10*10 for a more reasonable
>> number of runnable processes and a more usual(?) timeslice.  Then the
>> overhead is a whole 0.05%.  To demonstrate the speed advantages you
>> need a special benchmark that forces a context switch after every
>> few instructions instead of after every 10 msec.

>This assumes full quantum use before switch; this is actually quite
>atypical, and a context switch is much more likely the result of a
>voluntary switch coming from an attempt at a blocking operation.  8-).

I divided the full quantum by 10 in order to overestimate the speed
advantage.  FPU-using programs are more likely than most to use their
full quantum, so the overestimate is probably large.
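The arithmetic in the quoted estimate can be checked with a few lines of C.
This is only a sketch: the function name and parameters are made up for
illustration, and it assumes one FPU context switch per round-robin pass
over the runnable processes, with each process using its full timeslice.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not from any kernel source): percentage of CPU
 * time spent on FPU context switching, assuming one FPU switch costing
 * switch_usec per round, where a round is nproc processes each running
 * for slice_msec.
 */
double fpu_overhead_percent(double switch_usec, int nproc, double slice_msec)
{
    double round_usec = nproc * slice_msec * 1000.0; /* msec -> usec */
    return 100.0 * switch_usec / round_usec;
}

/* Print the two cases from the quoted text. */
void print_fpu_estimates(void)
{
    /* 20 processes x 100 msec: one 10 usec switch every 2 seconds */
    printf("%g%%\n", fpu_overhead_percent(10.0, 20, 100.0));
    /* 10x fewer processes and a 10x shorter timeslice */
    printf("%g%%\n", fpu_overhead_percent(10.0, 2, 10.0));
}
```

With these inputs the two cases come out at 0.0005% and 0.05%, matching
the figures quoted above.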

>> >The microtime requirement is a result of the timer interval being
>> >equal to the lbolt interval for mandatory context switch.  I've argued
>> >this before.
>> 
>> I've refuted this before :-).

>Yet that microtime() is still there.  8-(.

It is necessary for accurate timing of processes that switch context
voluntarily because the switch may occur at any time.  For forced
context switches (which as you say above are fairly rare, so not worth
optimizing for :-), it may be possible to use the known time of the
context switch as a timestamp.  This would be less accurate because
of interrupt latency.

>> I looked again.  On a 486DX2/66, copyinstr() takes 4 usec for strings
>> of length 1 and 66 usec for strings of length 255.  rename("a", "b")
>> ...

>The total duration of a file system related system call that isn't a
>read or write on UnixWare is 20uS.  FreeBSD ought to be able to compete.

>At 4uS, this is 20% of the overhead.

FreeBSD doesn't compete now.  It takes 10uS for getpid() and 110uS for
a successful stat("z", &sb) in a loop.  The kernel parts of the time
are approximately:

                stat (uS):              getpid (uS):
_Xsyscall        3                       1
_syscall         8                       3
_stat            2      _getpid          2
_doreti          3                       1
___qdivrem      11
_lookup         10
_malloc          8
_syscall         8
_ufs_getattr     8
_copyout         8
_ufs_access      8
_ufs_lock        3 (called twice per stat())
_namei           6
_ufs_lookup      6
_cache_lookup    6
_ufs_unlock      3 (called twice per stat())
_free            6
_vn_stat         5
_copyinstr       4
_copyin          4
___udivdi3       3
_vrele           2 (called twice per stat())
_bcmp            3
_vget            3
_vput            1

There's lots of bloat to trim.  I would start with ufs_lock() and
ufs_unlock() because they are significant in tty i/o, then look at
the quad division functions.
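
For anyone wanting to reproduce per-call figures like the getpid() and
stat() times above, something along the following lines should do.  It is
a minimal sketch, not the program that produced those numbers: the
function names are invented here, it uses gettimeofday() for wall-clock
timing, and it measures the whole user+kernel round trip rather than the
per-function kernel breakdown, which needs kernel profiling.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <unistd.h>

/* Current wall-clock time in microseconds. */
static double now_usec(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec * 1e6 + tv.tv_usec;
}

/* Average wall-clock cost of one getpid() call, in usec. */
double getpid_usec_per_call(long iters)
{
    volatile pid_t sink;        /* keeps the call from being optimized out */
    double start = now_usec();
    for (long i = 0; i < iters; i++)
        sink = getpid();
    (void)sink;
    return (now_usec() - start) / iters;
}

/* Average wall-clock cost of one successful stat(path, &sb), in usec.
 * Returns -1.0 if any stat() call fails. */
double stat_usec_per_call(const char *path, long iters)
{
    struct stat sb;
    double start = now_usec();
    for (long i = 0; i < iters; i++)
        if (stat(path, &sb) != 0)
            return -1.0;
    return (now_usec() - start) / iters;
}
```

Calling stat_usec_per_call("z", 100000) in a directory containing a file
"z" corresponds to the stat("z", &sb) loop above.  Use enough iterations
that the clock resolution and loop overhead become negligible.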

Bruce


