From owner-freebsd-hackers  Wed Apr 26 21:42:52 1995
Return-Path: hackers-owner
Received: (from majordom@localhost)
          by freefall.cdrom.com (8.6.10/8.6.6) id VAA16003
          for hackers-outgoing; Wed, 26 Apr 1995 21:42:52 -0700
Received: from cs.weber.edu (cs.weber.edu [137.190.16.16])
          by freefall.cdrom.com (8.6.10/8.6.6) with SMTP id VAA15996
          for <hackers@FreeBSD.org>; Wed, 26 Apr 1995 21:42:51 -0700
Received: by cs.weber.edu (4.1/SMI-4.1.1)
	id AA11702; Wed, 26 Apr 95 22:35:59 MDT
From: terry@cs.weber.edu (Terry Lambert)
Message-Id: <9504270435.AA11702@cs.weber.edu>
Subject: Re: benchmark hell..
To: bde@zeta.org.au (Bruce Evans)
Date: Wed, 26 Apr 95 22:35:58 MDT
Cc: bde@zeta.org.au, geli.com!rcarter@implode.root.com, hackers@FreeBSD.org,
        jkh@violet.berkeley.edu, toor@jsdinc.root.com
In-Reply-To: <199504270414.OAA10221@godzilla.zeta.org.au> from "Bruce Evans" at Apr 27, 95 02:14:46 pm
X-Mailer: ELM [version 2.4dev PL52]
Sender: hackers-owner@FreeBSD.org
Precedence: bulk

> FreeBSD doesn't compete now.  It takes 10uS for getpid() and 110uS for
> a successful stat("z", &sb) in a loop.  The kernel parts of the time
> are approximately:

What are the same numbers for an fstat?  One would expect it to drop out
only the lookup itself.
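
For anyone who wants to make the stat/fstat comparison, here is a minimal user-space sketch of the timing loop.  The helper names (now_us, time_stat, time_fstat) are made up for illustration and are not from the profiling run quoted above; it assumes gettimeofday() resolution is good enough once the cost is averaged over many iterations.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <unistd.h>

/* Wall-clock time in microseconds (hypothetical helper). */
static double now_us(void)
{
    struct timeval tv;

    gettimeofday(&tv, NULL);
    return tv.tv_sec * 1e6 + tv.tv_usec;
}

/* Mean cost in uS of one stat(path) call over n calls; -1.0 on error. */
double time_stat(const char *path, long n)
{
    struct stat sb;
    double start;
    long i;

    assert(n > 0);
    start = now_us();
    for (i = 0; i < n; i++)
        if (stat(path, &sb) != 0)
            return -1.0;
    return (now_us() - start) / n;
}

/* Same measurement for fstat() on an already-open descriptor, so the
 * path lookup is paid once in open() and not inside the timed loop. */
double time_fstat(const char *path, long n)
{
    struct stat sb;
    double start, elapsed;
    long i;
    int fd;

    assert(n > 0);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1.0;
    start = now_us();
    for (i = 0; i < n; i++)
        if (fstat(fd, &sb) != 0) {
            close(fd);
            return -1.0;
        }
    elapsed = now_us() - start;
    close(fd);
    return elapsed / n;
}
```

The difference time_stat("z", n) - time_fstat("z", n) is then a rough estimate of what the lookup costs per call.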

I think the malloc and free are suspicious, and should probably be
stack allocation instead.  That's 14uS (or 12%) right there.

The divides are *extremely* curious.  They could be alignment in malloc,
though I would expect an AND to be used instead of a div.  That's 12%
right there.
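
To illustrate what I mean by AND instead of div: rounding a size up to a power-of-two boundary needs no division at all.  This is only a sketch of the general technique, with made-up function names; it is not a claim about what the kernel malloc actually does.

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to a multiple of align using divide and multiply.
 * Works for any nonzero align, but costs a division. */
size_t round_up_div(size_t size, size_t align)
{
    assert(align != 0);
    return ((size + align - 1) / align) * align;
}

/* Same result using a mask.  Only valid when align is a power of two,
 * which the assert checks. */
size_t round_up_mask(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}
```

For power-of-two alignments the two functions agree, so the mask form is a drop-in replacement wherever the alignment is known to be a power of two.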

I think maybe half of syscall could go away with a revamp of the
trampoline code.  The namei is painful, as is the cache lookup;
one would expect ufs_lookup to be much smaller in the case of a
cache hit, and in a loop, one would expect a cache hit.  I'm going
to call that 5uS, and that's without cache optimization (someone
*really* needs to look at either the SVR4 DNLC from "The Magic
Garden Explained" or the Linux two level cache code; look at the
ext2FS usage, not the umsdos or cdfs usage).

I understand the copyout (but it's a bit large), but I don't
understand why there is a copyin separate from the copyinstr.

I think the ufs_getattr comes from the buffer fudging that is used
for NFS export but serves no real useful purpose here; I remember
complaining about the semantic change at the time it was made for
just this reason.

I don't understand the double lock, unless it was for the directory
lookup then the stat of the object itself.  If this is the case (I'll
have to get more interested to look than I am right now 8-)), then
it seems that the stat/fstat division has been made in the wrong place
and that the lock should be held at a different level.  This would let
fstat lock the open vnode while stat locks only the directory vnode
around the file stat, since that would guarantee against reentrancy.
That should be another 6uS (~5%).

Call it 34% without the buffer fudging, or ~37uS.  And that's just
block optimization.  8-).
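
The tally, spelled out as a small sketch.  The four percentages are my reading of the estimates above (malloc/free 12%, divides 12%, syscall/lookup ~5%, double lock ~5%) against the quoted 110uS stat(); the function names are made up for illustration.

```c
/* Sum a list of per-item savings, each given as a percentage of the
 * total call time. */
double total_saving_pct(const double *pct, int n)
{
    double sum = 0.0;
    int i;

    for (i = 0; i < n; i++)
        sum += pct[i];
    return sum;
}

/* Convert a percentage of the total call time into microseconds. */
double pct_to_us(double pct, double total_us)
{
    return pct * total_us / 100.0;
}
```

Feeding in { 12, 12, 5, 5 } gives 34%, and 34% of 110uS is the ~37uS figure.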


> There's lots of bloat to trim.  I would start with ufs_lock() and
> ufs_unlock() because they are significant in tty i/o, then look at
> the quad division functions.

Yeah... although I wouldn't expect a big impact on tty i/o except from
the lookup unless you are talking specfs.

I'd also like to see what percentage of the time is in I/O wait vs.
actually doing work; it could be a bum hold that's doing it in.


					Terry Lambert
					terry@cs.weber.edu
---
Any opinions in this posting are my own and not those of my present
or previous employers.