From owner-freebsd-hackers Wed Apr 26 21:42:52 1995
Return-Path: hackers-owner
Received: (from majordom@localhost) by freefall.cdrom.com (8.6.10/8.6.6) id VAA16003 for hackers-outgoing; Wed, 26 Apr 1995 21:42:52 -0700
Received: from cs.weber.edu (cs.weber.edu [137.190.16.16]) by freefall.cdrom.com (8.6.10/8.6.6) with SMTP id VAA15996 for ; Wed, 26 Apr 1995 21:42:51 -0700
Received: by cs.weber.edu (4.1/SMI-4.1.1) id AA11702; Wed, 26 Apr 95 22:35:59 MDT
From: terry@cs.weber.edu (Terry Lambert)
Message-Id: <9504270435.AA11702@cs.weber.edu>
Subject: Re: benchmark hell..
To: bde@zeta.org.au (Bruce Evans)
Date: Wed, 26 Apr 95 22:35:58 MDT
Cc: bde@zeta.org.au, geli.com!rcarter@implode.root.com, hackers@FreeBSD.org, jkh@violet.berkeley.edu, toor@jsdinc.root.com
In-Reply-To: <199504270414.OAA10221@godzilla.zeta.org.au> from "Bruce Evans" at Apr 27, 95 02:14:46 pm
X-Mailer: ELM [version 2.4dev PL52]
Sender: hackers-owner@FreeBSD.org
Precedence: bulk

> FreeBSD doesn't compete now.  It takes 10uS for getpid() and 110uS for
> a successful stat("z", &sb) in a loop.  The kernel parts of the time
> are approximately:

What are the same numbers for an fstat?  One would expect it to drop
out only the lookup itself.

I think the malloc and free are suspicious, and should probably be
stack allocation instead.  That's 14uS (or 12%) right there.

The divides are *extremely* curious.  They could be alignment in
malloc, though I would expect an AND to be used instead of a div.
That's 12% right there.

I think maybe half of syscall could go away with a revamp of the
trampoline code.

The namei is painful, as is the cache lookup; one would expect
ufs_lookup to be much smaller in the case of a cache hit, and in a
loop, one would expect a cache hit.  I'm going to call that 5uS, and
that's without cache optimization (someone *really* needs to look at
either the SVR4 DNLC from "The Magic Garden Explained" or the Linux
two level cache code (look at the ext2FS usage, not the umsdos or
cdfs usage)).
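For the stat versus fstat question above, a user-level timing loop along the following lines would give rough per-call numbers. This is only a sketch under stated assumptions: the helper names (elapsed_us, getpid_us_per_call, stat_us_per_call, fstat_us_per_call) are made up for illustration, it assumes a system that provides POSIX clock_gettime() with CLOCK_MONOTONIC, and it measures total wall-clock cost per call from user space, not the per-function kernel breakdown quoted above.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: microseconds elapsed between two timestamps. */
static double elapsed_us(const struct timespec *a, const struct timespec *b)
{
    return (b->tv_sec - a->tv_sec) * 1e6 + (b->tv_nsec - a->tv_nsec) / 1e3;
}

/*
 * Baseline: average cost of one getpid() call, in microseconds.
 * Note that some C libraries have cached the pid in user space, in
 * which case this does not measure a real system call.
 */
static double getpid_us_per_call(int iters)
{
    struct timespec t0, t1;
    volatile pid_t sink;    /* keeps the result from being discarded */
    int i;

    assert(iters > 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < iters; i++)
        sink = getpid();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void)sink;
    return elapsed_us(&t0, &t1) / iters;
}

/* Average cost of one stat(path) call (name lookup included), or -1.0 on error. */
static double stat_us_per_call(const char *path, int iters)
{
    struct timespec t0, t1;
    struct stat sb;
    int i;

    assert(iters > 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < iters; i++)
        if (stat(path, &sb) != 0)
            return -1.0;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return elapsed_us(&t0, &t1) / iters;
}

/*
 * Average cost of one fstat() call on an already-open descriptor, or
 * -1.0 on error.  The open() is outside the timed region, so the name
 * lookup is not counted.
 */
static double fstat_us_per_call(const char *path, int iters)
{
    struct timespec t0, t1;
    struct stat sb;
    int fd, i;

    assert(iters > 0);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1.0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < iters; i++) {
        if (fstat(fd, &sb) != 0) {
            close(fd);
            return -1.0;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(fd);
    return elapsed_us(&t0, &t1) / iters;
}
```

Calling stat_us_per_call("z", n) and fstat_us_per_call("z", n) on the same file and subtracting gives a crude estimate of what the lookup costs; subtracting getpid_us_per_call(n) from either gives a crude estimate of the work beyond bare system call overhead.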
I understand the copyout (but it's a bit large), but I don't
understand the copyin separate from the copyinstr.

I think the ufs_getattr comes from the buffer fudging that is used
for NFS export but serves no real useful purpose here; I remember
complaining about the semantic change at the time it was made for
just this reason.

I don't understand the double lock, unless it was for the directory
lookup and then the stat of the object itself.  If this is the case
(I'll have to get more interested to look than I am right now 8-)),
then it seems that the stat/fstat division has been made in the wrong
place and that the lock should be held at a different level.  This
would let fstat lock the open vnode while stat locks only the
directory vnode around the file stat, since that would guarantee
against reentrancy.  That should be another 6uS (~5%).

Call it 34% without the buffer fudging, or ~37uS.  And that's just
block optimization.  8-).

> There's lots of bloat to trim.  I would start with ufs_lock() and
> ufs_unlock() because they are significant in tty i/o, then look at
> the quad division functions.

Yeah... although I wouldn't expect a big impact on tty i/o except
from the lookup, unless you are talking specfs.  I'd also like to see
what percentage of the time is in I/O wait vs. actually doing work;
it could be a bum hold that's doing it in.


					Terry Lambert
					terry@cs.weber.edu
---
Any opinions in this posting are my own and not those of my present
or previous employers.