Date: Mon, 1 Jul 2002 01:50:09 -0700
From: Luigi Rizzo <luigi@FreeBSD.ORG>
To: Bruce Evans <bde@zeta.org.au>
Cc: Andrew Gallatin <gallatin@cs.duke.edu>, cvs-committers@FreeBSD.ORG, cvs-all@FreeBSD.ORG
Subject: Re: cvs commit: src/sys/i386/isa clock.c src/sys/i386/include param.h src/sys/conf options.i386
Message-ID: <20020701015009.A91151@iguana.icir.org>
In-Reply-To: <20020701175713.H7244-100000@gamplex.bde.org>; from bde@zeta.org.au on Mon, Jul 01, 2002 at 06:34:08PM +1000
References: <20020630234058.A90041@iguana.icir.org> <20020701175713.H7244-100000@gamplex.bde.org>
The kind of things I have used this macro for were:

  * how long did this bus I/O instruction take? (The instruction itself
    is 4 cycles on the PCI bus, but due to contention it can be delayed
    by tens of microseconds, literally. This came out quite surprisingly
    when I was doing the work on polling last year on a box where the
    PCI bus was saturated.)
  * is it really cheaper to inline some large function than to have it
    as a function call? (There is a lot of interference with cache and
    instruction scheduling.)
  * how much time is spent in this (short) function, and could we
    optimize it? (e.g. the MGETHDR/MGETCL thing recently)
  * general profiling of the network stack and network device drivers.

All of this stuff needs to handle 100k..500k events per second, so you
can only get useful measurements if the granularity of the timer you
are using is on the order of 20-50ns, and if taking the measurement
itself does not consume much time. It usually does consume a lot if you
try to abstract from the hardware-specific things: you still end up
hitting the TSC, but only after one or more indirect calls and maybe
some locking and frequency conversion to report numbers in a standard
unit.

Things get even worse if you want to do event filtering. It is wise to
put the calls to the timestamping code in appropriate places in the
code (perhaps in the same conditional blocks where the event occurs),
but I believe it would be a design mistake to try and define a
mechanism to control at runtime whether or not samples already taken
should be recorded. And if you really have that need you can do your
own:

        if (user_specified_sysctl_said_to_take_this)
                TSTMP(1,2,3,4);

Yes, it is very hard to do it properly in an SMP environment, and yes
there are performance issues. This is why I argue for simplicity.

cheers
luigi

On Mon, Jul 01, 2002 at 06:34:08PM +1000, Bruce Evans wrote:
...
> I find it more convenient to put some simple processing (e.g., filtering
> and sysctl output formatting) in the kernel.
> Filtering may stop trace
> buffers from overflowing very fast, and the output formatting overhead
> doesn't usually matter since you rarely actually use the sysctl.  Both
> of these require understanding the data.  Filtering can be done in a
> subroutine, but once you have that you can easily add a few lines to
> it to make it store in a non-generic buffer.
>
> I guess the main thing that I object to is using an i386-specific
> counter.  Only very time-critical code can't use nanotime().  nanotime()
> is more than good enough for disk drivers and serial drivers.  It is
> adequate for mutex profiling modulo some locking problems, so it should
> be adequate for network drivers.  KTR uses get_cyclecount() due to the
> locking problems, but this doesn't always help since get_cyclecount() is
> essentially nanotime() on some machines.  Kernel profiling on i386's uses
> cputime().  cputime() has much the same locking problems as KTR.
>
> Bruce
>
>
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe cvs-all" in the body of the message