Date: Wed, 20 Apr 2005 03:37:58 +0000 (GMT)
From: jkoshy@FreeBSD.ORG (Joseph Koshy)
To: obrien@FreeBSD.org
Cc: Joseph Koshy <jkoshy@FreeBSD.org>
Subject: Re: cvs commit: src/gnu/usr.bin/groff/tmac mdoc.local src/lib Makefile src/share/doc/papers/hwpmc Makefile hwpmc.ms src/share/examples/hwpmc README src/share/man/man4 Makefile ...
Message-ID: <20050420033758.8711B16A4CF@hub.freebsd.org>
In-Reply-To: Message from "David O'Brien" <obrien@FreeBSD.org> <20050419181128.GA27443@dragon.NUXI.org>

al> I assume this is like a portable version of the measurement backend in
al> Intels VTune... at least I assume VTune does something like this
al> itself.

I have not actually used Intel's VTune or AMD's CodeAnalyst, so please
take my words with a pinch of salt.

From reading the publicly available documentation, VTune's backend
appears to do 'system-wide sampling'.  Our backend can do system-wide
measurements as well as per-process measurements (i.e., the counter
hardware can be 'virtualized').

Another difference is that we support 'counting' as well as 'sampling'.

So four PMC usage styles are currently supported by our infrastructure:

 - process-private, counting
   o We could have a profiling runtime library that augments its data
     collection with data from the PMCs at function entry/exit.
   o Scientific applications could use this mode to measure hardware
     counts between two points of code.  I believe the scientific
     community uses an API named "PAPI" for performance measurements.
     We should be able to support PAPI in -current now.

 - system-wide, counting
   o You could allocate system-wide, counting PMCs and read these once
     a minute.  This operation would have near-zero overhead and could
     be used for collecting long-term data, say for making machine
     sizing decisions.

 - process-private, sampling
   o The standard 'profiling' function, with a couple of twists: you
     would not need to specially compile executables for profiling,
     and you could profile any process you could PMC_ATTACH a PMC to.

 - system-wide, sampling
   o This 'profiles' the whole system: applications, kernel and
     interrupt handlers.

The current snapshot in -current has sampling modes turned off as they
haven't been fully implemented.

obrien> Every modern CPU has event counters.  Some CPU's have as little as 2
obrien> (Pentium Pro), others have 4 (Athlon64 and Opteron), I think IA-64 has

The P4 has had the most complexity so far: 18 counters, 45 event-select
registers and many, many restrictions about what works with what.
Further, logical (HTT) CPUs share PMC resources, and some events change
semantics if HTT is enabled (TS/TI events) :(.

The userland library pmc(3) and the driver hwpmc(4) handle these issues
for you.

obrien> This PMC facility is much more similar to Linux's Oprofile than VTune or
obrien> AMD's CodeAnalyst.  It allows one to set and access the event counters.

Linux has Oprofile (for system-wide sampling) and many separate
'counting' mode implementations (Perfctr, Rabbit, Lperfex, etc.).

obrien> You will need to find the applicable CPU docs so you know what [public]
obrien> events exist, and any "options" those events have.

The PMC-specific sections of pmc(3) list the events and allowed
modifiers that our library understands.  You would still need to read
the CPU docs: some of the events measured by hardware only make sense
in the context of a given CPU architecture.

For folks who like Python, there is a Python wrapper around libpmc that
makes it easy to play around with this functionality.  You can pick it
up at:

  http://people.freebsd.org/~jkoshy/projects/perf-measurement/pypmc.html

Regards,
Koshy
<jkoshy@freebsd.org>
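
The process-private counting style above (measuring counts between two
points of code) comes down to reading a counter before and after a
region and subtracting.  A minimal Python sketch of that pattern follows.
It does not use pmc(3), hwpmc(4) or the Python wrapper: FakeCounter and
measure are hypothetical names, and the counter is a plain software
stand-in for a hardware event count that only ever increases.

```python
from contextlib import contextmanager


class FakeCounter:
    """Hypothetical stand-in for a counting-mode PMC (not the pmc(3) API)."""

    def __init__(self):
        self._value = 0

    def add(self, n):
        # Simulates the hardware observing n events.
        self._value += n

    def read(self):
        # Counting mode: reading returns the running total so far.
        return self._value


@contextmanager
def measure(counter, results, label):
    """Record how many events occurred between entry and exit."""
    start = counter.read()
    try:
        yield
    finally:
        results[label] = counter.read() - start


counter = FakeCounter()
results = {}

counter.add(100)              # events before the region of interest
with measure(counter, results, "region"):
    counter.add(42)           # events inside the region
counter.add(7)                # events after the region

print(results)                # {'region': 42}
```

The same read-and-subtract step, run on a timer instead of around a code
region, is the shape of the "read these once a minute" system-wide
counting use described above.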