Date:      Wed, 11 Sep 1996 05:53:49 +1000
From:      Bruce Evans <bde@zeta.org.au>
To:        fwmiller@cs.UMD.EDU, terry@lambert.org
Cc:        freebsd-hackers@FreeBSD.org
Subject:   Re: kernel performance
Message-ID: <199609101953.FAA01711@godzilla.zeta.org.au>
>There are performance figures you can get via gprof.  These are
                                               ^^^^^
`config -p' and kgmon.  kgmon is just the messenger and gprof is just
the interpreter.

>statistical in nature, so it will be impossible to make a reasonable
>distinction between cache/non-cache cases (which is why statistical
>profiling sucks).

Actually, FreeBSD-current has had non-statistical profiling using
`config -pp', kgmon and gprof for 5 months.  E.g., the following
script:

---
sync
kgmon -rB
dd if=/dev/zero of=/dev/null bs=1 count=100000
kgmon -hp
gprof4 -u /kernel >zp
---

gives accurate timing for the parts of the system exercised by
per-char i/o to /dev/zero and /dev/null.  The gprof output is
standard, except the times are more accurate, e.g.:

---
granularity: each sample hit covers 4 byte(s) for 0.00% of 8.75 seconds

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ns/call  ns/call  name
 33.6      2.937    2.937                             _mcount (1987)
 14.8      4.235    1.298                             _mexitcount [4]
 12.7      5.342    1.107                             _cputime [5]
  7.7      6.013    0.671                             _user [9]
  4.2      6.378    0.365   200121     1822    12468  _syscall [3]
  2.6      6.604    0.227   200000     1134     2339  _mmrw [14]
  2.4      6.815    0.210   400106      525      525  _ufs_lock [17]
  2.1      7.002    0.187   400136      468      468  _ufs_unlock [18]
  2.0      7.178    0.176   100007     1762     9471  _read [7]
  2.0      7.353    0.175   100003     1750     8434  _vn_write [8]
  2.0      7.527    0.174   100003     1738     5077  _spec_write [12]
  2.0      7.698    0.171   100003     1712    10146  _write [6]
  1.9      7.868    0.170   100007     1702     7709  _vn_read [10]
  1.8      8.027    0.159   100046     1591     1591  _copyout [19]
  1.8      8.184    0.157   200137      784      784  _copyin [20]
  1.5      8.314    0.130   100000     1302     4635  _spec_read [15]
  1.4      8.438    0.123   202201      609      663  _doreti [21]
  0.9      8.520    0.082   100010      821     2411  _uiomove [16]
  0.9      8.594    0.074   200121      372    12840  _Xsyscall [2]
  0.7      8.655    0.061   100003      613     5690  _ufsspec_write [11]
  0.4      8.693    0.038   100000      377     5011  _ufsspec_read [13]
---

The above uses the Pentium timestamp counter and was run on a P133.
The version in -current is not quite as accurate.  It doesn't
compensate for the profiling overheads properly, especially at leaf
nodes like ufs_unlock().  My current version is accurate to within one
or two cpu cycles (after compensating for the entire profiling
overhead) in simple cases when there are no cache misses.

>I have non-statistical profiling data starting from the VFS consumer
>layer, and working its way down through the supporting code, but
>excluding some VM and driver effects... it was collected on Win95
>using the Pentium instruction clock using highly modified gprof code
>and compiler generated function entry points + stack hacking to get
>function exit counters.  The Win95 code had all of the gross
>architectural modifications I've been discussing for the past two
>years, so there are some functional bottlenecks removed.  The data
>is proprietary to my employer.

In FreeBSD-current, gprof is essentially unchanged; function entry
points are handled normally (compiling with cc -pg generates calls to
mcount) and extra code is generated at function exits by compiling
with cc -mprofiler-epilogue (a sketch of what these hooks amount to
follows the list of counters below).  The stack isn't modified
(modifying it might be faster, but I thought it would be too hard to
implement).  Interrupt entry and exit points and cross-jumping between
functions are handled specially.  Neither the code nor the data is
proprietary :-).

Future versions will support non-statistical profiling using any
available counter.  My version currently supports the following
counters:

- standard i8254 counter.  The default in -current on non-Pentiums.
  Its overhead is very high.
- Pentium performance-monitoring counters.  The code for this was
  mostly written by Garrett Wollman.
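To make the mechanism concrete, here is a minimal user-land sketch of
what the entry/exit hooks amount to.  This is not the actual kernel
code (the real mcount() records caller/callee arcs for the call graph
rather than indexing a fixed table); the function names and table
sizes here are made up for illustration:

---
#include <stdint.h>
#include <stdio.h>

/* Read the Pentium timestamp counter. */
static inline uint64_t
rdtsc(void)
{
        uint32_t lo, hi;

        __asm__ __volatile__("rdtsc" : "=a" (lo), "=d" (hi));
        return ((uint64_t)hi << 32 | lo);
}

static uint64_t func_cycles[16];        /* per-function accumulators */
static uint64_t entry_stack[64];        /* timestamps of active calls */
static int      depth;

/* Analogue of mcount(), which cc -pg inserts at function entry. */
static void
prof_enter(void)
{
        entry_stack[depth++] = rdtsc();
}

/* Analogue of mexitcount(), which -mprofiler-epilogue inserts at exit. */
static void
prof_exit(int func)
{
        /* Cycles from entry to exit, children included. */
        func_cycles[func] += rdtsc() - entry_stack[--depth];
}

/* Stand-in for a profiled kernel function. */
static void
copy_byte(void)
{
        prof_enter();
        /* real work would go here */
        prof_exit(0);
}

int
main(void)
{
        for (int i = 0; i < 100000; i++)
                copy_byte();
        printf("copy_byte: %llu cycles\n",
            (unsigned long long)func_cycles[0]);
        return (0);
}
---

Note from the flat profile above that _mcount and _mexitcount are
themselves the two biggest consumers, which is why the hooks' own cost
has to be measured and subtracted.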
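Compensating for the profiling overhead starts with measuring it.
Here is one way to calibrate the fixed cost of the timestamp reads
themselves (again a user-land sketch, not the kernel code):

---
#include <stdint.h>
#include <stdio.h>

static inline uint64_t
rdtsc(void)
{
        uint32_t lo, hi;

        __asm__ __volatile__("rdtsc" : "=a" (lo), "=d" (hi));
        return ((uint64_t)hi << 32 | lo);
}

int
main(void)
{
        uint64_t overhead = ~(uint64_t)0;

        /*
         * The minimum back-to-back delta over many trials is the
         * fixed cost of the measurement itself; larger deltas were
         * inflated by cache misses or interrupts.
         */
        for (int i = 0; i < 100000; i++) {
                uint64_t t0 = rdtsc();
                uint64_t t1 = rdtsc();

                if (t1 - t0 < overhead)
                        overhead = t1 - t0;
        }
        printf("fixed overhead: %llu cycles per sample\n",
            (unsigned long long)overhead);
        return (0);
}
---

Taking the minimum rather than the mean keeps cache misses and
interrupts from being counted as part of the fixed cost.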
I have used the performance-monitoring counters mainly to debug the
high variance in the overhead of the profiling routines.  It turned
out that two often-used globals and a stack variable sometimes
collided in the Pentium cache, causing several cache misses in the
profiling routines.  The cache misses doubled the overheads.
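For concreteness, here is a sketch of how such a collision happens,
assuming the Pentium's 8K, 2-way set-associative, 32-byte-line data
cache (128 sets, so addresses 4096 bytes apart fall in the same set).
The variable names are made up:

---
#include <stdint.h>
#include <stdio.h>

#define LINE_SIZE       32
#define NSETS           128     /* 8K / 32-byte lines / 2 ways */

/* Which cache set an address maps to. */
static int
cache_set(const void *p)
{
        return ((uintptr_t)p / LINE_SIZE) % NSETS;
}

int
main(void)
{
        static char kernel_data[8192];  /* stand-in for the data segment */
        char stack_var;                 /* stand-in for the stack variable */

        /* Two globals 4096 bytes apart map to the same set. */
        printf("global 1: set %d\n", cache_set(&kernel_data[0]));
        printf("global 2: set %d\n", cache_set(&kernel_data[4096]));

        /*
         * If the stack variable lands in the same set, three hot
         * lines compete for the set's two ways, so at least one of
         * them misses on every profiling call.
         */
        printf("stack var: set %d\n", cache_set(&stack_var));
        return (0);
}
---

Moving one of the globals (or padding so that the three addresses are
no longer congruent mod 4096) breaks the collision.

Bruce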