Date: Wed, 15 Oct 2008 19:41:39 +1000 (EST) From: Bruce Evans <brde@optusnet.com.au> To: Scott Long <scottl@samsco.org> Cc: svn-src-head@FreeBSD.org, Marius Strobl <marius@FreeBSD.org>, src-committers@FreeBSD.org, svn-src-all@FreeBSD.org Subject: Re: svn commit: r183896 - head/sys/dev/bge Message-ID: <20081015184833.N43215@delplex.bde.org> In-Reply-To: <48F5053D.7070705@samsco.org> References: <200810142028.m9EKShoL015514@svn.freebsd.org> <48F5053D.7070705@samsco.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, 14 Oct 2008, Scott Long wrote: > Marius Strobl wrote: >> Author: marius >> Date: Tue Oct 14 20:28:42 2008 >> New Revision: 183896 >> URL: http://svn.freebsd.org/changeset/base/183896 >> >> Log: >> Use bus_{read,write}_4(9) instead of bus_space_{read,write}_4(9) >> in order to get rid of the bus space handle and tag in the softc. > > Has anyone looked at the generated code from this interface switch and > compared it what was getting generated previously? Way back when, I just looked at the source code. This seems to be only a small pessimization since it involves just one extra indirection and related loss of optimization possibilities: (sc->sc_bt, sc->sc_bh) becomes r = sc->sc_br; (r->r_bustag, r->r_bushandle). In theory, the compiler could optimize by caching the tag and handle in registers in either case so that only 1 extra indirection is needed per function, but I've never seen that being done enough to make much difference. However, some drivers (e.g., ata) were too stupid to cache sc_bt and sc_bh, and converting these to use bus_nonspace*() was a relatively large optimization: r = sc->sc_br; (rman_get_bustag(r), rman_get_bushandle(r)) became r = sc->sc_br; (r->r_bustag, r->r_bushandle). Since rman_get_bustag() and rman_get_bushandle() have never been inline, calling both of them on every bus space access gave enormous code space and instruction count bloat and corresponding (relatively tiny) runtime bloat. The instructions normally run so much faster than the i/o that you can do hundreds or thousands of them per i/o before noticing the runtime bloat, and the instruction count bloat here is only about a factor of 20. > including <machine/bus_memio.h> made bus_space_read|write_4() compile > into a direct memory access on machines that supported it. The dubious > removal of bus_memio.h and bus_pio.h took away that benefit, and I'm > afraid that it's only getting worse now. Bus writes to card memory are > still very important to high-performance devices and shouldn't be > pessimized in the name of simpler-looking C code. I hate the bloat too, but it rarely matters (see above). I mainly noticed it using ddb. Where nice drivers are optimized to use a single instruction per i/o (or perhaps 2 with minimal bus space bloat), the ones that use rman_get*() on every i/o take 20+, and with a primitive debugger like ddb it is painful to even skip over all the function calls. Which devices have fast enough i/o for the extra indirection to matter? bge tries hard to avoid all PCI access in time-critical code. I think we reduced the PCI accesses to 1 PCI write per interrupt. Most device accesses for bge use host memory which is DMAed to/from by the hardware, so bus space doesn't apply. This type of access seems to be the best or only way for a PCI device to go fast enough (though it is still too slow on at least i386 since the DMA always causes cache misses). Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20081015184833.N43215>