Date: Wed, 3 Aug 2005 01:35:52 +0800
From: Xin LI <delphij@frontfree.net>
To: freebsd-arch@FreeBSD.org, freebsd-amd64@FreeBSD.org
Subject: Re: [RFC] Port of NetBSD's optimized amd64 string code
Message-ID: <20050802173552.GB17471@frontfree.net>
In-Reply-To: <20050802172042.GA71672@dragon.NUXI.org>
References: <20050801182518.GA85423@frontfree.net> <20050802013916.GA37135@dragon.NUXI.org> <20050802040246.GB3799@frontfree.net> <20050802172042.GA71672@dragon.NUXI.org>

On Tue, Aug 02, 2005 at 10:20:42AM -0700, David O'Brien wrote:
> On Tue, Aug 02, 2005 at 12:02:46PM +0800, Xin LI wrote:
> > On Mon, Aug 01, 2005 at 06:39:16PM -0700, David O'Brien wrote:
> > > On Tue, Aug 02, 2005 at 02:25:18AM +0800, Xin LI wrote:
> > > > Here is a patchset that I have produced to make our libc aware of the
> > > > NetBSD assembly implementation of the string related operations.
> > >
> > > What performance benchmarks have these been thru?
> ..
> > BTW. Would you please give me some hints on the benchmarking?  I am
> > not sure whether just looping the test cases on some fixed dataset
> > would be enough?
>
> Try some real world tests such as 'make buildworld'.  Looking in
> src/usr.bin the following utils make good use of these libc functions and
> would be good real world tests: uuencode catman compress last makewhatis
>
> * uuencode a large kernel
> * run /etc/periodic/weekly/320.whatis
> * compress a large kernel
> * last delphij on a large /var/log/wtmp
> * cp /usr/src/share/man/man[1-9] to a ram disk and then run catman over it

Thanks, I will try these tomorrow.

> Just a few suggestions.  It is easy to "optimize" for the simple input case
> and miss the larger case.  I've also seen people "optimize" for all cases
> but then wind up with so much overhead that small inputs are slower.
>
> I have some very fancy routines from AMD that take into account cache
> size, alignment, and use the prefetch instructions.  The problem is they
> are a huge win for large input sizes, but I'm concerned about their
> performance on small input sizes.
>
> If these NetBSD routines perform better in the tests I listed above, we
> should commit them.  We can continue to refine these libc routines over
> time.

Agreed.  I will do more careful benchmarks that better reflect real-world
use, to figure out whether these "optimizations" are really necessary
for us.

Cheers,
--
Xin LI <delphij frontfree net> http://www.delphij.net/
See complete headers for GPG key and other information.
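
The concern raised in the thread, that a routine tuned for large inputs can lose on small ones, can be checked with a micro-benchmark that times the same libc function at several buffer lengths. The following is only a minimal sketch under stated assumptions, not the benchmark anyone in this thread used: the helper name strlen_ns_per_call and its parameters are hypothetical, strlen stands in for whichever string routine is being compared, and clock_gettime with CLOCK_MONOTONIC is assumed to be available.

```c
#define _POSIX_C_SOURCE 200809L	/* for clock_gettime() on strict compilers */

#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper (not from the thread): average nanoseconds per
 * strlen() call on a buffer holding `len` non-NUL bytes, measured over
 * `iters` calls.  Returns -1.0 if iters <= 0 or allocation fails.
 */
double
strlen_ns_per_call(size_t len, long iters)
{
	/* Calling through a volatile pointer forces a real libc call on
	 * every iteration instead of a hoisted or inlined builtin. */
	size_t (*volatile fn)(const char *) = strlen;
	volatile size_t sink = 0;	/* accumulates results so calls are kept */
	struct timespec t0, t1;
	double elapsed_ns;
	char *buf;
	long i;

	if (iters <= 0)
		return (-1.0);
	buf = malloc(len + 1);
	if (buf == NULL)
		return (-1.0);
	memset(buf, 'x', len);
	buf[len] = '\0';

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < iters; i++)
		sink += fn(buf);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	/* Sanity check: every call must have returned `len`. */
	assert(sink == len * (size_t)iters);
	free(buf);

	elapsed_ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9 +
	    (double)(t1.tv_nsec - t0.tv_nsec);
	return (elapsed_ns / (double)iters);
}
```

Calling this from a small driver for lengths such as 1, 16, 256, 4096 and 1048576, once against the stock libc and once against the patched one, would show whether a gain at large sizes comes with a loss at small sizes. A loop like this only complements the real-world tests suggested above (buildworld, catman, compress and so on); it does not replace them.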