Date:      Wed, 3 Aug 2005 01:35:52 +0800
From:      Xin LI <delphij@frontfree.net>
To:        freebsd-arch@FreeBSD.org, freebsd-amd64@FreeBSD.org
Subject:   Re: [RFC] Port of NetBSD's optimized amd64 string code
Message-ID:  <20050802173552.GB17471@frontfree.net>
In-Reply-To: <20050802172042.GA71672@dragon.NUXI.org>
References:  <20050801182518.GA85423@frontfree.net> <20050802013916.GA37135@dragon.NUXI.org> <20050802040246.GB3799@frontfree.net> <20050802172042.GA71672@dragon.NUXI.org>


On Tue, Aug 02, 2005 at 10:20:42AM -0700, David O'Brien wrote:
> On Tue, Aug 02, 2005 at 12:02:46PM +0800, Xin LI wrote:
> > On Mon, Aug 01, 2005 at 06:39:16PM -0700, David O'Brien wrote:
> > > On Tue, Aug 02, 2005 at 02:25:18AM +0800, Xin LI wrote:
> > > > Here is a patchset that I have produced to make our libc aware of the
> > > > NetBSD assembly implementation of the string related operations.
> > >
> > > What performance benchmarks have these been thru?
> ..
> > BTW.  Would you please give me some hints on the benchmarking?  I am
> > not sure whether just looping the test cases on some deterministic
> > dataset would be enough.
>
> Try some real world tests such as 'make buildworld'.  Looking in
> src/usr.bin the following utils make good use of these libc functions and
> would be good real world tests: uuencode catman compress last makewhatis
>
> * uuencode a large kernel
> * run /etc/periodic/weekly/320.whatis
> * compress a large kernel
> * last delphij on a large /var/log/wtmp
> * cp /usr/src/share/man/man[1-9] to a ram disk and then run catman over it

Thanks, I will try these tomorrow.
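
A minimal sketch of how such real-world runs could be timed, assuming a POSIX system.  The helper name time_command and the BENCH_MAIN guard are hypothetical, chosen only for illustration; the result is roughly what time(1) reports as "real":

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <assert.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: run argv[0] with the given arguments and return
 * the wall-clock time in seconds.  Returns -1.0 if fork() or waitpid()
 * fails, or if the command does not exit with status 0.
 */
static double
time_command(char *const argv[])
{
	struct timespec t0, t1;
	pid_t pid;
	int status;

	assert(argv != NULL && argv[0] != NULL);

	clock_gettime(CLOCK_MONOTONIC, &t0);
	pid = fork();
	if (pid == -1)
		return (-1.0);
	if (pid == 0) {
		/* Child: replace ourselves with the command under test. */
		execvp(argv[0], argv);
		_exit(127);	/* only reached if execvp() failed */
	}
	if (waitpid(pid, &status, 0) == -1)
		return (-1.0);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
		return (-1.0);
	return ((double)(t1.tv_sec - t0.tv_sec) +
	    (double)(t1.tv_nsec - t0.tv_nsec) / 1e9);
}

#ifdef BENCH_MAIN
int
main(int argc, char *argv[])
{
	int run;

	if (argc < 2) {
		fprintf(stderr, "usage: %s command [args ...]\n", argv[0]);
		return (1);
	}
	/* Repeat the run a few times so cache warm-up effects are visible. */
	for (run = 1; run <= 3; run++)
		printf("run %d: %.3f s\n", run, time_command(&argv[1]));
	return (0);
}
#endif
```

Built with -DBENCH_MAIN, this could wrap any of the commands listed above, once against the stock libc and once against the patched one.  Wall-clock time for these utilities also includes I/O, so the differences may be small.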

> Just a few suggestions.  It is easy to "optimize" for the simple input case
> and miss the larger case.  I've also seen people "optimize" for all cases
> but then wind up with so much overhead that small inputs are slower.
>
> I have some very fancy routines from AMD that take into account cache
> size and alignment, and use the prefetch instructions.  The problem is they
> are a huge win for large input sizes, but I'm concerned about their
> performance on small input sizes.
>
> If these NetBSD routines perform better in the tests I listed above, we
> should commit them.  We can continue to refine these libc routines over
> time.

Agreed.  I will run more careful benchmarks that better reflect real-world
use, to figure out whether these "optimizations" are really necessary for
us.
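
The small-input versus large-input concern raised above can be checked with a size sweep.  The following is only a sketch under stated assumptions: bench_strlen_ns, the BENCH_MAIN guard, and the chosen sizes and iteration counts are hypothetical, and strlen() stands in for whichever string routine is being compared:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: average nanoseconds per strlen() call on a
 * string of `len` bytes.  Returns -1.0 if the buffer cannot be allocated.
 */
static double
bench_strlen_ns(size_t len, unsigned long iters)
{
	/*
	 * Calling through a volatile function pointer keeps the compiler
	 * from inlining strlen() or hoisting it out of the loop, so the
	 * libc routine itself is what gets measured.
	 */
	size_t (*volatile fn)(const char *) = strlen;
	volatile size_t sink = 0;
	struct timespec t0, t1;
	unsigned long i;
	double elapsed_ns;
	char *buf;

	assert(iters > 0);
	buf = malloc(len + 1);
	if (buf == NULL)
		return (-1.0);
	memset(buf, 'x', len);
	buf[len] = '\0';

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < iters; i++)
		sink += fn(buf);
	clock_gettime(CLOCK_MONOTONIC, &t1);
	free(buf);

	elapsed_ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9 +
	    (double)(t1.tv_nsec - t0.tv_nsec);
	return (elapsed_ns / (double)iters);
}

#ifdef BENCH_MAIN
int
main(void)
{
	static const size_t sizes[] = { 1, 8, 64, 512, 4096, 65536, 1048576 };
	size_t i;

	for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
		/* Fewer iterations for large inputs to keep the run short. */
		unsigned long iters = sizes[i] < 4096 ? 1000000UL : 10000UL;

		printf("%8zu bytes: %10.1f ns/call\n", sizes[i],
		    bench_strlen_ns(sizes[i], iters));
	}
	return (0);
}
#endif
```

Running the same sweep against both libc builds would show whether a gain on large buffers comes at the cost of extra per-call overhead on short strings.  A tight loop like this keeps everything in cache, so it complements the real-world tests rather than replacing them.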

Cheers,
-- 
Xin LI <delphij frontfree net>	http://www.delphij.net/
See complete headers for GPG key and other information.

