From: David Schultz
To: d@delphij.net
Cc: freebsd-arch@FreeBSD.ORG
Date: Mon, 19 Jan 2009 15:04:02 -0500
Subject: Re: RFC: MI strlen()

On Thu, Jan 08, 2009, Xin LI wrote:
> Here is a new implementation of strlen() which employed the bitmask
> skill in order to achieve better performance on modern hardware. For
> common case, this would be a 5.2x boost on FreeBSD/amd64. The code is
> intended for MI use when there is no hand-optimized assembly.
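For readers who haven't seen the bitmask trick, here is a minimal sketch of the general word-at-a-time technique. It is not Xin's patch; word_strlen() is a hypothetical name, and it assumes 8-bit chars. Like most MI libc string code, it reads whole aligned words, so it may read a few bytes past the terminating NUL and accesses char data through a uintptr_t lvalue. ISO C doesn't guarantee either is safe, though an aligned load can't cross a page boundary and so can't fault. Tools such as AddressSanitizer may flag it.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumes CHAR_BIT == 8: ONES is 0x01 in every byte, HIGHS is 0x80. */
#define ONES	((uintptr_t)-1 / 0xFF)
#define HIGHS	(ONES * 0x80)

/* Hypothetical name: sketch of a word-at-a-time strlen(). */
size_t
word_strlen(const char *str)
{
	const char *p = str;
	const uintptr_t *wp;

	/* Check bytes one at a time until p is word-aligned. */
	while ((uintptr_t)p % sizeof(uintptr_t) != 0) {
		if (*p == '\0')
			return ((size_t)(p - str));
		p++;
	}

	/*
	 * Scan one aligned word per iteration.  (x - ONES) & ~x & HIGHS
	 * is nonzero if and only if some byte of x is zero.
	 */
	wp = (const uintptr_t *)p;
	while (((*wp - ONES) & ~*wp & HIGHS) == 0)
		wp++;

	/* This word contains a NUL; find its exact position. */
	p = (const char *)wp;
	while (*p != '\0')
		p++;
	return ((size_t)(p - str));
}
```

The mask test only says whether a word contains a zero byte (bytes above the first zero can be flagged spuriously by the borrow), which is why the final byte loop is still needed to find the exact position.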
I ran some microbenchmarks on amd64, which show that the version of
strlen() in libc is up to twice as fast as yours for short strings
(< 4 bytes), but your implementation is nearly 5 times as fast for
longer strings. As Bruce pointed out, gcc will almost always use its
builtin strlen(). However, that may change in the future, and nobody
has suggested that your version would actually hurt anything, so I
think you should commit it.

Benchmark results: http://www.freebsd.org/~das/strlen.gif

I ran this on a Wolfdale core using word-aligned ASCII strings and an
adaptive number of iterations. As you can see, the gcc builtin is
always slower than your code, but faster than our current libc
implementation. I can't explain why the builtin is faster for strings
of length 10 than for strings of length 1, but the results are
repeatable.

Another interesting thing to note is that your implementation is the
only one whose throughput drops once the string no longer fits in the
L2 cache. This suggests that either the other two are so slow that
they can't use the full memory bandwidth, or they are more effective
at triggering the CPU's prefetch heuristics.
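A rough sketch of an adaptive-iteration benchmark of the kind described above follows. It is illustrative only and is not the harness that produced the graph. bench_strlen() and make_aligned_string() are hypothetical names, and doubling the iteration count until a run lasts at least min_sec seconds is an assumed reading of "adaptive". Word alignment comes from malloc(), whose result is suitably aligned for any object type.

```c
#define _POSIX_C_SOURCE 200112L
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Seconds from a monotonic clock (POSIX clock_gettime()). */
static double
now_sec(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (ts.tv_sec + ts.tv_nsec / 1e9);
}

/* Hypothetical helper: n bytes of ASCII plus a NUL, in malloc'd memory. */
char *
make_aligned_string(size_t n)
{
	char *s = malloc(n + 1);

	if (s != NULL) {
		memset(s, 'a', n);
		s[n] = '\0';
	}
	return (s);
}

/*
 * Hypothetical harness: double the iteration count until one timed run
 * takes at least min_sec seconds, then return throughput in bytes/sec.
 */
double
bench_strlen(size_t (*fn)(const char *), const char *s, double min_sec)
{
	/* A volatile pointer keeps the compiler from hoisting the call. */
	size_t (*volatile f)(const char *) = fn;
	volatile size_t sink = 0;
	size_t len = strlen(s);

	for (unsigned long iters = 1;; iters *= 2) {
		double t0 = now_sec();
		for (unsigned long i = 0; i < iters; i++)
			sink += f(s);
		double dt = now_sec() - t0;
		if (dt >= min_sec && dt > 0.0)
			return ((double)len * (double)iters / dt);
	}
}
```

The volatile function pointer matters because gcc treats strlen() as a pure function: with a loop-invariant argument it could otherwise evaluate the call once and leave the timed loop measuring almost nothing.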