FreeBSD Mail Archives

Date:      Wed, 23 Nov 2011 10:37:14 +0000
From:      David Chisnall <theraven@FreeBSD.org>
To:        David Schultz <das@FreeBSD.org>
Cc:        src-committers@FreeBSD.org, Eitan Adler <eadler@FreeBSD.org>, svn-src-all@FreeBSD.org, dim@FreeBSD.org, Brooks Davis <brooks@FreeBSD.org>, bde@FreeBSD.org, svn-src-head@FreeBSD.org
Subject:   Re: svn commit: r227812 - head/lib/libc/string
Message-ID:  <0DC88C34-91B4-49D1-AA8A-73B14C99D35B@FreeBSD.org>
In-Reply-To: <20111122202735.GA21442@zim.MIT.EDU>
References:  <201111220250.pAM2oPWC070856@svn.freebsd.org> <20111122153332.GA20145@zim.MIT.EDU> <CAF6rxgmPeZCZ3c0xbd-4riqvLHob8U9eWG25R8P6FG2BjTfyyA@mail.gmail.com> <20111122202735.GA21442@zim.MIT.EDU>

On 22 Nov 2011, at 20:27, David Schultz wrote:

> Benchmark or not, I think you'll have a very hard time finding a
> single real program that routinely calls strcasecmp() with
> identical pointers!

I've seen this pattern very often.  The linker is frequently able to combine constant strings defined in different compilation units.  With link-time optimisation, there are also more opportunities for the compiler to do this.

A fairly common pattern is to define constant strings as macros in a header and then use them as keys in a dictionary, first hashed and then compared with strcmp().  In this case, the == check is a significant win.  I've had to work around the fact that FreeBSD's libc is significantly slower than GNU libc in this instance by adding an extra == outside of strcmp().  This increases the size of the code everywhere this pattern is used, increasing cache usage and lowering overall performance (and good luck coming up with a microbenchmark that demonstrates that - although I'd be happy to provide you with a Google-authored paper from a couple of years ago explaining why it's so hard to benchmark accurately on modern machines...).
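
A minimal sketch of that workaround, for illustration only.  KEY_COLOUR and key_equal() are hypothetical names, not taken from any real code base; the point is just the extra == in front of strcmp():

```c
#include <string.h>

/* Hypothetical key, defined once in a shared header in the pattern above. */
#define KEY_COLOUR "colour"

/*
 * Hypothetical helper: check pointer identity before falling back to a
 * full comparison.  When the linker has merged identical literals, the
 * first test succeeds and strcmp() is never called.
 */
static int
key_equal(const char *a, const char *b)
{
	return (a == b || strcmp(a, b) == 0);
}
```

Whether two uses of KEY_COLOUR really share one address depends on the compiler and linker, so the pointer test can only be a fast path; the strcmp() fallback is what keeps the result correct when the strings live at different addresses.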

It's also worth noting that the cost of the extra branch is more or less trivial, as every single character in the input strings will also need to be compared.  This change turns a linear complexity case into a constant complexity case, so it's a clear algorithmic improvement for a case that, while rare, is not as improbable as you seem to suppose.
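
As a rough illustration of where the check sits, here is a sketch of a case-insensitive compare with an identical-pointer fast path.  It is not the libc source; sketch_strcasecmp() is a made-up name and the loop is a simplified stand-in:

```c
#include <ctype.h>

/*
 * Hypothetical sketch, not the committed libc code: case-insensitive
 * comparison with a constant-time exit for identical pointers.
 */
static int
sketch_strcasecmp(const char *s1, const char *s2)
{
	const unsigned char *u1 = (const unsigned char *)s1;
	const unsigned char *u2 = (const unsigned char *)s2;

	/* Same pointer means same string: one branch instead of a full scan. */
	if (u1 == u2)
		return (0);

	/* Otherwise every character is examined until a mismatch or NUL. */
	while (tolower(*u1) == tolower(*u2)) {
		if (*u1 == '\0')
			return (0);
		u1++;
		u2++;
	}
	return (tolower(*u1) - tolower(*u2));
}
```

When the pointers differ, the one extra comparison is paid once per call, while the loop body runs once per character.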

As to the | vs || issue - by all means change it to || if it fits better with the FreeBSD style.  In the general case I prefer to use | to hint to the compiler and readers of the code that short-circuit evaluation is not required and to remove a sequence point and make life easier for the optimiser.  In this case, the two are equivalent so it's just a hint to the reader, and apparently (judging by the responses so far) one that is not well understood.
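
To make the equivalence concrete, here is a small sketch with hypothetical helper names (the operands are illustrative and not quoted from the commit).  Both operands are side-effect free and evaluate to 0 or 1, so the two spellings always agree:

```c
#include <stddef.h>

/* Hypothetical: bitwise OR, both operands are always evaluated. */
static int
trivially_equal_bitwise(const char *s1, const char *s2, size_t n)
{
	return ((n == 0) | (s1 == s2));
}

/* Hypothetical: logical OR, right operand skipped if the left is true. */
static int
trivially_equal_logical(const char *s1, const char *s2, size_t n)
{
	return ((n == 0) || (s1 == s2));
}
```

The substitution is only safe when the right-hand operand has no side effects and does not depend on the left-hand one as a guard; something like (p == NULL || *p == '\0') must keep the short-circuit form.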

David
