Date:      Wed, 23 Nov 2011 10:37:14 +0000
From:      David Chisnall <theraven@FreeBSD.org>
To:        David Schultz <das@FreeBSD.org>
Cc:        src-committers@FreeBSD.org, Eitan Adler <eadler@FreeBSD.org>, svn-src-all@FreeBSD.org, dim@FreeBSD.org, Brooks Davis <brooks@FreeBSD.org>, bde@FreeBSD.org, svn-src-head@FreeBSD.org
Subject:   Re: svn commit: r227812 - head/lib/libc/string
Message-ID:  <0DC88C34-91B4-49D1-AA8A-73B14C99D35B@FreeBSD.org>
In-Reply-To: <20111122202735.GA21442@zim.MIT.EDU>
References:  <201111220250.pAM2oPWC070856@svn.freebsd.org> <20111122153332.GA20145@zim.MIT.EDU> <CAF6rxgmPeZCZ3c0xbd-4riqvLHob8U9eWG25R8P6FG2BjTfyyA@mail.gmail.com> <20111122202735.GA21442@zim.MIT.EDU>

On 22 Nov 2011, at 20:27, David Schultz wrote:

> Benchmark or not, I think you'll have a very hard time finding a
> single real program that routinely calls strcasecmp() with
> identical pointers!

I've seen this pattern very often.  Often the linker is able to combine
constant strings defined in different compilation units.  With link-time
optimisation, there are also more opportunities for the compiler to do
this.

A fairly common pattern is to define constant strings as macros in a
header and then use them as keys in a dictionary, first hashed and then
compared with strcmp().  In this case, the == check is a significant
win.  I've had to work around the fact that FreeBSD's libc is
significantly slower than GNU libc in this instance by adding an extra
== outside of strcmp() - this increases the size of the code
everywhere this pattern is used, increasing cache usage, and lowering
overall performance (and good luck coming up with a microbenchmark that
demonstrates that - although I'd be happy to provide you with a
Google-authored paper from a couple of years ago explaining why it's so
hard to benchmark accurately on modern machines...).
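
A minimal sketch of the workaround described above, assuming a
hypothetical key macro KEY_NAME and a hypothetical helper key_equal()
(neither name comes from any real code base).  The helper tests pointer
identity first and only falls back to strcmp() when the pointers differ:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical key macro, as it might appear in a shared header. */
#define KEY_NAME "name"

/*
 * Hypothetical helper: returns nonzero if the two keys are equal.
 * The identity test is the "extra ==" placed outside strcmp().
 */
static int
key_equal(const char *a, const char *b)
{
	assert(a != NULL && b != NULL);
	/* Fast path: the same pointer is trivially the same string. */
	if (a == b)
		return (1);
	/* Slow path: compare the contents character by character. */
	return (strcmp(a, b) == 0);
}
```

Whether two uses of KEY_NAME actually end up sharing one address depends
on the compiler and linker merging the literals, so the fast path is an
optimisation only; the strcmp() fallback keeps the result correct either
way.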

It's also worth noting that the cost of the extra branch is more or less
trivial, as every single character in the input strings will also need
to be compared.  This change turns a linear complexity case into a
constant complexity case, so it's a clear algorithmic improvement for a
case that, while rare, is not as improbable as you seem to suppose.
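
To illustrate the linear-versus-constant point, here is a sketch of a
case-insensitive comparison with an identity fast path.  It is not the
code from r227812; the name sketch_strcasecmp() is made up for this
example and the loop is only one plausible way to write it:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Illustrative only: not the libc implementation. */
static int
sketch_strcasecmp(const char *s1, const char *s2)
{
	const unsigned char *us1 = (const unsigned char *)s1;
	const unsigned char *us2 = (const unsigned char *)s2;

	assert(s1 != NULL && s2 != NULL);
	/* Identical pointers: O(1) answer, no characters are read. */
	if (s1 == s2)
		return (0);
	/* Otherwise scan until the first difference or the terminator. */
	while (tolower(*us1) == tolower(*us2)) {
		if (*us1 == '\0')
			return (0);
		us1++;
		us2++;
	}
	return (tolower(*us1) - tolower(*us2));
}
```

Without the pointer test, two identical pointers would still walk the
whole string before returning 0; with it, the cost for that case no
longer depends on the string length, and every other case pays one
additional compare-and-branch.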

As to the | vs || issue - by all means change it to || if it fits better
with the FreeBSD style.  In the general case I prefer to use | to hint
to the compiler and readers of the code that short-circuit evaluation is
not required and to remove a sequence point and make life easier for the
optimiser.  In this case, the two are equivalent so it's just a hint to
the reader, and apparently (judging by the responses so far) one that is
not well understood.
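
A small sketch of when | and || agree, using hypothetical helpers that
are not taken from the commit under discussion.  Both operands are
comparisons, so each yields exactly 0 or 1 and has no side effects; under
those assumptions the two operators produce the same result:

```c
#include <assert.h>

/* Bitwise form: both comparisons are always evaluated. */
static int
either_nul_bitwise(char a, char b)
{
	return ((a == '\0') | (b == '\0'));
}

/* Logical form: the right operand is skipped if the left is true. */
static int
either_nul_logical(char a, char b)
{
	return ((a == '\0') || (b == '\0'));
}

/* Checks that the two forms agree on all four operand combinations. */
static void
check_equivalence(void)
{
	const char vals[2] = { '\0', 'x' };

	for (int i = 0; i < 2; i++)
		for (int j = 0; j < 2; j++)
			assert(either_nul_bitwise(vals[i], vals[j]) ==
			    either_nul_logical(vals[i], vals[j]));
}
```

The equivalence does not hold in general: if the right operand is only
safe to evaluate when the left one is false (a dereference guarded by a
NULL check, for example), || is required.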

David


