From: David Chisnall
Date: Wed, 23 Nov 2011 10:37:14 +0000
To: David Schultz
Cc: src-committers@FreeBSD.org, Eitan Adler, svn-src-all@FreeBSD.org, dim@FreeBSD.org, Brooks Davis, bde@FreeBSD.org, svn-src-head@FreeBSD.org
Subject: Re: svn commit: r227812 - head/lib/libc/string

On 22 Nov 2011, at 20:27, David Schultz wrote:

> Benchmark or not, I think you'll have a very hard time finding a
> single real program that routinely calls strcasecmp() with
> identical pointers!

I've seen this pattern very often.
Often the linker is able to combine constant strings defined in different compilation units. With link-time optimisation, there are also more opportunities for the compiler to do this.

A fairly common pattern is to define constant strings as macros in a header and then use them as keys in a dictionary, first hashed and then compared with strcmp(). In this case, the == check is a significant win. I've had to work around the fact that FreeBSD's libc is significantly slower than GNU libc here by adding an extra == check outside of strcmp(). That increases the code size everywhere the pattern is used, which increases cache usage and lowers overall performance (and good luck coming up with a microbenchmark that demonstrates that, although I'd be happy to point you to a Google-authored paper from a couple of years ago explaining why it's so hard to benchmark accurately on modern machines...).

It's also worth noting that the cost of the extra branch is more or less trivial compared with the per-character comparisons the function has to do anyway. For identical pointers, this change turns a linear-time comparison into a constant-time one, so it's a clear algorithmic improvement for a case that, while rare, is not as improbable as you seem to suppose.

As to the | vs || issue: by all means change it to || if that fits better with the FreeBSD style. In the general case I prefer to use | to hint to the compiler and to readers of the code that short-circuit evaluation is not required, and to remove a sequence point and make life easier for the optimiser. In this case the two are equivalent, so it's just a hint to the reader, and apparently (judging by the responses so far) one that is not well understood.

David
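A minimal C sketch of the pattern discussed above, with hypothetical names throughout: sketch_strcasecmp, sketch_strncasecmp, KEY_COLOR, KEY_SIZE, NBUCKETS, dict_put, dict_get and the fixed-size linear-probing table are all illustrative, not FreeBSD's libc code or the code from r227812. Following the thread's focus on strcasecmp(), the keys are defined as macros, hashed, and then confirmed with a case-insensitive compare that returns immediately when both pointers are the same. The strncasecmp()-style function uses | in its early-exit test to show a spot where it is interchangeable with ||.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical dictionary keys, defined once in a shared header. */
#define KEY_COLOR "color"
#define KEY_SIZE  "size"

/* Hypothetical strcasecmp()-style compare; not FreeBSD's libc source. */
static int
sketch_strcasecmp(const char *s1, const char *s2)
{
	const unsigned char *us1 = (const unsigned char *)s1;
	const unsigned char *us2 = (const unsigned char *)s2;

	/* Same pointer: equal by definition, so skip the O(n) walk. */
	if (us1 == us2)
		return (0);
	for (;;) {
		int c1 = tolower(*us1++);
		int c2 = tolower(*us2++);
		if (c1 != c2)
			return (c1 - c2);
		if (c1 == '\0')
			return (0);
	}
}

/* Hypothetical strncasecmp()-style compare with the same fast path. */
static int
sketch_strncasecmp(const char *s1, const char *s2, size_t n)
{
	const unsigned char *us1 = (const unsigned char *)s1;
	const unsigned char *us2 = (const unsigned char *)s2;

	/* Side-effect-free 0/1 operands: `|` equals `||` here, but evaluates both. */
	if ((us1 == us2) | (n == 0))
		return (0);
	while (n-- > 0) {
		int c1 = tolower(*us1++);
		int c2 = tolower(*us2++);
		if (c1 != c2)
			return (c1 - c2);
		if (c1 == '\0')
			return (0);
	}
	return (0);
}

#define NBUCKETS 16	/* hypothetical size; power of two, never resized */

struct entry {
	const char *key;	/* NULL marks an empty slot */
	int value;
};

static struct entry table[NBUCKETS];

/* djb2-style hash over lowercased bytes, to agree with the compare. */
static unsigned long
hash_key(const char *s)
{
	unsigned long h = 5381;

	while (*s != '\0')
		h = h * 33 + (unsigned long)tolower((unsigned char)*s++);
	return (h);
}

/* Linear probing: hash first, then confirm with the string compare.
 * Assumes fewer than NBUCKETS keys, or this loop never terminates. */
static size_t
slot_for(const char *key)
{
	size_t i = hash_key(key) & (NBUCKETS - 1);

	while (table[i].key != NULL &&
	    sketch_strcasecmp(table[i].key, key) != 0)
		i = (i + 1) & (NBUCKETS - 1);
	return (i);
}

static void
dict_put(const char *key, int value)
{
	size_t i = slot_for(key);

	table[i].key = key;
	table[i].value = value;
}

/* If the caller passes the same literal that was stored, the pointers
 * may be identical (C leaves literal sharing unspecified), and the
 * compare in slot_for() then returns without reading any characters. */
static const int *
dict_get(const char *key)
{
	size_t i = slot_for(key);

	return (table[i].key != NULL ? &table[i].value : NULL);
}
```

For example, after dict_put(KEY_SIZE, 42), both dict_get(KEY_SIZE) and dict_get("SIZE") find 42. Only the former can take the identity fast path, and only if the compiler or linker actually merges the two "size" literals, so the early return is an optimisation the toolchain may enable rather than something the code relies on for correctness.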