From: Jeremy Chadwick
To: Gary Palmer, freebsd-stable@freebsd.org
Date: Sat, 5 Mar 2011 23:56:31 -0800
Subject: Re: Strange performance issue with grep -r -i as non-root user
Message-ID: <20110306075631.GA75125@icarus.home.lan>
In-Reply-To: <20110306070450.GA92752@lava.net>

On Sat, Mar 05, 2011 at 09:04:50PM -1000, Clifton Royston wrote:
> On Sat, Mar 05, 2011 at 07:07:20PM -0800, Jeremy Chadwick wrote:
> ...
> > $ unset LANG
> >   - Result: still 80x slower with -i
> > $ unset LANG LC_COLLATE
> >   - Result: still 80x slower with -i
> > $ unset LANG LC_CTYPE
> >   - Result: normal/fast.
> > $ unset LC_CTYPE
> >   - Result: still 80x slower with -i
> > $ unset LC_CTYPE LC_COLLATE
> >   - Result: still 80x slower with -i
> > $ unset LC_COLLATE
> >   - Result: still 80x slower with -i
> >
> > So the LANG + LC_CTYPE combo when used together are what cause this.
>
>   Doesn't the above say that having either one set does it?

You're correct -- I phrased this incorrectly, my apologies.  Either
variable on its own is enough to trigger the slowdown; it only goes
away when both LANG and LC_CTYPE are unset.

>   I would guess it's probably that either one requires the 8.x
> grep -i to make a conversion function call for each char (or perhaps
> line) of input to ensure the proper upper/lower case conversion rules
> are followed.

A colleague of mine (whom I wish I had asked first) knew of this quirk
with grep (apparently some other utilities also behave oddly with
LANG/LC_CTYPE set; he mentioned less as another example).  He said that
a locale can induce very long delays like this solely because of the
amount of processing needed to scan through lists of characters that
are not always in linear order (so multiple scans are needed).

With ASCII this appears to be significantly easier, given that
uppercase letters range from 0x41 to 0x5a and lowercase letters from
0x61 to 0x7a.  There's significantly less "stuff" to do in that
situation.  His statement, despite offering few or no technical
reference details, does make sense to me.

I should also state (I forget if I did already) that the delays seen
weren't actually "in" read(2).  The timestamps from truss -d show how
much time passes between syscalls, and the delays I was seeing were
*between* read(2) calls.  That is a further indicator that some code
internal to grep (or libc) was spinning/churning much more heavily when
a locale was in use.
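
For anyone who wants to repeat the comparison without unsetting
variables in their login shell, here is a rough sketch.  It is not the
exact procedure used in this thread: the function name grep_in_locale
is made up for illustration, and it also clears LC_ALL (not mentioned
above) on the assumption that a stray LC_ALL would otherwise override
the other variables.  It relies on env -u, which FreeBSD and GNU env
both accept.

```shell
#!/bin/sh
# grep_in_locale DIR PATTERN [VAR=value ...]
#
# Hypothetical helper: runs "grep -r -i" with LANG, LC_ALL, LC_CTYPE and
# LC_COLLATE removed from the environment, then re-adds only the
# VAR=value assignments given as extra arguments.
# Prints: "<elapsed whole seconds> <number of matching lines>"
grep_in_locale() {
    dir="$1"
    pattern="$2"
    shift 2

    start=$(date +%s)
    # "$@" expands to nothing when no assignments were passed.
    count=$(env -u LANG -u LC_ALL -u LC_CTYPE -u LC_COLLATE "$@" \
        grep -r -i -- "$pattern" "$dir" | wc -l | tr -d ' ')
    end=$(date +%s)

    echo "$((end - start)) $count"
}
```

Calling it as "grep_in_locale /usr/src foo" and then again as
"grep_in_locale /usr/src foo LANG=en_US.UTF-8" (directory, pattern and
locale name are all placeholders) should give the same match count
both times, with only the elapsed time differing if the locale is the
cause.  One-second resolution is coarse, but it is enough to spot a
slowdown of the size discussed here on a large tree.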
-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP 4BD6C0CB |