From owner-freebsd-current@FreeBSD.ORG Sun Aug 22 11:54:37 2010 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0F51810656AD; Sun, 22 Aug 2010 11:54:37 +0000 (UTC) (envelope-from des@des.no) Received: from smtp.des.no (smtp.des.no [194.63.250.102]) by mx1.freebsd.org (Postfix) with ESMTP id C2C848FC16; Sun, 22 Aug 2010 11:54:36 +0000 (UTC) Received: from ds4.des.no (des.no [84.49.246.2]) by smtp.des.no (Postfix) with ESMTP id B17F61FFC37; Sun, 22 Aug 2010 11:54:35 +0000 (UTC) Received: by ds4.des.no (Postfix, from userid 1001) id 8A3E4844B1; Sun, 22 Aug 2010 13:54:35 +0200 (CEST) From: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= To: Mike Haertel References: <201008210231.o7L2VRvI031700@ducky.net> Date: Sun, 22 Aug 2010 13:54:35 +0200 In-Reply-To: <201008210231.o7L2VRvI031700@ducky.net> (Mike Haertel's message of "Fri, 20 Aug 2010 19:31:27 -0700") Message-ID: <86k4nikglg.fsf@ds4.des.no> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (berkeley-unix) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Cc: freebsd-current@freebsd.org, gabor@freebsd.org Subject: Re: why GNU grep is fast X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 22 Aug 2010 11:54:40 -0000 Mike Haertel writes: > GNU grep uses the well-known Boyer-Moore algorithm, which looks > first for the final letter of the target string, and uses a lookup > table to tell it how far ahead it can skip in the input whenever > it finds a non-matching character. Boyer-Moore is for fixed search strings. I don't see how that optimization can work with a regexp search unless the regexp is so simple that you break it down into a small number of cases with known length and final character. > GNU grep uses raw Unix input system calls and avoids copying data > after reading it. Yes, that was the first thing we looked at (and fixed) in BSD grep. > Moreover, GNU grep AVOIDS BREAKING THE INPUT INTO LINES. Looking > for newlines would slow grep down by a factor of several times, > because to find the newlines it would have to look at every byte! Amen. The current bottleneck in BSD grep is the memchr() that looks for '\n' in the input buffer. DES --=20 Dag-Erling Sm=C3=B8rgrav - des@des.no