From owner-freebsd-current@FreeBSD.ORG Sun Aug 22 20:44:15 2010 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2DE0B10656A7; Sun, 22 Aug 2010 20:44:15 +0000 (UTC) (envelope-from des@des.no) Received: from smtp.des.no (smtp.des.no [194.63.250.102]) by mx1.freebsd.org (Postfix) with ESMTP id E4C2A8FC14; Sun, 22 Aug 2010 20:44:14 +0000 (UTC) Received: from ds4.des.no (des.no [84.49.246.2]) by smtp.des.no (Postfix) with ESMTP id E59BA1FFC33; Sun, 22 Aug 2010 20:44:13 +0000 (UTC) Received: by ds4.des.no (Postfix, from userid 1001) id A9FAC845D4; Sun, 22 Aug 2010 22:44:13 +0200 (CEST) From: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= To: Mike Haertel References: <201008210231.o7L2VRvI031700@ducky.net> <86k4nikglg.fsf@ds4.des.no> <201008221822.o7MIMCRN050408@ducky.net> Date: Sun, 22 Aug 2010 22:44:13 +0200 In-Reply-To: <201008221822.o7MIMCRN050408@ducky.net> (Mike Haertel's message of "Sun, 22 Aug 2010 11:22:12 -0700") Message-ID: <86lj7yl6n6.fsf@ds4.des.no> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (berkeley-unix) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Cc: freebsd-current@freebsd.org, gabor@freebsd.org Subject: Re: why GNU grep is fast X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 22 Aug 2010 20:44:15 -0000 Mike Haertel writes: > For example if your regex is /foo.*bar/, the initial Boyer-Moore search > is (probably) searching for foo. > > If the initial search succeeds, GNU grep isolates the containing line, > and then runs the full regex matcher on that line to make sure. You don't really need to "isolate the containing line" unless you have an actual match, do you? There are two cases: 1) The regexp does not use any character classes, including /./, so the FSA will stop if it hits EOL before it reaches an accepting state. 2) The regexp uses character classes, and you rewrite them to exclude \n: /[^bar]/ becomes /[^bar\n]/, /./ becomes /[^\n]/, etc., and the FSA will stop if it hits EOL before it reaches an accepting state. DES --=20 Dag-Erling Sm=C3=B8rgrav - des@des.no