From: Gabor Kovesdan
Date: Tue, 03 Nov 2009 22:19:05 +0100
To: Mel Flynn
Cc: freebsd-hackers@freebsd.org
Subject: Re: Issue with grep -i (on i386 only?)
Message-ID: <4AF09E49.3010705@FreeBSD.org>
In-Reply-To: <200911032122.28905.mel.flynn+fbsd.hackers@mailing.thruhere.net>
List-Id: Technical Discussions relating to FreeBSD

Mel Flynn wrote:
> Hi,
>
> attached a little test script for grep's -i performance.
> I tried a few different machines and the 64-bit 7.2 machine I could steal
> doesn't seem to be affected and outperforms pcregrep.

Note that pcregrep doesn't use POSIX regex, so it's not a good basis for
comparison. PCRE provides a POSIX-style interface to Perl-compatible regex
for those who are already familiar with the POSIX API, but the syntax is
still Perl regex and not POSIX! That's why some people get confused when
PCRE comes up.

> On i386 machines, grep -i is significantly slower:
> i386, 7.2-STABLE of Sep 8, load averages: 0.00, 0.02, 0.00,
> Mem: 336M Active, 442M Inact, 217M Wired, 38M Cache, 112M Buf, 198M Free
> dev.cpu.0.freq: 2992 (Intel P-IV HTT enabled)
> 16Meg file result:
> =>>> 16777216
> =>>> fgrep
>         0.04 real         0.02 user         0.01 sys
>         0.04 real         0.03 user         0.01 sys
> =>>> pcregrep
>         0.21 real         0.19 user         0.02 sys
>         0.21 real         0.20 user         0.00 sys
> =>>> grep
>         0.04 real         0.02 user         0.01 sys << not -i
>         3.64 real         3.61 user         0.01 sys << -i

It's an interesting observation; I have never heard of this.

> So it looks to me that, while there is a problem with case insensitive
> comparison, just rewriting the expression is an optimization grep could
> perform.
> Either way, with the new text tools being written (done?) is this problem
> being attacked, not fixable due to specifications or not considered an
> issue?
> Any PR's needed / I missed? Patches to try?
>
> [And it just occurred to me bsdgrep is in ports]:
> =>>> bsdgrep
>         0.93 real         0.74 user         0.00 sys
>         4.80 real         4.33 user         0.02 sys
>         4.97 real         4.34 user         0.01 sys
>
> So here the optimization does not fly.

Unfortunately, this is the most important issue with the BSDL texttools. In
the grep case, the BSDL version is ready and feature-complete, but the
performance isn't quite satisfying. The main reason is that GNU grep uses a
lot of shortcuts, which results in bloated code (8000 LOC), while BSDL grep
keeps everything simple and straightforward (1500 LOC).
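
For readers following the thread, here is a minimal sketch of the kind of expression rewrite mentioned in the quoted text. It assumes the rewrite means expanding each letter of a literal into a two-character bracket expression, so that the match can run case-sensitively; the original message does not spell this out. The helper name expand_case is hypothetical, it only handles characters with a simple one-to-one case mapping, and Python's re module stands in for illustration only. It is not the POSIX regex library and says nothing about the timings above.

```python
import re


def expand_case(literal: str) -> str:
    """Rewrite a literal string as a case-sensitive pattern.

    Each character with distinct single-character lower and upper forms
    becomes a bracket expression, e.g. "foo" -> "[fF][oO][oO]".
    Everything else is escaped and matched literally.
    (Hypothetical helper, not part of grep or any regex library.)
    """
    parts = []
    for ch in literal:
        lower, upper = ch.lower(), ch.upper()
        if lower != upper and len(lower) == 1 and len(upper) == 1:
            parts.append("[" + lower + upper + "]")
        else:
            # No simple case pair: match the character itself.
            parts.append(re.escape(ch))
    return "".join(parts)


if __name__ == "__main__":
    pattern = expand_case("grep")
    print(pattern)
    # Compiled WITHOUT re.IGNORECASE: case folding lives in the pattern.
    print(bool(re.search(pattern, "GNU GrEp")))
```

The rewritten pattern needs no case-insensitive flag, which is the point of the rewrite: the matcher itself stays case-sensitive. Whether this helps in practice depends entirely on the regex library underneath.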

IMO, the desired solution would be to keep grep small and get a modern,
well-performing regex library for FreeBSD. Pushing regex optimizations into
grep is a bad idea: it not only bloats the code, but other regex users won't
benefit from the optimization, so the problem should be fixed at its root.
And the regex library we currently have is old, slow and doesn't support
wchar at all.

Btw, do you mind if I include your script in the BSD grep distribution? I
had already planned to write something like this for future testing.

--
Gabor Kovesdan
FreeBSD Volunteer

EMAIL: gabor@FreeBSD.org .:|:. gabor@kovesdan.org
WEB:   http://people.FreeBSD.org/~gabor .:|:. http://kovesdan.org