Date: Tue, 03 Nov 2009 22:19:05 +0100
From: Gabor Kovesdan <gabor@FreeBSD.org>
To: Mel Flynn <mel.flynn+fbsd.hackers@mailing.thruhere.net>
Cc: freebsd-hackers@freebsd.org
Subject: Re: Issue with grep -i (on i386 only?)
Message-ID: <4AF09E49.3010705@FreeBSD.org>
In-Reply-To: <200911032122.28905.mel.flynn%2Bfbsd.hackers@mailing.thruhere.net>
References: <200911032122.28905.mel.flynn%2Bfbsd.hackers@mailing.thruhere.net>

Mel Flynn wrote:
> Hi,
>
> attached a little test script for grep's -i performance. I tried a few
> different machines and the 64-bit 7.2 machine I could steal doesn't seem to be
> affected and outperforms pcregrep.
>

Note that pcregrep isn't POSIX regex, so it's not a good basis for
comparison. PCRE provides a POSIX-compliant interface to Perl-compatible
regexes for those who are already familiar with the former, but it's still
Perl regex and not POSIX! That's why some people get confused when PCRE
comes up.

> On i386 machines, grep -i is significantly slower:
> i386, 7.2-STABLE of Sep 8, load averages: 0.00, 0.02, 0.00,
> Mem: 336M Active, 442M Inact, 217M Wired, 38M Cache, 112M Buf, 198M Free
> dev.cpu.0.freq: 2992 (Intel P-IV HTT enabled)
> 16Meg file result:
> =>>> 16777216
> =>>> fgrep
>         0.04 real         0.02 user         0.01 sys
>         0.04 real         0.03 user         0.01 sys
> =>>> pcregrep
>         0.21 real         0.19 user         0.02 sys
>         0.21 real         0.20 user         0.00 sys
> =>>> grep
>         0.04 real         0.02 user         0.01 sys << not -i
>         3.64 real         3.61 user         0.01 sys << -i
>

It's an interesting observation; I have never heard of this.

> So it looks to me that, while there is a problem with case insensitive
> comparison, just rewriting the expression is an optimization grep could
> perform.
> Either way, with the new text tools being written (done?) is this problem
> being attacked, not fixable due to specifications or not considered an issue?
> Any PR's needed / I missed? Patches to try?
>
> [And it just occurred to me bsdgrep is in ports]:
> =>>> bsdgrep
>         0.93 real         0.74 user         0.00 sys
>         4.80 real         4.33 user         0.02 sys
>         4.97 real         4.34 user         0.01 sys
>
> So here the optimization does not fly.

Unfortunately, this is the most important issue with BSDL texttools. In the
grep case, the BSDL version is ready and feature-complete, but its
performance isn't quite satisfactory.
The main reason for this is that GNU grep uses a lot of shortcuts, which
results in bloated code (8000 LOC), while BSDL grep keeps everything simple
and straightforward (1500 LOC). IMO, the desired solution would be to keep
grep small and get a modern regex library for FreeBSD that performs well.
Pushing regex optimizations into grep is a bad idea: it not only bloats the
code, but other regex users won't benefit from the optimization, so the
problem should be fixed at its root. And the current regex library we have
is old, slow and doesn't support wchar at all.

Btw, do you mind if I include your script in the BSD grep distribution? I
already planned to write something like this for future testing.

-- 
Gabor Kovesdan
FreeBSD Volunteer

EMAIL: gabor@FreeBSD.org .:|:. gabor@kovesdan.org
WEB:   http://people.FreeBSD.org/~gabor .:|:. http://kovesdan.org
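
For readers following the thread: the message does not spell out what
"just rewriting the expression" means. A minimal sketch, assuming it refers
to expanding each letter of a literal pattern into a two-character bracket
expression so that no -i flag is needed, might look like the following. The
function name expand_case is hypothetical, and the sketch uses Python's re
module for illustration; it is not how grep or the FreeBSD regex library
does it.

```python
import re


def expand_case(literal: str) -> str:
    """Rewrite a literal string as a regex that matches it in any letter case.

    Hypothetical helper for illustration only: "grep" becomes
    "[gG][rR][eE][pP]", so the pattern can be matched case-sensitively.
    """
    parts = []
    for ch in literal:
        lower, upper = ch.lower(), ch.upper()
        # Expand only characters with a simple one-to-one case mapping.
        if lower != upper and len(lower) == 1 and len(upper) == 1:
            parts.append(f"[{lower}{upper}]")
        else:
            # Digits, punctuation, etc. are matched literally.
            parts.append(re.escape(ch))
    return "".join(parts)


pattern = expand_case("grep")
print(pattern)  # [gG][rR][eE][pP]

# The rewritten pattern, matched case-sensitively, accepts the same
# input as the original pattern with the case-insensitive flag.
line = "Issue with GREP -i"
assert re.search(pattern, line) is not None
assert re.search("grep", line, re.IGNORECASE) is not None
```

Whether such a rewrite is faster depends entirely on the regex engine: in
the timings quoted above it did not help bsdgrep.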