Date: Tue, 3 Nov 2009 23:14:45 +0100
From: Mel Flynn <mel.flynn+fbsd.hackers@mailing.thruhere.net>
To: freebsd-hackers@freebsd.org
Cc: Gabor Kovesdan <gabor@freebsd.org>
Subject: Re: Issue with grep -i (on i386 only?)
Message-ID: <200911032314.45247.mel.flynn%2Bfbsd.hackers@mailing.thruhere.net>
In-Reply-To: <4AF09E49.3010705@FreeBSD.org>
References: <200911032122.28905.mel.flynn%2Bfbsd.hackers@mailing.thruhere.net> <4AF09E49.3010705@FreeBSD.org>

On Tuesday 03 November 2009 22:19:05 Gabor Kovesdan wrote:
> Mel Flynn wrote:
> > Hi,
> >
> > attached a little test script for grep's -i performance. I tried a few
> > different machines and the 64-bit 7.2 machine I could steal doesn't seem
> > to be affected and outperforms pcregrep.
>
> Note, that pcregrep isn't POSIX regex so it's not a good base of
> comparison. PCRE provides a POSIX-compliant interface to deal with
> Perl-compatible regex for those, who are already familiar with the
> former but it's still Perl regex and not POSIX! That's why some people
> get confused when PCRE comes to the topic.

I realize this, but for the case in question it does not matter. Both
'regexes' should do the same in PCRE and POSIX. I provided the comparison to
show that the 'problem of case insensitive comparison' is solvable, at the
very least for the simple case.

> > On i386 machines, grep -i is significantly slower:
> > i386, 7.2-STABLE of Sep 8, load averages: 0.00, 0.02, 0.00,
> > Mem: 336M Active, 442M Inact, 217M Wired, 38M Cache, 112M Buf, 198M Free
> > dev.cpu.0.freq: 2992 (Intel P-IV HTT enabled)
> > 16Meg file result:
> > =>>> 16777216
> > =>>> fgrep
> >     0.04 real  0.02 user  0.01 sys
> >     0.04 real  0.03 user  0.01 sys
> > =>>> pcregrep
> >     0.21 real  0.19 user  0.02 sys
> >     0.21 real  0.20 user  0.00 sys
> > =>>> grep
> >     0.04 real  0.02 user  0.01 sys  << not -i
> >     3.64 real  3.61 user  0.01 sys  << -i
>
> It's an interesting observation, I have never heard of this.
>
> > So it looks to me that, while there is a problem with case insensitive
> > comparison, just rewriting the expression is an optimization grep could
> > perform.
> > Either way, with the new text tools being written (done?) is this problem
> > being attacked, not fixable due to specifications or not considered an
> > issue? Any PR's needed / I missed? Patches to try?
> >
> > [And it just occurred to me bsdgrep is in ports]:
> > =>>> bsdgrep
> >     0.93 real  0.74 user  0.00 sys
> >     4.80 real  4.33 user  0.02 sys
> >     4.97 real  4.34 user  0.01 sys
> >
> > So here the optimization does not fly.
>
> Unfortunately, this is the most important issue with BSDL texttools. In
> the grep case, the BSDL version is ready and feature-complete but the
> performance isn't quite satisfying. The main reason of this is GNU grep
> uses a lot of shortcuts, which results in a bloated code (8000 LOC),
> while BSDL grep keeps everything simple and straightforward (1500 LOC).
> IMO, the desired solution would be to keep grep small and get a modern
> regex library for FreeBSD, which performs well. Pushing regex
> optimizations into grep is a bad idea because it not just makes the code
> bloated but other regex users won't benefit from the optimization so the
> problem should be fixed at its roots. And the current regex library we
> have is old, slow and doesn't support wchar, at all.

With this kind of difference, I don't really care who performs the
optimization, but it seems that multiple options at the same character spot
are not handled very well, with an extra penalty for "case insensitive".
Why this isn't present on my 64-bit machine is a bit of a mystery to me, but
since almost no time is spent in sys, I can't blame it on the kernel.

> Btw, do you mind if I include your script into the BSD grep
> distribution? I already planned to write something like this for future
> testing.

Consider it public domain.
-- 
Mel
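
[Illustration, not part of the original message. The test script mentioned
above is not included here, so the following is only a guess at the kind of
"rewriting the expression" being discussed: turning a case-insensitive
literal into an explicit bracket expression, e.g. foo becomes [fF][oO][oO].
The function name icase_to_brackets is hypothetical, and the sketch
deliberately accepts only ASCII letters, digits, space and underscore so
that no regex escaping rules come into play.]

```python
import re


def icase_to_brackets(literal):
    """Rewrite a plain literal as a pattern matching it in any letter case.

    Hypothetical helper for illustration only: each ASCII letter becomes a
    two-character bracket expression; digits, space and underscore pass
    through unchanged; anything else is rejected rather than escaped.
    """
    parts = []
    for ch in literal:
        if not (ch.isascii() and (ch.isalnum() or ch in " _")):
            raise ValueError(f"unsupported character: {ch!r}")
        if ch.isalpha():
            parts.append(f"[{ch.lower()}{ch.upper()}]")
        else:
            parts.append(ch)
    return "".join(parts)


if __name__ == "__main__":
    pattern = icase_to_brackets("grep 2")
    print(pattern)  # [gG][rR][eE][pP] 2
    # The rewritten pattern needs no case-insensitive flag to match.
    print(bool(re.search(pattern, "GNU GrEp 2.5")))  # True
```

[A pattern produced this way could be passed to grep without -i to compare
timings against the -i run. Whether that is faster depends on the regex
implementation, which is the point under discussion in this thread.]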