Date:      Tue, 3 Nov 2009 23:14:45 +0100
From:      Mel Flynn <mel.flynn+fbsd.hackers@mailing.thruhere.net>
To:        freebsd-hackers@freebsd.org
Cc:        Gabor Kovesdan <gabor@freebsd.org>
Subject:   Re: Issue with grep -i (on i386 only?)
Message-ID:  <200911032314.45247.mel.flynn%2Bfbsd.hackers@mailing.thruhere.net>
In-Reply-To: <4AF09E49.3010705@FreeBSD.org>
References:  <200911032122.28905.mel.flynn%2Bfbsd.hackers@mailing.thruhere.net> <4AF09E49.3010705@FreeBSD.org>

On Tuesday 03 November 2009 22:19:05 Gabor Kovesdan wrote:
> Mel Flynn wrote:
> > Hi,
> >
> > attached a little test script for grep's -i performance. I tried a few
> > different machines and the 64-bit 7.2 machine I could steal doesn't seem
> > to be affected and outperforms pcregrep.
>
> Note, that pcregrep isn't POSIX regex so it's not a good base of
> comparison. PCRE provides a POSIX-compliant interface to deal with
> Perl-compatible regex for those, who are already familiar with the
> former but it's still Perl regex and not POSIX! That's why some people
> get confused when PCRE comes to the topic.

I realize this, but for the case in question it does not matter. Both
'regexes' should do the same in PCRE and POSIX. I provided the comparison to
show that the 'problem of case insensitive comparison' is solvable, at the
very least for the simple case.

> > On i386 machines, grep -i is significantly slower:
> > i386, 7.2-STABLE of Sep 8, load averages: 0.00, 0.02, 0.00,
> > Mem: 336M Active, 442M Inact, 217M Wired, 38M Cache, 112M Buf, 198M Free
> > dev.cpu.0.freq: 2992 (Intel P-IV HTT enabled)
> > 16Meg file result:
> > =>>> 16777216
> >     =>>> fgrep
> >         0.04 real         0.02 user         0.01 sys
> >         0.04 real         0.03 user         0.01 sys
> >     =>>> pcregrep
> >         0.21 real         0.19 user         0.02 sys
> >         0.21 real         0.20 user         0.00 sys
> >     =>>> grep
> >         0.04 real         0.02 user         0.01 sys << not -i
> >         3.64 real         3.61 user         0.01 sys << -i
>
> It's an interesting observation, I have never heard of this.
>
> > So it looks to me that, while there is a problem with case insensitive
> > comparison, just rewriting the expression is an optimization grep could
> > perform.
> > Either way, with the new text tools being written (done?) is this problem
> > being attacked, not fixable due to specifications or not considered an
> > issue? Any PR's needed / I missed? Patches to try?
> >
> > [And it just occurred to me bsdgrep is in ports]:
> >     =>>> bsdgrep
> >         0.93 real         0.74 user         0.00 sys
> >         4.80 real         4.33 user         0.02 sys
> >         4.97 real         4.34 user         0.01 sys
> >
> > So here the optimization does not fly.
>
> Unfortunately, this is the most important issue with BSDL texttools. In
> the grep case, the BSDL version is ready and feature-complete but the
> performance isn't quite satisfying. The main reason of this is GNU grep
> uses a lot of shortcuts, which results in a bloated code (8000 LOC),
> while BSDL grep keeps everything simple and straightforward (1500 LOC).
> IMO, the desired solution would be to keep grep small and get a modern
> regex library for FreeBSD, which performs well. Pushing regex
> optimizations into grep is a bad idea because it not just makes the code
> bloated but other regex users won't benefit from the optimization so the
> problem should be fixed at its roots. And the current regex library we
> have is old, slow and doesn't support wchar, at all.

With this kind of difference, I don't really care who performs the
optimization, but it seems that multiple options at the same character spot
are not handled very well, with an extra penalty for "case insensitive".
Why this isn't present on my 64-bit machine is a bit of a mystery to me, but
since almost no time is spent in sys, I can't blame it on the kernel.
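
For illustration only, here is a minimal sketch of what "rewriting the
expression" could mean for a plain literal pattern. The test script is not
reproduced in this message, so this is an assumption and not what the script
or any grep does. The helper name expand_case is hypothetical. It expands
each ASCII letter into a two-character bracket expression, so that a
case-sensitive matcher accepts the same text that -i would.

```python
import re


def expand_case(literal: str) -> str:
    """Hypothetical helper: rewrite 'foo' as '[fF][oO][oO]'.

    Only ASCII letters are expanded; every other character is escaped
    so that it is matched literally (escaping follows Python's re
    syntax here, which is an assumption of this sketch).
    """
    parts = []
    for ch in literal:
        if ch.isascii() and ch.isalpha():
            # One bracket expression per letter: two options per position.
            parts.append(f"[{ch.lower()}{ch.upper()}]")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


if __name__ == "__main__":
    pattern = expand_case("foo")
    print(pattern)
    # The rewritten pattern, matched case-sensitively, should agree with
    # the original literal matched case-insensitively.
    for line in ("xxFoOxx", "FOO", "bar"):
        rewritten = re.search(pattern, line) is not None
        ignorecase = re.search("foo", line, re.IGNORECASE) is not None
        print(line, rewritten, ignorecase)
```

The sketch only shows that the two forms accept the same input for ASCII
literals. It makes no claim about which form is faster in a given grep or
regex library; that is exactly what the timings above are probing.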

> Btw, do you mind if I include your script into the BSD grep
> distribution? I already planned to write something like this for future
> testing.

Consider it public domain.
-- 
Mel
