Date:      Sat, 14 Aug 2010 18:12:34 -0700
From:      Doug Barton <dougb@FreeBSD.org>
To:        Gabor Kovesdan <gabor@FreeBSD.org>
Cc:        delphij@FreeBSD.org, core@FreeBSD.org, current@FreeBSD.org
Subject:   Re: Official request: Please make GNU grep the default
Message-ID:  <4C673F02.8000805@FreeBSD.org>
In-Reply-To: <4C66C010.3040308@FreeBSD.org>
References:  <4C6505A4.9060203@FreeBSD.org> <20100813085235.GA16268@freebsd.org> <4C66C010.3040308@FreeBSD.org>

On 08/14/2010 09:10, Gabor Kovesdan wrote:
> On 2010.08.13. 10:52, Roman Divacky wrote:
>> what about optimizing BSD grep instead?
>>    
> [... picking one mail from the many that suggest this ...]

... and responding to your message for the same reason ... :)

[Snipping the bit about why it's a hard problem not likely to be solved
in the next few weeks.]

> If you can make suggestions to make BSD grep faster without touching the
> regex library please do it and if we can get a performance that is
> acceptable, we can reconsider leaving it the default if nobody objects.
> I'll check Sean's suggestions and measure how much they
> help.

As I posted to you privately, the results I got with JUST Sean's patch
on the test case I posted previously were:

GNU grep
Elapsed time: 2 seconds

BSD grep
Elapsed time: 31 seconds

With the more complete patch you provided me privately I was able to
shave one more second off the BSD grep case. So that's a lot better than
the 47 seconds it was previously, but still a long way to go.

I also have a new test case script which actually IS something that
portmaster does, and in fact is the ugliest and most difficult search
that it has to perform, finding an installed port based on grep'ing
+CONTENTS files for an ORIGIN pattern:

http://people.freebsd.org/~dougb/grep-time-trial-2.sh.txt

Typical times for me, with 489 ports:

GNU grep
Elapsed time: 3 seconds

BSD grep
Elapsed time: 17 seconds

(And before anyone bothers to reply saying "Use pkg_info -O for that"
I'll save you the trouble. My version is 10-20% faster. Not sure
why, don't really care.) :)
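
For anyone who has not looked at what that search involves, here is a
rough illustration. It is a minimal Python sketch, not the shell script
linked above and not portmaster's actual code. It assumes the
traditional pkg_install layout, where each installed port has a
directory under the package database containing a +CONTENTS file with a
line of the form "@comment ORIGIN:category/portname". The names
find_by_origin and ORIGIN_PREFIX are made up for the example.

```python
from pathlib import Path

# Assumption: every installed port has a directory under the package
# database (traditionally /var/db/pkg) holding a +CONTENTS file with a
# line of the form "@comment ORIGIN:category/portname".
ORIGIN_PREFIX = "@comment ORIGIN:"


def find_by_origin(pkg_db, origin):
    """Return the package directory names whose +CONTENTS records `origin`.

    Hypothetical helper for illustration only.
    """
    wanted = ORIGIN_PREFIX + origin
    matches = []
    # One file per installed port, so the cost grows with the port count.
    for contents in sorted(Path(pkg_db).glob("*/+CONTENTS")):
        with contents.open(errors="replace") as fh:
            for line in fh:
                if line.rstrip("\n") == wanted:
                    matches.append(contents.parent.name)
                    break  # stop at the first matching line in this file
    return matches
```

Calling find_by_origin("/var/db/pkg", "shells/bash") would then return
the names of whatever installed packages record that origin. The point
is only that the search opens one file per installed port, which is why
the speed of the matcher matters here.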

For those whose line of reasoning was, "But this is -current, so it's ok
for things to be screwed up," my response is: only to a point. In the
real world, people who don't care about performance and/or don't use
grep in interesting and imaginative ways aren't going to mind BSD grep
as the default, but also don't provide really useful test cases. "It
works fine up to the 80th percentile" has already been demonstrated by
various pointyhat runs, etc.

Sophisticated users who DO care about performance and/or DO use grep in
interesting and creative ways will put up with the breakage for a while,
then switch their make.conf to use GNU grep, usually silently. Therefore
they stop providing ANY test data at all, never mind useful data.

However, given the very small number of people who actually test
-current in the first place, the population I am really concerned about
is the group of people who casually try -current, see that "It's really
slow sometimes," don't/can't figure out why, and then get discouraged
and just stop using -current at all. Now you might reply, "Great! Good
riddance to those dilettantes!" However I believe rather strongly that
we want to make the -current environment MORE friendly to users, even
casual users. Who do you think is actually going to test "What will
become 9.0-RELEASE" if we don't?

OTOH, leaving it in but switching the default gives those who are
highly motivated to test and/or improve it a very easy way to do so,
without causing problems for anyone else. It also makes it that much
easier to make it the default again when it IS ready for prime time.

Meanwhile, in response to everyone else, a simple question. How many
TIMES (not percentages, multiples) slower is it OK for BSD grep to be in
comparison to GNU grep and stay the default?
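
To put numbers on "multiples", the elapsed times quoted in this message
work out to roughly 15.5x and 5.7x. The small Python sketch below does
that arithmetic. The slowdown helper and the trial labels are made up
for the example, and since the timings are whole seconds the ratios are
coarse.

```python
def slowdown(bsd_seconds, gnu_seconds):
    """Return how many times slower BSD grep was than GNU grep."""
    return bsd_seconds / gnu_seconds


# Elapsed times quoted in this message, in whole seconds:
# (label, BSD grep, GNU grep)
trials = [
    ("first test case, Sean's patch only", 31, 2),
    ("portmaster ORIGIN search, 489 ports", 17, 3),
]

for label, bsd, gnu in trials:
    print("%s: %.1fx slower" % (label, slowdown(bsd, gnu)))
```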


Doug

-- 

	Improve the effectiveness of your Internet presence with
	a domain name makeover!    http://SupersetSolutions.com/

	Computers are useless. They can only give you answers.
			-- Pablo Picasso



