Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 14 Aug 2010 17:45:12 -0700
From:      Doug Barton <dougb@FreeBSD.org>
To:        Ivan Voras <ivoras@freebsd.org>
Cc:        freebsd-current@freebsd.org
Subject:   Re: Official request: Please make GNU grep the default
Message-ID:  <4C673898.2080609@FreeBSD.org>
In-Reply-To: <i477eo$i4d$1@dough.gmane.org>
References:  <4C6505A4.9060203@FreeBSD.org> <4C650B75.3020800@FreeBSD.org>	<4C651192.9020403@FreeBSD.org> <i477eo$i4d$1@dough.gmane.org>

next in thread | previous in thread | raw e-mail | index | archive | help
Ivan,

I know that you mean this at least semi-humorously, however I'm going to
provide a dead-serious reply below.

On 08/14/2010 16:04, Ivan Voras wrote:
> On 13.8.2010 11:34, Doug Barton wrote:
>
>> To be fair, I didn't notice a performance difference either until I
>> started revamping this code that calls my parse_index() for every single
>> installed port. Given a 22,042 line INDEX file, that's enough to add up
>> to something noticeable.
> 
> Wouldn't this might, just might, be an indication that one of the
> following is true:
> 
> 1) writing complex performance-sensitive utilities in shell code simply
> sucks because it's too sensitive to issues like borderline behaviours of
> utilities

As someone who used to make a pretty good living writing highly
performance-oriented CGI applications in perl I would agree with you
here to some extent. The original version of what could reasonably be
called an antecedent to what is now portmaster was 102 lines, but only
49 were actual code (the rest were comments or whitespace). The current
behemoth (my dev version that is) is 3,702 lines, 3,069 of which is
actual code. So yes, there is an element of insanity here (and yes, the
current code is under-commented, for those keeping score at home).

> 2) implementing complex data structures that might save you reparsing on
> the order of complexity of O(npkg * nlines) is too demanding in shell
> code and this means it's not exactly the best tool for the job

Again, partial agreement. One of the reasons I resisted INDEX support
for so long was that my original idea of it was to do exactly what you
suggest here, parse it once then look up the data internally. However
even though I _can_ do this in shell it actually makes the performance
worse since now I've got his huge memory footprint to pass around every
time portmaster calls itself recursively (which for those who don't know
is portmaster's entire model of operation).

BUT, none of that is germane to my actual argument. I was very careful
to NOT say, "BSD grep is slow, which screws up portmaster, so the
default has to change." What I said was, "BSD grep is anywhere from 6 to
15 TIMES slower than GNU grep in all cases, so the default needs to
change."

If you insist on applying that directly to portmaster, I will say that
implementing it in shell is a very conscious design tradeoff. If I
hadn't already observed the hilarity ensuing around portupgrade/ruby
updates, and I was sitting down today to design a "ports management
tool" from scratch, I'd use perl, no question. Even without its own db
everything that portmaster does could be done more easily and faster in
perl. However, even granting THIS point the fact remains that the
previous status quo was 1) a text file data store with a known, (mostly)
easy to parse structure, and 2) an easy to use, fast tool to access that
data with.

Your line of reasoning boils down to, "You shouldn't object to the new
tool being slower because you are doing things you shouldn't have been
doing with the old tool in the first place." Even IF I were willing to
grant you that point inre portmaster (I'm not, but let's just say ...)
are you willing to tell every user of grep for every other purpose
(including all the many places it's used in the base, like /etc,
/etc/rc.d, the build ....), "You have to put up with a slow grep because
....?"


Doug

-- 

	Improve the effectiveness of your Internet presence with
	a domain name makeover!    http://SupersetSolutions.com/

	Computers are useless. They can only give you answers.
			-- Pablo Picasso




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4C673898.2080609>