From: Doug Barton
Date: Sat, 14 Aug 2010 17:45:12 -0700
To: Ivan Voras
Cc: freebsd-current@freebsd.org
Subject: Re: Official request: Please make GNU grep the default
Message-ID: <4C673898.2080609@FreeBSD.org>

Ivan,

I know that you mean this at least semi-humorously; however, I'm going
to provide a dead-serious reply below.

On 08/14/2010 16:04, Ivan Voras wrote:
> On 13.8.2010 11:34, Doug Barton wrote:
>
>> To be fair, I didn't notice a performance difference either until I
>> started revamping this code that calls my parse_index() for every single
>> installed port.
>> Given a 22,042 line INDEX file, that's enough to add up
>> to something noticeable.
>
> Wouldn't this might, just might, be an indication that one of the
> following is true:
>
> 1) writing complex performance-sensitive utilities in shell code simply
> sucks because it's too sensitive to issues like borderline behaviours of
> utilities

As someone who used to make a pretty good living writing highly
performance-oriented CGI applications in perl, I would agree with you
here to some extent. The original version of what could reasonably be
called an antecedent to what is now portmaster was 102 lines, but only
49 were actual code (the rest were comments or whitespace). The current
behemoth (my dev version, that is) is 3,702 lines, 3,069 of which are
actual code. So yes, there is an element of insanity here (and yes, the
current code is under-commented, for those keeping score at home).

> 2) implementing complex data structures that might save you reparsing on
> the order of complexity of O(npkg * nlines) is too demanding in shell
> code and this means it's not exactly the best tool for the job

Again, partial agreement. One of the reasons I resisted INDEX support
for so long was that my original idea of it was to do exactly what you
suggest here: parse it once, then look up the data internally. However,
even though I _can_ do this in shell, it actually makes the performance
worse, since now I've got this huge memory footprint to pass around
every time portmaster calls itself recursively (which, for those who
don't know, is portmaster's entire model of operation).

BUT, none of that is germane to my actual argument. I was very careful
to NOT say, "BSD grep is slow, which screws up portmaster, so the
default has to change." What I said was, "BSD grep is anywhere from 6 to
15 TIMES slower than GNU grep in all cases, so the default needs to
change." If you insist on applying that directly to portmaster, I will
say that implementing it in shell is a very conscious design tradeoff.
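Ivan's second point is a complexity argument. A lookup that rescans the whole INDEX once per installed port costs roughly O(npkg * nlines) in total. Parsing the file once into a table costs O(nlines) up front plus a cheap lookup per port. The following is a minimal sketch of that contrast in Python, used purely for illustration (portmaster itself is a shell script). It assumes a pipe-delimited INDEX record whose first field is the package name and whose second field is the port directory. `lookup_by_scan()` and `build_origin_map()` are hypothetical names, not portmaster's actual parse_index().

```python
from typing import Dict, Iterable, List, Optional

# ASSUMPTION (not from the original message): each INDEX-style record is a
# single line of "|"-separated fields, with field 0 = package name and
# field 1 = port directory. Only those two fields are used here.


def lookup_by_scan(lines: List[str], origin: str) -> Optional[str]:
    """Hypothetical per-port lookup that rescans every line, much like
    running grep once per installed port. Each call is O(nlines), so
    npkg calls cost O(npkg * nlines) overall."""
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) > 1 and fields[1] == origin:
            return fields[0]  # first matching record wins
    return None


def build_origin_map(lines: Iterable[str]) -> Dict[str, str]:
    """Hypothetical parse-once step: a single O(nlines) pass that maps
    port directory -> package name."""
    origin_map: Dict[str, str] = {}
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) > 1:
            # setdefault keeps the first record, matching lookup_by_scan()
            origin_map.setdefault(fields[1], fields[0])
    return origin_map


if __name__ == "__main__":
    index_lines = [
        "foo-1.0|/usr/ports/misc/foo|/usr/local|A foo\n",
        "bar-2.3|/usr/ports/devel/bar|/usr/local|A bar\n",
    ]
    installed = ["/usr/ports/devel/bar", "/usr/ports/misc/foo"]

    # npkg full scans of the file: O(npkg * nlines) overall
    slow = [lookup_by_scan(index_lines, o) for o in installed]

    # one pass to build the table, then average O(1) dict lookups
    origin_map = build_origin_map(index_lines)
    fast = [origin_map.get(o) for o in installed]

    print(slow == fast, fast)
```

The dictionary version wins on paper. As noted above, though, portmaster works by calling itself recursively. In shell, a parsed table would therefore have to be carried into (or rebuilt by) every child invocation, and that overhead is what made the parse-once approach slower in practice.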
If I hadn't already observed the hilarity ensuing around portupgrade/ruby
updates, and I were sitting down today to design a "ports management
tool" from scratch, I'd use perl, no question. Even without its own db,
everything that portmaster does could be done more easily and faster in
perl.

However, even granting THIS point, the fact remains that the previous
status quo was 1) a text file data store with a known, (mostly) easy to
parse structure, and 2) an easy to use, fast tool to access that data
with. Your line of reasoning boils down to, "You shouldn't object to the
new tool being slower because you are doing things you shouldn't have
been doing with the old tool in the first place." Even IF I were willing
to grant you that point regarding portmaster (I'm not, but let's just
say ...), are you willing to tell every user of grep for every other
purpose (including all the many places it's used in the base, like /etc,
/etc/rc.d, the build ...), "You have to put up with a slow grep
because ..."?

Doug

-- 
Improve the effectiveness of your Internet presence with
a domain name makeover!    http://SupersetSolutions.com/

Computers are useless. They can only give you answers.
        -- Pablo Picasso