From: Gabor Kovesdan
Date: Fri, 13 Aug 2010 15:22:43 +0200
To: Matthias Andree
Cc: freebsd-current@freebsd.org
Subject: Re: Official request: Please make GNU grep the default
Message-ID: <4C654723.1070503@FreeBSD.org>
References: <4C6505A4.9060203@FreeBSD.org> <4C650B75.3020800@FreeBSD.org>
List-Id: Discussions about the use of FreeBSD-current

On 2010.08.13 13:09, Matthias Andree wrote:
> Gabor Kovesdan wrote on 2010-08-13:
>
>> On 2010.08.13 10:43, Doug Barton wrote:
>>> My reason is simple, performance. While doing some portmaster work
>>> recently I was regression testing some changes I made to the --index*
>>> options and noticed that things were dramatically slower than the last
>>> time I tested those features. Thinking that I had made a programming
>>> mistake I dug into my code, and while the regexps that I was using could
>>> be tuned for slightly better performance the problem was not in my code.
>>> I then installed textproc/gnugrep to compare, and the differences were
>>> very dramatic using a highly pessimized test case (finding a match on
>>> the last line of INDEX). The script I used to test is at
>>> http://people.freebsd.org/~dougb/grep-time-trial.sh.txt and a typical
>>> result was:
>>>
>>> GNU grep
>>> Elapsed time: 2 seconds
>>>
>>> BSD grep
>>> Elapsed time: 47 seconds
>>>
>> Ok, I'll take care of this soon, and make GNU grep default, again
>> with a knob to build BSD grep. I agree with you that we cannot allow
>> such a big performance drawback, but my measurements only showed
>> significant differences for very big searches and I didn't imagine
>> that it could add up to such a big difference. I'm sorry for the bad
>> decision I took making it default.
>
> Without knowing any of the details (I am not using 9-CURRENT), Gabor,
> I suggest that you check the documentation around Google's RE2 library
> (which is in C++); there are quite a few bits of information relating
> to (including worst-case) performance of regexp matchers, both
> directly in the re2 documentation, as well as indirectly through links
> and references. Might be worth a read, together with profiling Doug's
> test case if he could tell you how to reproduce those.
>
Thanks, Matthias. I haven't looked deeply at this but iirc it uses Perl-syntax.
We need an efficient, wchar-aware, POSIX(ish) regex library with a good license, and atm only TRE meets these criteria. Besides, we need GNU-style regex support, which will have to be added to TRE before we can replace our libc regex.

Gabor