From: Dimitry Andric
Date: Wed, 18 Aug 2010 23:54:41 +0200
To: Peter Jeremy
Cc: Alan Cox, current@freebsd.org
Subject: Re: Official request: Please make GNU grep the default

On 2010-08-18 23:12, Dimitry Andric wrote:
>> And one trial is not statistically valid - especially given
>> the small differences. How about multiple trials with ministat?
>
> The results were averages of three trials

Actually, since I kept using Doug's original grep-time-trial.sh, each of
the three "trials" consisted of running grep 100 times, and the listed
time was the total elapsed time for those 100 runs. So I assume that
reasonably averages out the variation between individual runs?

Also, I'm not sure the actual disk/fs reading performance will differ
much between GNU grep and any other grep, since they all basically read
through the whole test file sequentially. For instance, when I profiled
GNU grep with gprof, the top entries of the flat profile were:

         %  cumulative     self            self    total
      time     seconds  seconds  calls  ms/call  ms/call  name
      99.1        0.59     0.59  11497     0.05     0.05  read [5]
       0.6        0.59     0.00  11497     0.00     0.00  kwsexec [8]
       0.1        0.59     0.00      0  100.00%           .mcount (130)
       0.1        0.59     0.00      1     0.62   594.77  grepfile [3]
       0.1        0.60     0.00  11496     0.00     0.00  memmove [9]
       0.0        0.60     0.00      4     0.03     0.03  memchr [10]
       0.0        0.60     0.00  12541     0.00     0.00  memset [16]
       0.0        0.60     0.00  11497     0.00     0.00  EGexecute [7]
       0.0        0.60     0.00  11497     0.00     0.05  fillbuf [4]
       0.0        0.60     0.00  11497     0.00     0.00  grepbuf [6]

In other words, nearly all of the time (99.1%) is spent in the read(2)
system call. If mmap'ing the file could reduce that, it would be nice,
but I suspect the gains would be marginal. The actual performance
difference is much more likely to come from how efficiently grep splits
the input into lines and searches them for regexps. BSD grep still has
considerable room for improvement there.