Date: Wed, 18 Aug 2010 23:54:41 +0200
From: Dimitry Andric <dimitry@andric.com>
To: Peter Jeremy <peterjeremy@acm.org>
Cc: Alan Cox <alan.l.cox@gmail.com>, current@freebsd.org
Subject: Re: Official request: Please make GNU grep the default
Message-ID: <4C6C56A1.9050405@andric.com>
In-Reply-To: <4C6C4CBF.1040200@andric.com>
References: <4C673F02.8000805@FreeBSD.org>
 <20100815013438.GA8958@troutmask.apl.washington.edu>
 <4C67492C.5020206@FreeBSD.org>
 <B7A05068-9578-4341-851B-86BD9BC7A2DA@gmail.com>
 <8639ufd78w.fsf@ds4.des.no>
 <4C6844D8.5070602@andric.com>
 <86sk2faqdl.fsf@ds4.des.no>
 <4C6AAA88.5080606@andric.com>
 <AANLkTik-ee6iKiOoA=KMmmToS2giUOmW5JB-d1vBx9r3@mail.gmail.com>
 <4C6AF13A.1080606@andric.com>
 <20100818205206.GA55620@server.vk2pj.dyndns.org>
 <4C6C4CBF.1040200@andric.com>

On 2010-08-18 23:12, Dimitry Andric wrote:
>> And one trial is not statistically valid - especially given the small
>> differences.  How about multiple trials with ministat.
>
> The results were averages of three trials

Actually, since I kept using Doug's original grep-time-trial.sh, each of
the three 'trials' consisted of running grep 100 times, and the listed
time was the total elapsed time for those 100 runs.  So I assume that
will reasonably average out the differences between each individual run?

Also, I'm not sure if the actual disk/fs reading performance will differ
much between GNU grep and any other grep, since they will all basically
read through the whole test file sequentially.

For instance, when I profiled GNU grep with gprof, the top time results
were:

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 99.1      0.59      0.59    11497     0.05     0.05  read [5]
  0.6      0.59      0.00    11497     0.00     0.00  kwsexec [8]
  0.1      0.59      0.00        0  100.00%           .mcount (130)
  0.1      0.59      0.00        1     0.62   594.77  grepfile [3]
  0.1      0.60      0.00    11496     0.00     0.00  memmove [9]
  0.0      0.60      0.00        4     0.03     0.03  memchr [10]
  0.0      0.60      0.00    12541     0.00     0.00  memset [16]
  0.0      0.60      0.00    11497     0.00     0.00  EGexecute [7]
  0.0      0.60      0.00    11497     0.00     0.05  fillbuf [4]
  0.0      0.60      0.00    11497     0.00     0.00  grepbuf [6]

I.e. it looks like most of the time is spent in the read system call.
If mmap'ing can help improve that, it would be nice, but I suspect the
gains would be marginal.

The actual performance difference is much more likely to be related to
how efficiently grep parses out lines, and searches for regexps in
them.  BSD grep still has quite some room for improvement in that
department.
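
To make the "three trials of 100 runs" idea concrete, here is a minimal
sketch (not Doug's grep-time-trial.sh, which is not reproduced in this
thread) of timing each batch separately, so that every batch becomes one
sample instead of being folded into a single average.  The function names
time_batch and summarize, the default grep arguments, and the file name
"testfile" are all hypothetical placeholders.  The samples are printed
one per line, which should be suitable as input for a tool like ministat.

```python
import statistics
import subprocess
import sys
import time


def time_batch(cmd, runs=100):
    """Return total elapsed wall-clock seconds for `runs` executions of cmd."""
    start = time.perf_counter()
    for _ in range(runs):
        # Discard output; grep exits 1 on "no match", so the exit status
        # is deliberately not checked here.
        subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


def summarize(samples):
    """Return (mean, sample standard deviation) of the batch timings."""
    mean = statistics.mean(samples)
    # stdev needs at least two samples; report 0.0 for a single batch.
    stdev = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return mean, stdev


if __name__ == "__main__":
    # Hypothetical default command: the pattern and file are placeholders.
    cmd = sys.argv[1:] or ["grep", "-c", "pattern", "testfile"]
    samples = [time_batch(cmd) for _ in range(3)]
    for s in samples:
        print(f"{s:.3f}")  # one sample (total seconds per batch) per line
    mean, stdev = summarize(samples)
    print(f"mean {mean:.3f}s stdev {stdev:.3f}s", file=sys.stderr)
```

With only three batches the standard deviation is a rough indicator at
best; more batches would be needed before small differences between grep
implementations can be called significant.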