Date: Mon, 29 Nov 2010 14:39:22 -0500
From: David Schultz <das@FreeBSD.ORG>
To: Dimitry Andric <dimitry@andric.com>
Cc: svn-src-head@FreeBSD.ORG, mdf@FreeBSD.ORG, svn-src-all@FreeBSD.ORG, src-committers@FreeBSD.ORG, Gabor Kovesdan <gabor@FreeBSD.ORG>
Subject: Re: svn commit: r211463 - head/usr.bin/grep
Message-ID: <20101129193922.GA70555@zim.MIT.EDU>
In-Reply-To: <4C6C4FDD.8080803@andric.com>
References: <201008181740.o7IHeA4c075984@svn.freebsd.org> <AANLkTimjHt9NZa0-vU%2Bm2dkY2pTciUDLGd0Qut=uhFTq@mail.gmail.com> <4C6C4FDD.8080803@andric.com>
On Wed, Aug 18, 2010, Dimitry Andric wrote:
> On 2010-08-18 22:48, mdf@FreeBSD.org wrote:
> >> - Refactor file reading code to use pure syscalls and an internal buffer
> >> instead of stdio. This gives BSD grep a very big performance boost,
> >> its speed is now almost comparable to GNU grep.
> >
> > I didn't read all of the details in the profiling mails in the thread,
> > but does this mean that work on stdio would give a performance boost
> > to many apps? Or is there something specific about how grep(1) is
> > using its input that makes it a horse of a different color?
>
> Originally, it was reading files 1 character at a time, using fgetc(3),
> the locking version even. This is usually not the fastest way to read
> a large file with stdio. :)
>
> If grep did not have to support .gz or .bz2 files, we could just have
> plugged in stdio's fgetln(3). I tried this approach first on some
> non-compressed files, and it performed much better than fgetc'ing.
>
> The reading code that was now committed, is basically the same algorithm
> as fgetln() uses internally, but it can handle gzip and bzip2 input too.

The gzip limitations you refer to could perhaps be worked around
with a simple application of funopen(3).

IIRC, the overhead inherent in using fgetln(3) or getline(3) on
reasonably long lines is very small; if it's not, we should look
at ways to improve stdio. There's still a locking operation and
memcpy() that can't really be avoided with stdio, though. With
getline(), you'd be able to delete most of file.c, but it would
never be quite as fast.
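To make the getline(3) suggestion concrete, here is a minimal sketch of
what a line-at-a-time read loop could look like. It is not the code in
grep's file.c. The function name count_matching_lines is hypothetical,
and strstr(3) stands in for the real pattern matcher purely for
illustration (so lines with embedded NUL bytes are only searched up to
the first NUL). It assumes an uncompressed stream; for .gz input one
would first need a FILE * that decompresses on read, for example one
built with funopen(3).

```c
#define _POSIX_C_SOURCE 200809L	/* for getline(3) and ssize_t */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper: count the lines of fp that contain needle.
 * getline() grows one reusable buffer as needed, so the loop does one
 * stdio call per line instead of one fgetc() call per character.
 */
long
count_matching_lines(FILE *fp, const char *needle)
{
	char *line = NULL;	/* buffer owned and resized by getline() */
	size_t cap = 0;		/* current capacity of that buffer */
	ssize_t len;
	long matches = 0;

	while ((len = getline(&line, &cap, fp)) != -1) {
		/* Illustration only: substring match, not a regex. */
		if (strstr(line, needle) != NULL)
			matches++;
	}
	free(line);
	return (matches);
}
```

A caller would fopen(3) the file, pass the stream to
count_matching_lines(), and fclose(3) it afterwards. Each call still
takes the stream lock and copies the line out of the stdio buffer,
which is the unavoidable cost mentioned above.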
