Date: Mon, 29 Nov 2010 14:39:22 -0500
From: David Schultz
To: Dimitry Andric
Cc: svn-src-head@FreeBSD.ORG, mdf@FreeBSD.ORG, svn-src-all@FreeBSD.ORG, src-committers@FreeBSD.ORG, Gabor Kovesdan
Subject: Re: svn commit: r211463 - head/usr.bin/grep
Message-ID: <20101129193922.GA70555@zim.MIT.EDU>
In-Reply-To: <4C6C4FDD.8080803@andric.com>

On Wed, Aug 18, 2010, Dimitry Andric wrote:
> On 2010-08-18 22:48, mdf@FreeBSD.org wrote:
> >> - Refactor file reading code to use pure syscalls and an internal buffer
> >>   instead of stdio.
> >>   This gives BSD grep a very big performance boost; its speed is now
> >>   almost comparable to GNU grep.
> >
> > I didn't read all of the details in the profiling mails in the thread,
> > but does this mean that work on stdio would give a performance boost
> > to many apps? Or is there something specific about how grep(1) is
> > using its input that makes it a horse of a different color?
>
> Originally, it was reading files 1 character at a time, using fgetc(3),
> the locking version even. This is usually not the fastest way to read
> a large file with stdio. :)
>
> If grep did not have to support .gz or .bz2 files, we could just have
> plugged in stdio's fgetln(3). I tried this approach first on some
> non-compressed files, and it performed much better than fgetc'ing.
>
> The reading code that was committed is basically the same algorithm
> fgetln() uses internally, but it can handle gzip and bzip2 input too.

The gzip limitation you refer to could perhaps be worked around with a
simple application of funopen(3): wrap the compressed stream in a FILE
whose read function calls into zlib or libbz2, and stdio's line
functions work on it unchanged. IIRC, the overhead inherent in using
fgetln(3) or getline(3) on reasonably long lines is very small; if it's
not, we should look at ways to improve stdio.

There's still a per-call locking operation and a memcpy() that can't
really be avoided with stdio, though. With getline(), you'd be able to
delete most of file.c, but it would never be quite as fast.
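The fgetc(3) cost Dimitry mentions comes largely from locking: POSIX requires stdio functions to behave as if they lock the stream around every call, so a per-character loop takes and releases the lock once per byte. A minimal sketch of the usual workaround, holding the lock once and reading with getc_unlocked(3), is below. count_newlines() is a hypothetical helper for illustration, not code from grep.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: count '\n' characters in fp one byte at a time.
 * flockfile() takes the stream lock once; getc_unlocked() then reads
 * without re-locking on every character, unlike fgetc().
 */
static long
count_newlines(FILE *fp)
{
	long n = 0;
	int c;

	assert(fp != NULL);
	flockfile(fp);
	while ((c = getc_unlocked(fp)) != EOF)
		if (c == '\n')
			n++;
	funlockfile(fp);
	return (n);
}
```

This still processes one character at a time, so it removes the locking overhead but not the per-byte function-call and branch cost.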
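The committed approach Dimitry describes (raw read(2) calls into an internal buffer, handing out lines the way fgetln(3) does) might look roughly like the sketch below. This is an assumption about the general technique, not the actual code in usr.bin/grep/file.c: struct linereader, lr_init() and lr_getln() are hypothetical names, and gzip/bzip2 handling is left out. As with fgetln(3), the returned line is not NUL-terminated and is valid only until the next call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fgetln(3)-style reader on top of read(2). */
struct linereader {
	int	 fd;
	char	*buf;	/* unconsumed bytes live in buf[start, end) */
	size_t	 cap;
	size_t	 start;
	size_t	 end;
	int	 eof;
};

static int
lr_init(struct linereader *r, int fd, size_t cap)
{
	if (cap == 0 || (r->buf = malloc(cap)) == NULL)
		return (-1);
	r->fd = fd;
	r->cap = cap;
	r->start = r->end = 0;
	r->eof = 0;
	return (0);
}

/*
 * Return the next line (including its '\n', if any) and store its
 * length in *lenp.  Returns NULL at EOF, on a read error, or if the
 * buffer cannot be grown.
 */
static char *
lr_getln(struct linereader *r, size_t *lenp)
{
	size_t scanned = 0;	/* pending bytes already searched for '\n' */
	char *line, *nl;
	ssize_t n;

	for (;;) {
		assert(r->start <= r->end && r->end <= r->cap);
		nl = memchr(r->buf + r->start + scanned, '\n',
		    r->end - r->start - scanned);
		if (nl != NULL) {
			/* Hand out a pointer into the buffer; no copy. */
			line = r->buf + r->start;
			*lenp = (size_t)(nl - line) + 1;
			r->start += *lenp;
			return (line);
		}
		scanned = r->end - r->start;
		if (r->eof) {
			/* Last line without a trailing '\n', or nothing left. */
			if (scanned == 0)
				return (NULL);
			line = r->buf + r->start;
			*lenp = scanned;
			r->start = r->end;
			return (line);
		}
		/* Slide the partial line to the front, then grow if full. */
		if (r->start > 0) {
			memmove(r->buf, r->buf + r->start, scanned);
			r->start = 0;
			r->end = scanned;
		}
		if (r->end == r->cap) {
			char *nb = realloc(r->buf, r->cap * 2);
			if (nb == NULL)
				return (NULL);
			r->buf = nb;
			r->cap *= 2;
		}
		n = read(r->fd, r->buf + r->end, r->cap - r->end);
		if (n < 0) {
			if (errno == EINTR)
				continue;
			return (NULL);
		}
		if (n == 0)
			r->eof = 1;
		r->end += (size_t)n;
	}
}
```

A caller would loop on `lr_getln(&r, &len)` until it returns NULL and match against the `len` bytes it gets back. Only the partial line left at the end of the buffer is moved on each refill, so most lines are never copied, and there is no stdio lock to take.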
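The getline(3) route David suggests would reduce the reading loop to a few lines. A minimal sketch, assuming plain (uncompressed) input on a FILE; count_matching_lines() is a hypothetical helper, and the strstr() test stands in for grep's real matcher.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: count lines of fp that contain pat as a plain
 * substring.  getline(3) allocates and grows `line` as needed and
 * copies each line into it.
 */
static long
count_matching_lines(FILE *fp, const char *pat)
{
	char *line = NULL;	/* allocated and resized by getline() */
	size_t cap = 0;		/* current size of the `line` buffer */
	long matches = 0;

	assert(fp != NULL && pat != NULL);
	while (getline(&line, &cap, fp) != -1) {
		/*
		 * strstr() stops at an embedded NUL byte; a real matcher
		 * would use the length that getline() returns instead.
		 */
		if (strstr(line, pat) != NULL)
			matches++;
	}
	free(line);
	return (matches);
}
```

Each getline() call locks the stream and copies the line into `line`, which is the unavoidable locking and memcpy() overhead mentioned above. In exchange, the caller never has to handle partial lines or buffer management itself.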