From: Dimitry Andric
Date: Tue, 17 Aug 2010 17:28:08 +0200
To: Dag-Erling Smørgrav
Cc: Doug Barton, Justin Hibbits, core@freebsd.org, delphij@freebsd.org,
    Gabor Kovesdan, Steve Kargl, current@freebsd.org
Subject: Re: Official request: Please make GNU grep the default
Message-ID: <4C6AAA88.5080606@andric.com>
In-Reply-To: <86sk2faqdl.fsf@ds4.des.no>
List-Id: Discussions about the use of FreeBSD-current

On 2010-08-16 10:55, Dag-Erling
Smørgrav wrote:
> Dimitry Andric writes:
>> - Uses plain file descriptors instead of struct FILE, since the
>>   buffering is done manually anyway, and it makes it easier to support
>>   gzip and bzip2.
> It might be worth a shot adding mmap(2) support as well, i.e. when
> processing an uncompressed regular file, try to mmap(2) it first, and if
> that fails, fall back to the plain buffered read(2) method.

I added a simple mmap to grep, and time-trialed it, but the mmap version
was somewhat slower than the regular version.  I understood from Kostik
Belousov that readahead does not work properly with mmap, and it should
not be used for "one-time" reads.

I also experimented with different buffer sizes on the same big test
file, and this gives the following results (times in s):

buffer size  test1  test2  test3  average
===========  =====  =====  =====  =======
        512    467    484    465      472
      1,024    391    415    392      399
      2,048    361    356    365      361
      4,096    353    353    356      354
      8,192    348    345    357      350
     16,384    341    373    350      354
     32,768    339    348    346      344
     65,536    336    359    371      355
    262,144    334    352    350      345
  1,048,576    334    350    351      345
  2,097,152    339    342    369      350
373,293,056    544    547    559      550

I.e. the 32k buffer size that I borrowed from GNU grep seems to be
reasonable enough.  There is no profit in wasting huge amounts of memory
to speed things up.
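
For reference, the two strategies discussed above (try mmap(2) first and
fall back to read(2), versus a plain read(2) loop with a configurable
buffer size) could be sketched roughly as below.  This is only an
illustration and not the code from the actual grep patch: the names
count_newlines(), scan_read() and scan_file() are made up for this
example, and counting newlines merely stands in for the real matching
work.  It assumes the descriptor is positioned at the start of the file.

```c
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>

#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for the matching work: count '\n' bytes. */
static size_t
count_newlines(const char *p, size_t len)
{
	const char *end = p + len, *q;
	size_t n = 0;

	while (p < end && (q = memchr(p, '\n', (size_t)(end - p))) != NULL) {
		n++;
		p = q + 1;
	}
	return (n);
}

/* Plain buffered read(2) path; returns the line count, or -1 on error. */
static long
scan_read(int fd, size_t bufsize)
{
	char *buf;
	ssize_t r;
	long lines = 0;

	assert(bufsize > 0);
	if ((buf = malloc(bufsize)) == NULL)
		return (-1);
	for (;;) {
		r = read(fd, buf, bufsize);
		if (r > 0)
			lines += (long)count_newlines(buf, (size_t)r);
		else if (r == 0 || errno != EINTR)
			break;		/* EOF or a real error */
	}
	free(buf);
	return (r < 0 ? -1 : lines);
}

/*
 * Try mmap(2) for a non-empty regular file; if that is not possible,
 * fall back to the buffered read(2) path.  Assumes fd is at offset 0.
 */
static long
scan_file(int fd, size_t bufsize)
{
	struct stat st;
	void *map;
	long lines;

	if (fstat(fd, &st) == 0 && S_ISREG(st.st_mode) && st.st_size > 0) {
		map = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE,
		    fd, 0);
		if (map != MAP_FAILED) {
			lines = (long)count_newlines(map, (size_t)st.st_size);
			munmap(map, (size_t)st.st_size);
			return (lines);
		}
	}
	return (scan_read(fd, bufsize));
}
```

Calling scan_read() on the same file with bufsize set to 512, 32768 and
so on, and timing each run, would be one way to set up a comparison like
the one in the table; the numbers will of course depend on the machine
and on the file.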