Date: Tue, 17 Aug 2010 17:28:08 +0200
From: Dimitry Andric <dimitry@andric.com>
To: Dag-Erling Smørgrav <des@des.no>
Cc: Doug Barton <dougb@FreeBSD.org>, Justin Hibbits <chmeeedalf@gmail.com>, core@freebsd.org, delphij@freebsd.org, Gabor Kovesdan <gabor@freebsd.org>, Steve Kargl <sgk@troutmask.apl.washington.edu>, current@freebsd.org
Subject: Re: Official request: Please make GNU grep the default
Message-ID: <4C6AAA88.5080606@andric.com>
In-Reply-To: <86sk2faqdl.fsf@ds4.des.no>
References: <4C6505A4.9060203@FreeBSD.org> <20100813085235.GA16268@freebsd.org> <4C66C010.3040308@FreeBSD.org> <4C673F02.8000805@FreeBSD.org> <20100815013438.GA8958@troutmask.apl.washington.edu> <4C67492C.5020206@FreeBSD.org> <B7A05068-9578-4341-851B-86BD9BC7A2DA@gmail.com> <8639ufd78w.fsf@ds4.des.no> <4C6844D8.5070602@andric.com> <86sk2faqdl.fsf@ds4.des.no>
On 2010-08-16 10:55, Dag-Erling Smørgrav wrote:
> Dimitry Andric <dimitry@andric.com> writes:
>> - Uses plain file descriptors instead of struct FILE, since the
>>   buffering is done manually anyway, and it makes it easier to support
>>   gzip and bzip2.
>
> It might be worth a shot adding mmap(2) support as well, i.e. when
> processing an uncompressed regular file, try to mmap(2) it first, and if
> that fails, fall back to the plain buffered read(2) method.

I added a simple mmap to grep and time-trialed it, but the mmap version
was somewhat slower than the regular version. I understood from Kostik
Belousov that readahead does not work properly with mmap, and that mmap
should not be used for "one-time" reads.

I also experimented with different buffer sizes on the same big test
file, which gave the following results (times in s):

buffer size (bytes)  test1  test2  test3  average
===================  =====  =====  =====  =======
                512    467    484    465      472
              1,024    391    415    392      399
              2,048    361    356    365      361
              4,096    353    353    356      354
              8,192    348    345    357      350
             16,384    341    373    350      354
             32,768    339    348    346      344
             65,536    336    359    371      355
            262,144    334    352    350      345
          1,048,576    334    350    351      345
          2,097,152    339    342    369      350
        373,293,056    544    547    559      550

So the 32k buffer size that I borrowed from GNU grep seems reasonable
enough. There is no profit in wasting huge amounts of memory to speed
things up: larger buffers do not help measurably, and the largest one is
much slower.
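For illustration, the two approaches discussed in this thread can be sketched in C roughly as below. This is not code from FreeBSD's grep: the function names count_newlines, count_lines_read and count_lines_mmap_or_read are hypothetical, and counting newlines merely stands in for the real pattern matching. count_lines_read is the plain buffered read(2) loop, with the buffer size as the parameter that was varied in the measurements; count_lines_mmap_or_read is the suggested "try mmap(2) on a regular file, else fall back to read(2)" variant.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for grep's matching work: count '\n' bytes. */
static long count_newlines(const char *p, size_t len)
{
	const char *end = p + len;
	long n = 0;

	while ((p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
		n++;
		p++;	/* continue after the newline just found */
	}
	return n;
}

/*
 * Plain buffered read(2) loop, reading bufsize bytes at a time from the
 * current file position. Returns the line count, or -1 on error.
 * (Sketch only: a real implementation would also retry on EINTR.)
 */
static long count_lines_read(int fd, size_t bufsize)
{
	char *buf;
	long total = 0;
	ssize_t n;

	assert(bufsize > 0);
	if ((buf = malloc(bufsize)) == NULL)
		return -1;
	while ((n = read(fd, buf, bufsize)) > 0)
		total += count_newlines(buf, (size_t)n);
	free(buf);
	return n < 0 ? -1 : total;
}

/*
 * Suggested variant: mmap(2) a non-empty regular file and scan it in one
 * go; otherwise (pipe, empty file, mmap failure) fall back to read(2).
 * Assumes fd is positioned at the start, since the mapping covers the
 * whole file from offset 0.
 */
static long count_lines_mmap_or_read(int fd, size_t bufsize)
{
	struct stat st;

	if (fstat(fd, &st) == 0 && S_ISREG(st.st_mode) && st.st_size > 0) {
		size_t len = (size_t)st.st_size;
		void *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);

		if (p != MAP_FAILED) {
			long total = count_newlines(p, len);

			munmap(p, len);
			return total;
		}
	}
	return count_lines_read(fd, bufsize);
}
```

With the 32k buffer size borrowed from GNU grep, a call would look like count_lines_read(fd, 32768). Note that the mmap path only avoids copying into a user buffer; as mentioned above, it was somewhat slower in the actual measurements.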