Date: Sat, 25 Mar 2006 21:39:27 +1100
From: Peter Jeremy <peterjeremy@optushome.com.au>
To: Mikhail Teterin <mi+mx@aldan.algebra.com>
Cc: alc@freebsd.org, stable@freebsd.org
Subject: Re: Reading via mmap stinks (Re: weird bugs with mmap-ing via NFS)
Message-ID: <20060325103927.GE703@turion.vk2pj.dyndns.org>
In-Reply-To: <200603241518.01027.mi%2Bmx@aldan.algebra.com>
References: <200603232352.k2NNqPS8018729@gate.bitblocks.com> <200603241518.01027.mi%2Bmx@aldan.algebra.com>

On Fri, 2006-Mar-24 15:18:00 -0500, Mikhail Teterin wrote:
>which there is not with the read. Read also requires fairly large buffers in
>the user space to be efficient -- *in addition* to the buffers in the kernel.

I disagree.  With a filesystem read, the kernel is solely responsible
for handling physical I/O with an efficient buffer size.  The userland
buffers simply amortise the cost of the system call and copyout
overheads.

>I'm also quite certain, that fulfulling my "demands" would add quite a bit of
>complexity to the mmap support in kernel, but hey, that's what the kernel is
>there for :-)

Unfortunately, your patches to implement this seem to have become
detached from your e-mail. :-)

>Unlike grep, which seems to use only 32k buffers anyway (and does not use
>madvise -- see attachment), my program mmaps gigabytes of the input file at
>once, trusting the kernel to do a better job at reading the data in the most
>efficient manner :-)

mmap can lend itself to cleaner implementations because there's no
need for a nested loop to read buffers and then process them.  You
can mmap the entire file and process it.  One downside is that on a
32-bit architecture, this limits you to processing files that are
somewhat less than 2GB.  Another downside is that touching an uncached
page triggers a trap, which may not be as efficient as reading a block
of data through the filesystem interface, and I/O errors are delivered
via signals (which may not be as easy to handle).

>Peter Jeremy wrote:
>> On an amd64 system running about 6-week old -stable, both ['grep' and 'grep
>> --mmap' -mi] behave pretty much identically.
>
>Peter, I read grep's source -- it is not using madvise (because it hurts
>performance on SunOS-4.1!) and reads in chunks of 32k anyway. Would you care
>to look at my program instead? Thanks:
>
>	http://aldan.algebra.com/mzip.c

fetch: http://aldan.algebra.com/mzip.c: Not Found

I tried writing a program that just mmap'd my entire (2GB) test file
and summed all the longwords in it.  This gave me similar results to
grep.  Setting MADV_SEQUENTIAL and/or MADV_WILLNEED made no noticeable
difference.  I suspect something about your code or system is disabling
the mmap read-ahead functionality.

What happens if you simulate read-ahead yourself?  Have your main
program fork and the child access pages slightly ahead of the parent
but do nothing else.

-- 
Peter Jeremy
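
For illustration only, the two experiments described in this message
(summing the longwords of an mmap'd file, and forking a child that
touches pages so the parent finds them already faulted in) might be
sketched roughly as below.  This is not the test program referred to
above, which was not posted; the names sum_longwords() and
touch_pages() are hypothetical.  As a simplifying assumption the child
is not throttled to stay "slightly ahead": it simply walks the whole
mapping as fast as its page faults allow.

```c
/* Needed for madvise() on glibc in strict -std modes; harmless elsewhere. */
#define _DEFAULT_SOURCE 1

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: read one byte per page so each page is faulted in. */
static void
touch_pages(const volatile unsigned char *p, size_t len, size_t pagesz)
{
	for (size_t off = 0; off < len; off += pagesz)
		(void)p[off];
}

/*
 * Hypothetical test routine: map the whole file read-only and sum it as
 * an array of unsigned longs (trailing bytes that do not fill a whole
 * longword are ignored).  If prefetch is non-zero, a forked child walks
 * the same mapping, touching one byte per page and doing nothing else.
 * Returns 0 on success, -1 on error (including an empty file or one too
 * large to map in this address space).
 */
int
sum_longwords(const char *path, int prefetch, unsigned long *sum)
{
	struct stat st;
	int fd = open(path, O_RDONLY);

	if (fd == -1)
		return -1;
	if (fstat(fd, &st) == -1 || st.st_size <= 0 ||
	    (off_t)(size_t)st.st_size != st.st_size) {
		close(fd);
		return -1;
	}
	size_t len = (size_t)st.st_size;
	void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	close(fd);		/* the mapping stays valid after close */
	if (map == MAP_FAILED)
		return -1;

	/* Advisory hint only; a failure here is deliberately ignored. */
	(void)madvise(map, len, MADV_SEQUENTIAL);

	pid_t child = -1;
	if (prefetch) {
		long pagesz = sysconf(_SC_PAGESIZE);

		child = fork();	/* on fork failure, carry on unassisted */
		if (child == 0) {
			touch_pages(map, len,
			    pagesz > 0 ? (size_t)pagesz : 4096);
			_exit(0);
		}
	}

	const unsigned long *w = map;
	unsigned long total = 0;
	for (size_t i = 0; i < len / sizeof(*w); i++)
		total += w[i];

	if (child > 0)
		(void)waitpid(child, NULL, 0);
	munmap(map, len);
	*sum = total;
	return 0;
}
```

Timing sum_longwords(path, 0, &s) against sum_longwords(path, 1, &s)
on a file that is not already cached would show whether the child's
page touching helps.  An I/O error on the mapped file would still
arrive as a signal rather than an error return, which this sketch
makes no attempt to handle.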