From owner-freebsd-arch Tue Apr 10 19:35:15 2001
Date: Tue, 10 Apr 2001 19:34:54 -0700 (PDT)
From: Matt Dillon
Message-Id: <200104110234.f3B2Ysj97756@earth.backplane.com>
To: Peter Jeremy
Cc: freebsd-arch@FreeBSD.ORG
Subject: Re: mmap(2) vs read(2)/write(2)
References: <20010411095233.P66243@gsmx07.alcatel.com.au>

:It is my understanding that it is more efficient to access a file
:via mmap rather than read/write, because the former needs one less
:memory-memory copy.

Yes and no.

If the file is already in the cache, then mmap() is much faster: the program reads the cached pages directly with no copy, and the VM faults it takes to map them are cheap because no disk I/O is involved. But if the file is not in the cache, the program winds up taking VM faults that must wait for the pages to be read in, and this is expensive enough that it goes a long way toward making up for the copy overhead you would get with read().

This can be demonstrated with a test program (readfile.c, included at the end of this message). Create two test files, test1 and test2, using dd. One should be much larger than main memory; the other should be about 1/4 of main memory. Run the program a couple of times before recording the results so that prior runs or cache state do not interfere with the test results.
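The VM faults discussed above can also be observed directly. The following is a minimal sketch, assuming a POSIX-like system whose getrusage() fills in ru_minflt (faults resolved without I/O) and ru_majflt (faults that needed I/O), as FreeBSD and Linux do. scan_mapping_faults() is a hypothetical helper, not part of the test program. The exact count depends on how aggressively the kernel clusters fault-ahead, so expect far fewer faults than pages.

```c
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <fcntl.h>
#include <unistd.h>

/* Page faults charged to this process so far: minor (no I/O) + major (I/O). */
static long fault_count(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) < 0)
        return 0;
    return ru.ru_minflt + ru.ru_majflt;
}

/*
 * Hypothetical helper: map 'path' read-only, sum every byte through the
 * mapping, and return how many page faults the scan took (-1 on error).
 * The sum is returned through *sum so the loop cannot be optimized away.
 */
long scan_mapping_faults(const char *path, unsigned long *sum)
{
    struct stat st;
    const unsigned char *p;
    long before, after;
    size_t i, len;
    int fd;

    if ((fd = open(path, O_RDONLY)) < 0)
        return -1;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }
    len = (size_t)st.st_size;
    p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);              /* the mapping remains valid after close() */
    if (p == MAP_FAILED)
        return -1;

    before = fault_count();
    *sum = 0;
    for (i = 0; i < len; ++i)
        *sum += p[i];       /* the first touch of each page may fault */
    after = fault_count();

    munmap((void *)p, len);
    return after - before;
}
```

Running it once on a cold file and again on the same, now cached, file should show the fault count staying similar while the major-fault share (and the wall-clock time) drops; the helper reports only the combined total.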
Here is an example run of the test program:

    # this assumes you have around 128M of RAM
    dd if=/dev/zero of=test1 bs=1m count=1024    # create 1G file
    dd if=/dev/zero of=test2 bs=1m count=32      # create 32M file

    % ./rf -f test1
    % ./rf -f test1
    % ./rf -f test1
    cksum 0 read 1073741824 bytes in 41.311 seconds, 24.788 MB/sec cpu 15.167 sec

    % ./rf -m test1
    % ./rf -m test1
    % ./rf -m test1
    cksum 0 read 1073741824 bytes in 48.371 seconds, 21.170 MB/sec cpu 12.130 sec

    % ./rf -f test2
    % ./rf -f test2
    % ./rf -f test2
    cksum 0 read 33554432 bytes in 0.367 seconds, 87.295 MB/sec cpu 0.368 sec

    % ./rf -m test2
    % ./rf -m test2
    % ./rf -m test2
    cksum 0 read 33554432 bytes in 0.271 seconds, 117.958 MB/sec cpu 0.273 sec

For the big file, mmap() has lower performance (21.2MB/sec versus 24.8MB/sec) but actually eats fewer cpu cycles (12.1 versus 15.2 seconds). In this case read() clearly has the higher copy overhead, but that overhead is not interfering with the transfer rate. mmap()'s VM fault overhead, on the other hand, is interfering with the transfer rate. It might be possible for me to fix this: it has to do with the way VM fault does lookahead reads (it doesn't start the next lookahead read until it gets halfway through the previous lookahead read). But the gist is that if the data is not in the cache, read() could very well be faster than mmap().

For the small file, mmap() wins hands down (118MB/sec versus 87MB/sec) and takes less cpu as well (0.273 versus 0.368 seconds). If you comment out the madvise() call for the small-file tests, performance goes down to around 114MB/sec in my test, which is the cost of taking 2788 VM faults in the cached case. That is still better than read().

So what is the final answer? mmap() will be significantly faster for small cached files, but the benefits are minimal, or possibly even detrimental, when it is used on large uncached files. You also have to consider the effect on the process's VM space.
If a program depends on there being 3G of mmapable space in its address space, and you start mmap()ing files for stdio functions, and the program also happens to use a lot of stdio (fopen() and such), you could very well pollute the mmapable space so much that the program fails. If we were to implement mmap() for stdio, it would have to be done very, very carefully to avoid unwanted side effects. I remember NeXT using mmap() for stdio, and I also remember hitting all sorts of weird side effects that made me want to tear my hair out.

Ultimately I think the best solution is to add a new setvbuf() mode #define that selects a 'use mmap' mode, e.g. something like _IOMBF, and not enable it by default. Programs using the feature could then be made portable with a simple #ifdef _IOMBF around the setvbuf() call.

-Matt

    /*
     * readfile [-f/-m] filename
     *
     * cc -O2 readfile.c -o rf
     */

    #include <sys/types.h>
    #include <sys/time.h>
    #include <sys/resource.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int
    main(int ac, char **av)
    {
        int i;
        int memMode = -1;
        int fd;
        unsigned int cksum = 0;     /* unsigned, so overflow wraps */
        const char *path = NULL;
        struct timeval tv1;
        struct timeval tv2;
        struct stat st;
        struct rusage ru;

        for (i = 1; i < ac; ++i) {
            char *ptr = av[i];

            if (*ptr != '-') {
                path = av[i];
                continue;
            }
            switch(ptr[1]) {
            case 'f':
                memMode = 0;
                break;
            case 'm':
                memMode = 1;
                break;
            default:
                fprintf(stderr, "Bad option: %s\n", ptr);
                exit(1);
            }
        }
        if (memMode < 0) {
            fprintf(stderr, "Specify mode -f or -m\n");
            exit(1);
        }
        if (path == NULL) {
            fprintf(stderr, "Specify file to read\n");
            exit(1);
        }
        if (stat(path, &st) < 0 || !S_ISREG(st.st_mode)) {
            fprintf(stderr, "bad filespec: %s\n", path);
            exit(1);
        }
        if ((fd = open(path, O_RDONLY)) < 0) {
            perror("open");
            exit(1);
        }

        gettimeofday(&tv1, NULL);

        if (memMode) {
            /* mmap() mode: checksum the file in place as an array of ints */
            unsigned int *base = mmap(NULL, st.st_size, PROT_READ,
                                      MAP_SHARED, fd, 0);
            size_t n = st.st_size / sizeof(int);
            size_t j;

            if (base == MAP_FAILED) {
                fprintf(stderr, "unable to mmap file\n");
                exit(1);
            }
            madvise(base, st.st_size, MADV_WILLNEED);
            for (j = 0; j < n; ++j)
                cksum += base[j];
        } else {
            /* read() mode: copy into an int buffer so both modes sum ints */
            unsigned int buf[32768 / sizeof(int)];
            ssize_t n;

            while ((n = read(fd, buf, sizeof(buf))) > 0) {
                n = n / sizeof(int);
                for (i = 0; i < n; ++i)
                    cksum += buf[i];
            }
        }

        gettimeofday(&tv2, NULL);
        getrusage(RUSAGE_SELF, &ru);

        {
            double usec = (tv2.tv_usec + 1000000 - tv1.tv_usec) +
                          (tv2.tv_sec - tv1.tv_sec - 1) * 1000000.0;

            printf("cksum %u read %lld bytes in %4.3f seconds, "
                   "%4.3f MB/sec cpu %4.3f sec\n",
                cksum,      /* so compiler does not optimize it out */
                (long long)st.st_size,
                usec / 1000000.0,
                (double)st.st_size / (usec * 1024.0 * 1024.0 / 1000000.0),
                ((ru.ru_utime.tv_usec + ru.ru_stime.tv_usec) +
                 (ru.ru_utime.tv_sec + ru.ru_stime.tv_sec) * 1.0E6) / 1.0E6
            );
        }
        return(0);
    }
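The setvbuf() proposal in the message can be sketched as follows. _IOMBF is the hypothetical mode the message names and is not assumed to exist in any C library; fopen_scan() is likewise a hypothetical helper. On a libc without _IOMBF, the #ifdef compiles the call away and the stream keeps its default buffering, which is what makes the opt-in portable.

```c
#include <stdio.h>

/*
 * Hypothetical helper: open 'path' for reading and, only if the C library
 * defines the (hypothetical) _IOMBF mode, ask stdio to back the stream
 * with mmap().  setvbuf() must be called before any other operation on
 * the stream, so it is issued immediately after fopen().
 */
FILE *fopen_scan(const char *path)
{
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return NULL;
#ifdef _IOMBF
    /* Assumed behavior: a nonzero return means the mode was refused
       and the stream keeps its default read()-based buffering. */
    (void)setvbuf(fp, NULL, _IOMBF, 0);
#endif
    return fp;
}
```

Keeping the mode opt-in rather than the default is the point of the proposal: only programs that know they can spare the address space would ask for mmap-backed streams.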