From owner-freebsd-hackers Wed Feb 1 10:22:51 1995 Return-Path: hackers-owner Received: (from root@localhost) by freefall.cdrom.com (8.6.9/8.6.6) id KAA00486 for hackers-outgoing; Wed, 1 Feb 1995 10:22:51 -0800 Received: from godzilla.zeta.org.au (godzilla.zeta.org.au [203.2.228.34]) by freefall.cdrom.com (8.6.9/8.6.6) with ESMTP id KAA00452; Wed, 1 Feb 1995 10:22:40 -0800 Received: (from bde@localhost) by godzilla.zeta.org.au (8.6.9/8.6.9) id DAA17258; Thu, 2 Feb 1995 03:25:46 +1100 Date: Thu, 2 Feb 1995 03:25:46 +1100 From: Bruce Evans Message-Id: <199502011625.DAA17258@godzilla.zeta.org.au> To: jkh@freefall.cdrom.com, roberto@blaise.ibp.fr Subject: Re: Optimizing CVS? Cc: hackers@freefall.cdrom.com Sender: hackers-owner@FreeBSD.org Precedence: bulk >Maybe we should optimize stat(2) before... Ask Remy about the tests >he made between ext2fs and our ffs. Our stat(2) is way slower than the >1.1.5.1 one. Actual testing shows that stat() is slightly faster under FreeBSD-current than under linux-1.1.36 for files in a medium sized directory, and 5 times faster under FreeBSD for files in a large directory! FreeBSD is much slower at creating the files to be stat()ed because it updates metadata synchronously. Profiling cvs doesn't show much in-core file system activity. I think it's just spending too much time waiting to write metadata and temporary files to the disk. This is nothing new, so it's not clear why 1.1.5 was faster. Bruce --- All tests were run on a 486DX/33 IDE system. Output was generated using this shell script: ---- for i in 1 10 100 1000 do ./z $i done for i in 1 10 100 1000 do ./z $i done time rm -rf stat-test --- on this test program: --- z.c --- #include #include #include #include #include #include #include #include #include #include #include static void error(char *fmt, ...) { va_list ap; int save_errno; save_errno = errno; va_start(ap, fmt); vfprintf(stderr, fmt, ap); va_end(ap); fprintf(stderr, ": %s\n", strerror(save_errno)); exit(1); } #ifdef GETTIMEOFDAY_MISSING struct timezone; static int gettimeofday(struct timeval *tv, struct timezone *tz) { double c; struct tms tms; times(&tms); c = (double) (tms.tms_utime + tms.tms_stime) / CLOCKS_PER_SEC; tv->tv_sec = c; tv->tv_usec = (c - tv->tv_sec) * 1e6; return 0; } #endif int main(int argc, char **argv) { int fd; double dt; struct timeval finish; int i; int n; char nb[64]; struct stat sb; struct timeval start; n = atoi(argv[1]); mkdir("stat-test", 0755); if (gettimeofday(&start, (struct timezone *) NULL) != 0) error("gettimeofday for start of creat tests failed"); for (i = 0; i < n; ++i) { sprintf(nb, "stat-test/%d", i); fd = creat(nb, 0666); if (fd < 0) error("creat of %s failed (step %d)", nb, i); if (close(fd) != 0) error("close of %s failed (step %d)", nb, i); } if (gettimeofday(&finish, (struct timezone *) NULL) != 0) error("gettimeofday for finish of creat tests failed"); dt = (finish.tv_sec - start.tv_sec) + (finish.tv_usec - start.tv_usec) / 1e6; printf("time for %d creats = %.6f (%.6f each)\n", n, dt, dt / n); if (gettimeofday(&start, (struct timezone *) NULL) != 0) error("gettimeofday for start of stat tests failed"); for (i = 0; i < n; ++i) { sprintf(nb, "stat-test/%d", i); if (stat(nb, &sb) != 0) error("stat of %s failed (step %d)", nb, i); } if (gettimeofday(&finish, (struct timezone *) NULL) != 0) error("gettimeofday for finish of stat tests failed"); dt = (finish.tv_sec - start.tv_sec) + (finish.tv_usec - start.tv_usec) / 1e6; printf("time for %d stats = %.6f (%.6f each)\n", n, dt, dt / n); return 0; } --- # minix file system under linux-1.1.36 Filesystem 1024-blocks Used Available Capacity Mounted on /dev/hda3 49525 43988 5537 89% /minix1 Filesystem Inodes IUsed IFree %IUsed Mounted on /dev/hda3 12288 6943 5345 57% /minix1 time for 1 creats = 0.002140 (0.002140 each) time for 1 stats = 0.000344 (0.000344 each) time for 10 creats = 0.007555 (0.000755 each) time for 10 stats = 0.004280 (0.000428 each) time for 100 creats = 0.099810 (0.000998 each) time for 100 stats = 0.062747 (0.000627 each) time for 1000 creats = 4.484905 (0.004485 each) time for 1000 stats = 2.569069 (0.002569 each) time for 1 creats = 0.002169 (0.002169 each) time for 1 stats = 0.000347 (0.000347 each) time for 10 creats = 0.007130 (0.000713 each) time for 10 stats = 0.003092 (0.000309 each) time for 100 creats = 0.076044 (0.000760 each) time for 100 stats = 0.064099 (0.000641 each) time for 1000 creats = 3.087616 (0.003088 each) time for 1000 stats = 2.619802 (0.002620 each) 0.03user 3.79system 0:03.82elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+0minor)pagefaults 0swaps --- # ext2 file system under linux-1.1.36 Filesystem 1024-blocks Used Available Capacity Mounted on /dev/hda4 120640 71274 43334 62% /usr Filesystem Inodes IUsed IFree %IUsed Mounted on /dev/hda4 30240 5148 25092 17% /usr time for 1 creats = 0.002205 (0.002205 each) time for 1 stats = 0.000415 (0.000415 each) time for 10 creats = 0.010211 (0.001021 each) time for 10 stats = 0.005585 (0.000558 each) time for 100 creats = 0.157388 (0.001574 each) time for 100 stats = 0.082506 (0.000825 each) time for 1000 creats = 7.780158 (0.007780 each) time for 1000 stats = 3.712792 (0.003713 each) time for 1 creats = 0.002524 (0.002524 each) time for 1 stats = 0.000581 (0.000581 each) time for 10 creats = 0.010363 (0.001036 each) time for 10 stats = 0.005613 (0.000561 each) time for 100 creats = 0.116314 (0.001163 each) time for 100 stats = 0.099282 (0.000993 each) time for 1000 creats = 3.947709 (0.003948 each) time for 1000 stats = 3.751310 (0.003751 each) 0.06user 1.45system 0:01.53elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+0minor)pagefaults 0swaps --- # (same) minix file system under minix-1.6.25.mine Device Inodes Inodes Inodes Blocks Blocks Blocks Mounted V Pr total used free total used free on ------ ------ ------ ------ ------ ------ ------- - -- /dev/hda3 12288 6948 5340 49920 44437 5483 / 1 rw time for 1 creats = 0.000000 (0.000000 each) time for 1 stats = 0.000000 (0.000000 each) time for 10 creats = 0.000000 (0.000000 each) time for 10 stats = 0.016667 (0.001667 each) time for 100 creats = 0.133334 (0.001333 each) time for 100 stats = 0.033333 (0.000333 each) time for 1000 creats = 5.433333 (0.005433 each) time for 1000 stats = 0.516667 (0.000517 each) time for 1 creats = 0.000000 (0.000000 each) time for 1 stats = 0.000000 (0.000000 each) time for 10 creats = 0.000000 (0.000000 each) time for 10 stats = 0.016667 (0.001667 each) time for 100 creats = 0.066667 (0.000667 each) time for 100 stats = 0.050000 (0.000500 each) time for 1000 creats = 0.783333 (0.000783 each) time for 1000 stats = 0.516667 (0.000517 each) 2.00 real 0.11 user 1.98 sys --- # ufs file system under FreeBSD 2.1.0-Development #597: Tue Jan 31 19:40:24 EST 1995 time for 1 creats = 0.050128 (0.050128 each) time for 1 stats = 0.000539 (0.000539 each) time for 10 creats = 0.289739 (0.028974 each) time for 10 stats = 0.004222 (0.000422 each) time for 100 creats = 3.376021 (0.033760 each) time for 100 stats = 0.047203 (0.000472 each) time for 1000 creats = 42.747052 (0.042747 each) time for 1000 stats = 0.714697 (0.000715 each) time for 1 creats = 0.001992 (0.001992 each) time for 1 stats = 0.000410 (0.000410 each) time for 10 creats = 0.010583 (0.001058 each) time for 10 stats = 0.003012 (0.000301 each) time for 100 creats = 0.122799 (0.001228 each) time for 100 stats = 0.029209 (0.000292 each) time for 1000 creats = 3.959052 (0.003959 each) time for 1000 stats = 0.738038 (0.000738 each) 16.84 real 0.05 user 1.35 sys --- This profile output is for `cvs -Q bin' of a new `bin' on a 486DX2/66 SCSI system. I've made some local improvements to profiling. The times for _cputime, _mcount and _mexitcount are profiling overheads. When the checkout is run with profiling off, it is faster by almost exactly the profiling overhead, so the overhead is apparently not overlapped with waiting for the disk. --- granularity: each sample hit covers 4 byte(s) for 0.00% of 164.53 seconds % cumulative self self total time seconds seconds calls us/call us/call name 35.9 59.016 59.016 3704 15933 15933 _idle [5] 21.6 94.557 35.541 _cputime [7] 9.0 109.408 14.851 _mcount (1470) 8.2 122.857 13.449 _user [13] 5.2 131.447 8.590 _mexitcount [24] 2.0 134.810 3.363 10108 333 333 _pmap_copy_page [44] 1.1 136.660 1.850 68026 27 30 _pmap_page_protect [60] 1.1 138.399 1.738 42968 40 40 _bzero [65] 0.8 139.674 1.275 245493 5 5 _vm_page_lookup [75] 0.6 140.587 0.914 59681 15 250 _vm_fault [14] 0.5 141.357 0.769 60541 13 13 _pmap_enter [94] 0.4 141.990 0.633 176799 4 4 _vm_page_remove [105] 0.4 142.598 0.609 179817 3 3 _vm_page_unqueue [109] 0.3 143.108 0.510 35449 14 165 _ufs_lookup [33] 0.3 143.617 0.509 176832 3 3 _vm_page_insert [123] 0.3 144.121 0.504 48614 10 1328 _syscall [3] 0.3 144.598 0.477 61202 8 364 _trap [10] 0.3 145.060 0.461 65641 7 21 _vm_map_lookup [71] 0.3 145.504 0.445 33969 13 26 _pmap_remove [85] 0.3 145.939 0.435 44544 10 10 _malloc [135] 0.3 146.374 0.435 115642 4 4 _wakeup [129] ---