From owner-freebsd-hackers Wed Feb  1 21:49:22 1995
Return-Path: hackers-owner
Received: (from root@localhost) by freefall.cdrom.com (8.6.9/8.6.6) id VAA16887 for hackers-outgoing; Wed, 1 Feb 1995 21:49:22 -0800
Received: from godzilla.zeta.org.au (godzilla.zeta.org.au [203.2.228.34]) by freefall.cdrom.com (8.6.9/8.6.6) with ESMTP id VAA16866; Wed, 1 Feb 1995 21:49:02 -0800
Received: (from bde@localhost) by godzilla.zeta.org.au (8.6.9/8.6.9) id QAA02136; Thu, 2 Feb 1995 16:47:16 +1100
Date: Thu, 2 Feb 1995 16:47:16 +1100
From: Bruce Evans
Message-Id: <199502020547.QAA02136@godzilla.zeta.org.au>
To: bde@zeta.org.au, terry@cs.weber.edu
Subject: Re: Optimizing CVS?
Cc: hackers@freefall.cdrom.com, jkh@freefall.cdrom.com, roberto@blaise.ibp.fr
Sender: hackers-owner@FreeBSD.org
Precedence: bulk

>> Actual testing shows that stat() is slightly faster under
>> FreeBSD-current than under linux-1.1.36 for files in a medium sized
>> ...

>This is a bogus benchmark; you are hitting pages that are cached by
>virtue of the number of reps applied and the operation order.

Somewhat bogus.  I'm _trying_ to test what happens for things in
cache(s).  If the disk has to be accessed, then stat() will be 10-100
times slower.  I'm not trying to understand why the disk has to be
accessed (yet).  Profiling shows whether the disk is accessed a lot
(it isn't).

I didn't include a profile for the test in my previous mail.  The
enclosed profile is for a slightly modified test without the creat()s
and with the stat() loop repeated 100 times for 100000 stat()s
altogether (a sketch of the loop follows the call-graph excerpt
below).

>This does not show the speed loss is not in the stat in practice.

The profile does.  It also shows that everything is cached.

>In effect, you are testing system call + copyout time rather than system
>call + fetch + copyout time.

Nope.  The syscall + copyout time is relatively small.  The leaf
routines that take the longest time are bcmp() (presumably for
caching - 48 bcmp()s per stat(), and our kernel bcmp() is the slow C
version!) and user() (this is for all user time; presumably it's
mostly for printf).

>This is the same type of reasoning that make Larry McVoy's lmbench
>mmap benchmark (map-but-don't-use) bogus.

Even bogus benchmarks can give interesting results if they are
profiled.  Note that my profiling is non-statistical and should be
fairly accurate except possibly for very fast leaf routines (< 10
usec).

Bruce

---
granularity: each sample hit covers 4 byte(s) for 0.00% of 64.21 seconds

                                  called/total       parents
index  %time    self descendents  called+self    name           index
                                  called/total       children

...
-----------------------------------------------
                1.18       53.20  102299/102299      _Xsyscall [2]
[3]     84.7    1.18       53.20  102299             _syscall [3]
                0.50       51.77  100011/100011          _stat [4]
                0.37        0.00  102253/103616          _copyin [53]
                0.02        0.13    2584/3396            _mi_switch [62]
...
-----------------------------------------------
                0.50       51.77  100011/100011      _syscall [3]
[4]     81.4    0.50       51.77  100011             _stat [4]
                0.87       44.88  100011/100043          _namei [5]
                1.65        2.62  100009/100020          _vn_stat [15]
                0.94        0.00  100009/100766          _copyout [38]
                0.05        0.76  100009/100053          _vput [41]
...
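For concreteness, the stat() part of the modified test boils down to a
loop of roughly this shape.  This is only a sketch: the number of files,
the name scheme and the absence of any timing/printf code here are
assumptions, not the exact test source.

	#include <sys/types.h>
	#include <sys/stat.h>
	#include <stdio.h>

	#define	NFILES	1000		/* files in the test directory (assumed) */
	#define	NREPS	100		/* 100 reps -> 100000 stat()s altogether */

	int
	main(void)
	{
		struct stat sb;
		char name[32];
		int i, rep;

		for (rep = 0; rep < NREPS; rep++)
			for (i = 0; i < NFILES; i++) {
				sprintf(name, "f%d", i);	/* placeholder names */
				if (stat(name, &sb) != 0)
					perror(name);
			}
		return (0);
	}

With the creat()s removed and everything cached, a run like this spends
almost all of its time in the kernel paths profiled below.
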
granularity: each sample hit covers 4 byte(s) for 0.00% of 208.85 seconds

  %   cumulative    self              self    total
 time   seconds    seconds    calls  us/call  us/call  name
 41.5      86.635   86.635                             _cputime (842)
 17.6     123.321   36.686                             _mcount (1514)
 10.1     144.476   21.155                             _mexitcount (1516)
  5.0     154.958   10.481  4809131        2        2  _bcmp [9]
  3.2     161.734    6.776                             _user [11]
  2.1     166.034    4.300   200086       21      193  _ufs_lookup [7]
  1.0     168.037    2.003   100043       20      420  _lookup [6]
  0.9     170.001    1.964    68910       28      108  _getnewvnode [10]
  0.9     171.868    1.867   100049       19      144  _ffs_vget [8]
  0.8     173.520    1.652   100020       17       43  _vn_stat [15]
  0.8     175.126    1.606   169239        9       10  _brelse [25]
  0.8     176.702    1.576   100054       16       16  _ufs_getattr [26]
  0.7     178.206    1.504   169234        9        9  _malloc [27]
  0.7     179.646    1.440   200106        7        7  ___qdivrem [28]
  0.7     181.050    1.404   200086        7       36  _cache_lookup [12]
  0.6     182.336    1.286   540599        2        2  _ufs_unlock [31]
  0.6     183.580    1.244   169229        7        7  _free [32]
  0.6     184.755    1.176   102299       11      531  _syscall [3]
  0.5     185.871    1.115   302736        4        4  _ufs_lock [34]
  0.5     186.864    0.993   369120        3        5  _vrele [22]
  0.5     187.809    0.944   100766        9        9  _copyout [38]
  0.4     188.736    0.928   169332        5        5  _incore [39]
  0.4     189.652    0.916    68961       13       17  _ffs_fsync [33]
  0.4     190.520    0.868   100043        9      457  _namei [5]
  0.4     191.344    0.823   100048        8       14  _cache_enter [29]
  0.4     192.166    0.823   169214        5       14  _getblk [19]
  0.4     192.921    0.754   100147        8        8  _copyinstr [42]
  0.4     193.669    0.749   100249        7       27  _ffs_blkatoff [18]
  0.4     194.417    0.748   200116        4        4  _ufs_access [43]
  0.3     195.107    0.691    68908       10       27  _vinvalbuf [23]
  0.3     195.797    0.689    71070       10       10  _bzero [46]
  0.3     196.473    0.677   169195        4       19  _bread [17]
  0.3     197.122    0.648   200094        3       10  ___udivdi3 [21]
  0.3     197.761    0.639   169239        4        4  _bremfree [47]
  0.3     198.356    0.596   102817        6        6  _bcopy [48]
  0.3     198.941    0.585    68908        8       75  _vgone [13]
  0.3     199.505    0.564    68910        8        8  _ufs_vinit [49]
  0.3     200.061    0.555    68908        8       67  _vclean [14]
  0.3     200.607    0.547   168953        3        6  _ufs_inactive [37]
  0.2     201.128    0.521   100049        5        7  _ufs_ihashget [44]
  0.2     201.628    0.499   100011        5      523  _stat [4]
  0.2     202.096    0.469    43163       11       21  _hardclock [40]
  0.2     202.562    0.466   131204        4        7  _vget [36]
  0.2     202.940    0.378    68910        5        5  _ufs_ihashins [52]
  0.2     203.317    0.376   174015        2        3  _doreti [50]
  0.2     203.688    0.371   103616        4        4  _copyin [53]
...
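On the bcmp() point: the portable C bcmp() the kernel is using is just a
byte-at-a-time loop, along these lines (a from-memory sketch, not a copy
of the actual libkern source):

	#include <sys/types.h>

	int
	bcmp(const void *b1, const void *b2, size_t length)
	{
		const char *p1 = b1, *p2 = b2;

		if (length == 0)
			return (0);
		do {
			if (*p1++ != *p2++)	/* stop at the first mismatch */
				break;
		} while (--length);
		return (length);		/* zero if equal, nonzero otherwise */
	}

At about 2 us a call, the 4,809,131 calls in the flat profile account for
the ~10.5 seconds of _bcmp self time, which is why the slow C version
stands out.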