Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 2 Feb 1995 16:47:16 +1100
From:      Bruce Evans <bde@zeta.org.au>
To:        bde@zeta.org.au, terry@cs.weber.edu
Cc:        hackers@freefall.cdrom.com, jkh@freefall.cdrom.com, roberto@blaise.ibp.fr
Subject:   Re: Optimizing CVS?
Message-ID:  <199502020547.QAA02136@godzilla.zeta.org.au>

next in thread | raw e-mail | index | archive | help
>> Actual testing shows that stat() is slightly faster under
>> FreeBSD-current than under linux-1.1.36 for files in a medium sized
>> ...

>This is a bogus benchmark; you are hitting pages that are cached by
>virtue of the number of reps applied and the operation order.

Somewhat bogus.  I'm _trying_ to test what happens for things in
cache(s).  If the disk has to be accessed, then stat() will be 10-100
times slower.  I'm not trying to understand why the disk has to be
accessed (yet).  Profiling shows if the disk is accessed a lot (it
isn't).  I didn't include a profile for the test in my previous mail.
The enclosed profile is for a slightly modified test without the
creat()s and with the stat() loop repeated 100 times for 100000 stat()s
altogether.

>This does not show the speed loss is not in the stat in practice.

The profile does.  It also shows that everything is cached.

>In effect, you are testing system call + copyout time rather than system
>call + fetch + copyout time.

Nope.  The syscall + copyout time are relatively small.  The leaf
routines that take the longest time are bcmp() (presumably for caching
- 48 bcmp's per stat(), and our kernel bcmp() is the slow C version!)
and user() (this is for all user time; presumably it's mostly for
printf).

>This is the same type of reasoning that make Larry McVoy's lmbench
>mmap benchmark (map-but-don't-use) bogus.

Even bogus benchmarks can give interesting results if they are
profiled.  Note that my profiling is non-statistical and should be
fairly accurate except possibly for very fast leaf routines
(< 10 usec).

Bruce

---
granularity: each sample hit covers 4 byte(s) for 0.00% of 64.21 seconds

                                  called/total       parents 
index  %time    self descendents  called+self    name    	index
                                  called/total       children
...
-----------------------------------------------

                1.18       53.20  102299/102299      _Xsyscall [2]
[3]     84.7    1.18       53.20  102299         _syscall [3]
                0.50       51.77  100011/100011      _stat [4]
                0.37        0.00  102253/103616      _copyin [53]
                0.02        0.13    2584/3396        _mi_switch [62]
...
-----------------------------------------------

                0.50       51.77  100011/100011      _syscall [3]
[4]     81.4    0.50       51.77  100011         _stat [4]
                0.87       44.88  100011/100043      _namei [5]
                1.65        2.62  100009/100020      _vn_stat [15]
                0.94        0.00  100009/100766      _copyout [38]
                0.05        0.76  100009/100053      _vput [41]
...
granularity: each sample hit covers 4 byte(s) for 0.00% of 208.85 seconds

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  us/call  us/call  name    
 41.5     86.635   86.635                             _cputime (842)
 17.6    123.321   36.686                             _mcount (1514)
 10.1    144.476   21.155                             _mexitcount (1516)
  5.0    154.958   10.481  4809131        2        2  _bcmp [9]
  3.2    161.734    6.776                             _user [11]
  2.1    166.034    4.300   200086       21      193  _ufs_lookup [7]
  1.0    168.037    2.003   100043       20      420  _lookup [6]
  0.9    170.001    1.964    68910       28      108  _getnewvnode [10]
  0.9    171.868    1.867   100049       19      144  _ffs_vget [8]
  0.8    173.520    1.652   100020       17       43  _vn_stat [15]
  0.8    175.126    1.606   169239        9       10  _brelse [25]
  0.8    176.702    1.576   100054       16       16  _ufs_getattr [26]
  0.7    178.206    1.504   169234        9        9  _malloc <cycle 1> [27]
  0.7    179.646    1.440   200106        7        7  ___qdivrem [28]
  0.7    181.050    1.404   200086        7       36  _cache_lookup [12]
  0.6    182.336    1.286   540599        2        2  _ufs_unlock [31]
  0.6    183.580    1.244   169229        7        7  _free <cycle 1> [32]
  0.6    184.755    1.176   102299       11      531  _syscall [3]
  0.5    185.871    1.115   302736        4        4  _ufs_lock [34]
  0.5    186.864    0.993   369120        3        5  _vrele [22]
  0.5    187.809    0.944   100766        9        9  _copyout [38]
  0.4    188.736    0.928   169332        5        5  _incore [39]
  0.4    189.652    0.916    68961       13       17  _ffs_fsync [33]
  0.4    190.520    0.868   100043        9      457  _namei [5]
  0.4    191.344    0.823   100048        8       14  _cache_enter [29]
  0.4    192.166    0.823   169214        5       14  _getblk [19]
  0.4    192.921    0.754   100147        8        8  _copyinstr [42]
  0.4    193.669    0.749   100249        7       27  _ffs_blkatoff [18]
  0.4    194.417    0.748   200116        4        4  _ufs_access [43]
  0.3    195.107    0.691    68908       10       27  _vinvalbuf [23]
  0.3    195.797    0.689    71070       10       10  _bzero [46]
  0.3    196.473    0.677   169195        4       19  _bread [17]
  0.3    197.122    0.648   200094        3       10  ___udivdi3 [21]
  0.3    197.761    0.639   169239        4        4  _bremfree [47]
  0.3    198.356    0.596   102817        6        6  _bcopy [48]
  0.3    198.941    0.585    68908        8       75  _vgone [13]
  0.3    199.505    0.564    68910        8        8  _ufs_vinit [49]
  0.3    200.061    0.555    68908        8       67  _vclean [14]
  0.3    200.607    0.547   168953        3        6  _ufs_inactive [37]
  0.2    201.128    0.521   100049        5        7  _ufs_ihashget [44]
  0.2    201.628    0.499   100011        5      523  _stat [4]
  0.2    202.096    0.469    43163       11       21  _hardclock [40]
  0.2    202.562    0.466   131204        4        7  _vget [36]
  0.2    202.940    0.378    68910        5        5  _ufs_ihashins [52]
  0.2    203.317    0.376   174015        2        3  _doreti [50]
  0.2    203.688    0.371   103616        4        4  _copyin [53]
...



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?199502020547.QAA02136>