From owner-freebsd-hackers  Tue May  7 21:43:01 1996
Return-Path: owner-hackers
Received: (from root@localhost)
          by freefall.freebsd.org (8.7.3/8.7.3) id VAA22147
          for hackers-outgoing; Tue, 7 May 1996 21:43:01 -0700 (PDT)
Received: from godzilla.zeta.org.au (godzilla.zeta.org.au [203.2.228.19])
          by freefall.freebsd.org (8.7.3/8.7.3) with SMTP id VAA22138
          Tue, 7 May 1996 21:42:54 -0700 (PDT)
Received: (from bde@localhost) by godzilla.zeta.org.au (8.6.12/8.6.9) id OAA23473; Wed, 8 May 1996 14:39:40 +1000
Date: Wed, 8 May 1996 14:39:40 +1000
From: Bruce Evans <bde@zeta.org.au>
Message-Id: <199605080439.OAA23473@godzilla.zeta.org.au>
To: koshy@india.hp.com, stesin@elvisti.kiev.ua
Subject: Re: lmbench IDE anomaly
Cc: current@freebsd.org, hackers@freebsd.org
Sender: owner-hackers@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

>i.e# ./lmdd if=/dev/DEVICE bs=BLOCKSIZE count=16MEG/BLOCKSIZE of=internal
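
A rough Python sketch of what a measurement like this does. It is not
lmdd's code: the function name and arguments are made up for illustration,
and it simply discards the data it reads. For a raw device, block_size and
total_bytes would have to be multiples of the sector size.

```python
import os
import time


def read_throughput(path, block_size, total_bytes):
    """Read up to total_bytes from path in block_size chunks.

    Returns (bytes_read, KB/s), with KB taken as 1024 bytes.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        done = 0
        start = time.perf_counter()
        while done < total_bytes:
            chunk = os.read(fd, min(block_size, total_bytes - done))
            if not chunk:          # end of file or device
                break
            done += len(chunk)     # data is discarded, only counted
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    kb_per_s = (done / 1024.0) / elapsed if elapsed > 0 else float("inf")
    return done, kb_per_s
```

Calling read_throughput("/dev/DEVICE", 8192, 16 * 1024 * 1024) would
correspond to the bs=8192 column, assuming 16MEG means 16 * 2^20 bytes.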

>Throughput for one lmdd reader process and for two simultaneous lmdd readers is
>given below.

>                        Per process                     Per process
>Device  blocksize       KB/s            blocksize       KB/s
>~~~~~~  ~~~~~~~~~       ~~~~            ~~~~~~~~~       ~~~~
>        -- SCSI DISK --
>        --single reader--
>rsd0a   bs=1024         653.74          bs=8192         1312.31
>                        682.66                          1268.36
>                        677.52                          1361.95


>        --two readers--
>rsd0a   bs=1024         424.27          bs=8192         805.69
>                        424.24                          807.64
>                        --                              812.82

>Looks like changing the block size for the read can double throughput.

This is normal for small block sizes on SCSI disks.  My P133 ncr'810
system has a slow Toshiba MK537FB drive.  (BTW, that drive still breaks
everything unless SCSI_NCR_DFLT_TAGS is defined as 0, either via option
FAILSAFE or directly; without that it breaks things slightly more in
-current than in 2.1R.)  On that system, the speeds for a single process
are:

	blocksize	KB/s
	---------	----
	  512		 180
	 1024		 334
	 2048		 678
	 4096		1139
	 8192		2034
	16384		2480
	32768		2544
	65536		2528

I.e., for block sizes smaller than 8K, the speed is approximately
proportional to the block size (because SCSI command overhead doesn't
depend much on the block size and is very large).
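
One way to see this is to turn the table into time per read request.
The sketch below uses only the numbers from the table and assumes that
KB means 1024 bytes; the names are illustrative.

```python
# (block size in bytes, KB/s) copied from the table above
MEASURED = [
    (512, 180), (1024, 334), (2048, 678), (4096, 1139),
    (8192, 2034), (16384, 2480), (32768, 2544), (65536, 2528),
]


def ms_per_request(block_size, kb_per_s):
    """Average time per read request in milliseconds (KB = 1024 bytes assumed)."""
    return (block_size / 1024.0) / kb_per_s * 1000.0


for bs, rate in MEASURED:
    print(f"{bs:6d}  {rate:5d} KB/s  {ms_per_request(bs, rate):6.2f} ms/request")
```

The time per request stays close to 3 ms from 512 to 2048 bytes and is
still under 4 ms at 8192, so a mostly fixed per-command cost dominates
there.  From 16384 up, the time per request roughly doubles with the
block size, i.e., the transfer itself dominates.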

>Also, two readers yield better throughput than a single reader process.

Perhaps this is because there is some overlap for the command overheads.
The command for process 2 will usually arrive while the i/o for process 1
is in progress, so the drive may be able to do most of the command
processing before the command can actually be executed.
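
For reference, the aggregate gain can be worked out from the quoted rsd0a
figures.  This sketch assumes the two-reader numbers are per process (as
the column heading says), so the aggregate is twice the mean; the missing
third bs=1024 run is left out.

```python
# Quoted rsd0a results in KB/s, keyed by block size
SCSI = {
    1024: {"single": [653.74, 682.66, 677.52], "two": [424.27, 424.24]},
    8192: {"single": [1312.31, 1268.36, 1361.95],
           "two": [805.69, 807.64, 812.82]},
}


def mean(values):
    return sum(values) / len(values)


def aggregate_gain(runs):
    """Combined throughput of two readers relative to one reader."""
    return 2 * mean(runs["two"]) / mean(runs["single"])


for bs, runs in SCSI.items():
    print(f"bs={bs}: two readers together reach "
          f"{aggregate_gain(runs):.2f}x the single-reader rate")
```

Both block sizes come out at roughly 1.2x to 1.3x, i.e., each process is
slower but the pair moves more data in total.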

>        -- IDE DISK --
>        --single reader--
>rwd0a   bs=1024         839.05          bs=8192         2392.08 (!!)
>                        841.53                          2402.42 (!!)
>                        841.85                          2402.45 (!!)

I'm surprised that the larger block size is so much faster, since the
command overhead is much lower for IDE.

>        --two readers--
>rwd0a   bs=1024         199.38          bs=8192         251.83
>                        218.38                          237.95
>                        220.68                          238.50

>The read rates for the single reader case are fantastic, however
>disaster seems to strike when two readers access the same device

With IDE there is no possibility of overlapping command overheads,
because commands are serialized in the driver.

I think the slowdown is to be expected.  The large command overhead
for slow drives like your SCSI drive and my Toshiba probably results
in each process taking turns reading the same block out of the drive's
cache.  OTOH, for faster drives like my Quantum XPG and any IDE drive,
one of the processes apparently gets far enough ahead of the other
to defeat the drive's caching.  This causes a 26x per-process slowdown
for the XPG.
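
The same arithmetic can be applied to the quoted rwd0a figures.  This
sketch uses only those numbers (no XPG data is given here) and the
function names are illustrative.

```python
# Quoted rwd0a results in KB/s, keyed by block size
IDE = {
    1024: {"single": [839.05, 841.53, 841.85],
           "two": [199.38, 218.38, 220.68]},
    8192: {"single": [2392.08, 2402.42, 2402.45],
           "two": [251.83, 237.95, 238.50]},
}


def mean(values):
    return sum(values) / len(values)


def per_process_slowdown(runs):
    """How many times slower each reader is when a second reader is added."""
    return mean(runs["single"]) / mean(runs["two"])


for bs, runs in IDE.items():
    print(f"bs={bs}: each reader is about "
          f"{per_process_slowdown(runs):.1f}x slower with two readers")
```

That gives roughly a 4x per-process slowdown at bs=1024 and roughly 10x
at bs=8192, so here even the combined throughput of the two readers is
well below that of a single reader.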

>So I looked at the block device.

>        --single reader--
>wd0a    bs=1024         199.80          bs=8192         796.07
>                        200.04                          795.06

>        --two readers--
>wd0a    bs=1024         200.04          bs=8192         795.60
>                        200.33                          795.20

>Hmm, block size makes a huge difference still.  Is this to 
>be expected?  Also the two reader case and the single reader case

It's a bit unexpected.  A (too-small) block size of 2048 is always
used for physical reads for the block device.  Thus the speed is
limited to that of the raw device with a block size of 2048.  The
(lack of) speed of my Toshiba /dev/sd0 is almost independent of
the block size.

Bruce