Date:      Thu, 12 Nov 1998 21:29:20 -0800 (PST)
From:      Matthew Dillon <dillon@apollo.backplane.com>
To:        Greg Lehey <grog@lemis.com>
Cc:        "Justin T. Gibbs" <gibbs@narnia.plutotech.com>, hackers@FreeBSD.ORG
Subject:   Re: SCSI vs. DMA33..
Message-ID:  <199811130529.VAA01053@apollo.backplane.com>
References:  <98Nov11.134648jst.21907@ns.isi.co.jp> <199811111538.IAA00103@narnia.plutotech.com> <19981112182238.J463@freebie.lemis.com>

:>> earlier today.
:>
:> You had all of 1 command going to each disk.  That doesn't give
:> you any per-device overlap.
:
:Sure.  I was referring to the command overhead, not the effect of
:overlapped commands.
:
:> If you really want to see the effect of overlapped commands, run a
:> benchmark through the filesystem that causes lots of commands to be
:...
:
:No doubt, and that's what I intended to do next.  Unfortunately, I've
:just fallen off the net (massive phone cable damage out in the
:street), so I can't download any benchmarks.
:
:Greg

    I think overlapping commands have their greatest advantage when a
    command to a drive carries enough latency that you cannot issue
    another command the drive might complete more efficiently.  But I've
    done rather extensive tests in the last few days, and I don't think
    tags have as great an effect on performance as disconnection does
    (i.e. being able to run commands to several drives simultaneously).

    Here are some SCSI command latency / bandwidth tests.  Starting out
    with a real small block size :-), observe the drive transfer rate, the
    number of transfers/sec, and the cpu used/idle time.

dd if=/dev/rsd1 of=/dev/null bs=1k count=65536
67108864 bytes transferred in 8.598441 secs (7804771 bytes/sec)

      tty          sd0           sd1           sd2           sd3          cpu
 tin tout sps tps msps  sps tps msps  sps tps msps  sps tps msps  us ni sy in id
   0   82   0   0  0.0 15354 7677  0.0    0   0  0.0    0   0  0.0   4  0 29 16 50
   0   82   0   0  0.0 15340 7670  0.0    0   0  0.0    0   0  0.0   3  0 33 12 53
   0   82   0   0  0.0 15260 7630  0.0    0   0  0.0    0   0  0.0   3  0 27 12 58
    7600 SCSI transactions/sec, 7.8 MBytes/sec.

    I believe this is a drive firmware transaction-rate limitation (platter
    saturation is 11 MBytes/sec w/ this drive).
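    A quick way to sanity-check that figure (a sketch using plain shell
    arithmetic; the byte rate is taken from the dd output above):

```shell
#!/bin/sh
# Sanity check of the 1K run above: measured bytes/sec divided by the
# block size gives the per-command rate dd forced on the drive.
bytes_per_sec=7804771   # from the dd output above
bs=1024
echo $((bytes_per_sec / bs))   # prints 7621, i.e. ~7600 transactions/sec
```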

***

    But when you bump the block size up to 4K, the drive has no problem 
    hitting the platter transfer rate of 11 Mbytes/sec, even though it 
    is doing over 2700 SCSI transactions/sec.

dd if=/dev/rsd1 of=/dev/null bs=4k count=16384
67108864 bytes transferred in 6.035871 secs (11118340 bytes/sec)

   0   81   0   0  0.0 21685 2711  0.0    0   0  0.0    0   0  0.0   1  0 12  5 82
   0   82   0   0  0.0 21837 2730  0.0    0   0  0.0    0   0  0.0   3  0  9  2 85
   0   82   0   0  0.0 21814 2727  0.0    0   0  0.0    0   0  0.0   0  0 12  5 82

    2700 SCSI transactions/sec, 11 MBytes/sec.    The cpu is still somewhat
    loaded by the large number of transactions.
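    The runs above differ only in block size; a block-size sweep can be
    scripted along these lines (a sketch -- the device name is from the runs
    above, and the dd line is left commented out so the script doesn't touch
    a real disk):

```shell
#!/bin/sh
# Sketch: repeat the same 64 MB raw read at increasing block sizes to
# find where the firmware's per-command limit stops capping throughput.
total=67108864          # 64 MB, as in the runs above
for bs in 1024 2048 4096 8192; do
    count=$((total / bs))
    echo "bs=$bs count=$count"
    # dd if=/dev/rsd1 of=/dev/null bs=$bs count=$count
done
```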

***

    More interesting things happen when you run commands to multiple
    SCSI drives simultaneously.

dd if=/dev/rsd1 of=/dev/null bs=512 &
dd if=/dev/rsd2 of=/dev/null bs=512 &
dd if=/dev/rsd3 of=/dev/null bs=512 &
dd if=/dev/rsd4 of=/dev/null bs=512 &

backup1:/tmp/ssh-dillon# iostat sd1 sd2 sd3 sd4 1
      tty          sd1           sd2           sd3           sd4          cpu
 tin tout sps tps msps  sps tps msps  sps tps msps  sps tps msps  us ni sy in id
   0   75 4102 4081  0.0 4012 3993  0.0 4009 3988  0.0 3021 3001  0.0   9  0 68 19  5
   0   79 3791 3802  0.0 3782 3793  0.0 3762 3784  0.0 3652 3664  0.0   7  0 52 32  9

    The drives are only doing 1.8 MBytes/sec each, at 3800
    transactions/sec each, so the SCSI bus as a whole is doing 15200
    transactions/sec.  The cpu idle time is 5%, so we've obviously hit a
    *CPU* limitation on the motherboard here, not a SCSI bus limitation.

    Fortunately, this situation never occurs in real life.

**

    Let's try an 8K block size.

dd if=/dev/rsd1 of=/dev/null bs=8k &
dd if=/dev/rsd2 of=/dev/null bs=8k &
dd if=/dev/rsd3 of=/dev/null bs=8k &
dd if=/dev/rsd4 of=/dev/null bs=8k &

backup1:/tmp/ssh-dillon# iostat sd1 sd2 sd3 sd4 1
      tty          sd1           sd2           sd3           sd4          cpu
 tin tout sps tps msps  sps tps msps  sps tps msps  sps tps msps  us ni sy in id
   0   85 14860 929  0.0 14876 930  0.0 14876 930  0.0 14860 929  0.0   2  0 15 12 72
   0   85 14908 932  0.0 14923 933  0.0 14892 931  0.0 14923 933  0.0   1  0 16  9 74

    Here we are doing 7.5 MBytes/sec or so per drive (30 MBytes/sec for the 
    whole SCSI bus).  The transaction rate doing 8K reads is 
    930 transactions/sec per drive (3720 for the SCSI bus as a whole).
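    As a cross-check of those numbers (shell arithmetic only; 8192 bytes
    per transaction is the 8K block size used above):

```shell
#!/bin/sh
# Cross-check of the 8K run: per-drive transactions/sec times the block
# size reproduces the per-drive byte rate, and times 4 drives gives the
# whole-bus transaction rate.
tps=930                  # per-drive, from the iostat output above
echo $((tps * 8192))     # prints 7618560, ~7.5 MBytes/sec per drive
echo $((tps * 4))        # prints 3720, whole-bus transactions/sec
```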

**

    If I use a larger block size... 16K, 32K, 64K, and so forth, the transfer
    rate approaches 34 MBytes/sec (9.5 MBytes/sec per drive).  None of the
    drives is able to get all the way up to 11 MBytes/sec when all four are
    transferring on the SCSI bus at the same time, so obviously we are
    hitting a selection limitation of some sort in either the drive or the
    controller firmware.  I've hit higher aggregate bandwidths in the past
    (36 MBytes/sec or higher); I'm not sure why I couldn't this time.
    Could be the drives.

    In any case, looking at the random-seeking case, where SCSI bus
    bandwidth is not an issue (drives cannot get anywhere near their platter
    bandwidth when seeking randomly :-)), it seems pretty clear to me that a
    SCSI bus should be able to handle upwards of 3700 SCSI transactions/sec
    without significant degradation of any resource other than cpu (25%
    utilization w/ a PPro 200).

    So for the sake of argument, let's say, oh, 2500 SCSI transactions/sec
    is the most we are willing to perform.

    A randomly-seeking drive as occurs in a web server or news server can
    handle 150 transactions/sec at best.

    2500 / 150 is about 16.  So one should be able to put 15 SCSI drives on
    a SCSI bus in a web server / news reader / other situation without
    saturating any significant resources, yet still be able to run the
    drives at full speed (for the randomly seeking case).
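    Spelled out as shell arithmetic (the 2500 and 150 figures are the
    assumptions from the argument above):

```shell
#!/bin/sh
# Headroom estimate: the conservative bus budget divided by the
# random-seek rate a single drive can sustain.
bus_budget=2500    # transactions/sec we are willing to run on the bus
per_drive=150      # random-seek transactions/sec per drive, at best
echo $((bus_budget / per_drive))   # prints 16 -> 15 drives plus slack
```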

    For reference, my NNTP box runs between 1 and 2 MBytes/sec worth of 
    bandwidth under normal load.  15 drives @ 2 MBytes/sec is 30 Mbytes/sec, 
    which I've shown to be easily achievable in the tests above.  So the
    SCSI concurrency here is very significant... you really can throw 15
    drives onto a SCSI bus, though realistically you may not be able to 
    extend an ultra-wide cable to that many.

    I think that IDE is fine as long as you only put in one drive per 
    IDE controller, at least in a randomly seeking situation.   In a linear
    reading situation the drive caching has (through previous postings made
    by others) been shown to scale to two drives on an IDE bus.  Even saying
    that, I would never personally use IDE in a production commercial system
    unless the disk were irrelevant, like in a recursive DNS server or a 
    radius server.

    But I would like to interject one last point here.... on all of our
    shell machines, web servers, and news machines, the resource we run out
    of *first* is disk seeks.  It makes little sense to me, if you have two
    drives, to intentionally halve performance by putting them on one IDE
    bus, or if you have four drives, to halve performance by putting two on
    each of two IDE busses.

    If one only has two drives, then putting one on each IDE bus should yield
    acceptable results, but beyond that a person is seriously limiting the 
    platform's performance.  I would go as far as to say 'gee, if we are
    going to put two drives on the same IDE bus we might as well throw away
    the 400 MHz P-II motherboard and get something cheaper'.

    I try to balance resources such that they all start to run out at the
    same time, otherwise I'm wasting money on something :-).  The best
    example I have of a balanced machine is shell5.ba.best.com.  4 SCSI
    drives (3 used for web serving / home directories), 512MB ram, P-II/300.
    It's history has gone something like this:

	(root drive +)

	1x9G drives	128MB ram	PPro-200
	2x9G drives	256MB ram	PPro-200
	3x9G drives	384MB ram	P-II/300 (note 1)
	[ 4x9G drives	512MB ram	P-II/300 ]	FUTURE

	note (1): new motherboard to accommodate more memory; cpu performance
	is about the same.


						    -Matt

    Matthew Dillon  Engineering, HiWay Technologies, Inc. & BEST Internet 
                    Communications & God knows what else.
    <dillon@backplane.com> (Please include original email in any response)    


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message
