Date: Thu, 12 Nov 1998 21:29:20 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: Greg Lehey <grog@lemis.com>
Cc: "Justin T. Gibbs" <gibbs@narnia.plutotech.com>, hackers@FreeBSD.ORG
Subject: Re: SCSI vs. DMA33..
Message-ID: <199811130529.VAA01053@apollo.backplane.com>
References: <98Nov11.134648jst.21907@ns.isi.co.jp> <199811111538.IAA00103@narnia.plutotech.com> <19981112182238.J463@freebie.lemis.com>
:>> earlier today.
:>
:> You had all of 1 command going to each disk.  That doesn't give
:> you any per-device overlap.
:
:Sure.  I was referring to the command overhead, not the effect of
:overlapped commands.
:
:> If you really want to see the effect of overlapped commands, run a
:> benchmark through the filesystem that causes lots of commands to be
:...
:
:No doubt, and that's what I intended to do next.  Unfortunately, I've
:just fallen off the net (massive phone cable damage out in the
:street), so I can't download any benchmarks.
:
:Greg

I think overlapping commands give their greatest advantage when a command
to a drive carries enough latency that it prevents you from issuing
another command the drive might be able to complete more efficiently.
But I've done rather extensive tests in the last few days, and I don't
think tags have as great an effect on performance as disconnection does
(i.e. being able to run commands to several drives simultaneously).

Here are some SCSI command latency / bandwidth tests.  Starting out with
a real small block size :-), observe the drive transfer rate, the number
of transfers/sec, and the cpu used/idle time.

    dd if=/dev/rsd1 of=/dev/null bs=1k count=65536
    67108864 bytes transferred in 8.598441 secs (7804771 bytes/sec)

      tty           sd0              sd1              sd2              sd3            cpu
 tin tout  sps tps msps    sps  tps msps   sps tps msps   sps tps msps  us ni sy in id
   0   82    0   0  0.0  15354 7677  0.0     0   0  0.0     0   0  0.0   4  0 29 16 50
   0   82    0   0  0.0  15340 7670  0.0     0   0  0.0     0   0  0.0   3  0 33 12 53
   0   82    0   0  0.0  15260 7630  0.0     0   0  0.0     0   0  0.0   3  0 27 12 58

7600 SCSI transactions/sec, 7.8 MBytes/sec.  I believe this is a drive
firmware transaction rate limitation (platter saturation is 11 MBytes/sec
w/ this drive).

*** But when you bump the block size up to 4K, the drive has no problem
hitting the platter transfer rate of 11 MBytes/sec, even though it is
doing over 2700 SCSI transactions/sec.

    dd if=/dev/rsd1 of=/dev/null bs=4k count=16384
    67108864 bytes transferred in 6.035871 secs (11118340 bytes/sec)

   0   81    0   0  0.0  21685 2711  0.0     0   0  0.0     0   0  0.0   1  0 12  5 82
   0   82    0   0  0.0  21837 2730  0.0     0   0  0.0     0   0  0.0   3  0  9  2 85
   0   82    0   0  0.0  21814 2727  0.0     0   0  0.0     0   0  0.0   0  0 12  5 82

2700 SCSI transactions/sec, 11 MBytes/sec.  The cpu is still somewhat
loaded by the large number of transactions.

*** More interesting things happen when you run commands to multiple SCSI
drives simultaneously.

    dd if=/dev/rsd1 of=/dev/null bs=512 &
    dd if=/dev/rsd2 of=/dev/null bs=512 &
    dd if=/dev/rsd3 of=/dev/null bs=512 &
    dd if=/dev/rsd4 of=/dev/null bs=512 &

    backup1:/tmp/ssh-dillon# iostat sd1 sd2 sd3 sd4 1
      tty           sd1             sd2             sd3             sd4            cpu
 tin tout  sps  tps msps   sps  tps msps   sps  tps msps   sps  tps msps  us ni sy in id
   0   75 4102 4081  0.0  4012 3993  0.0  4009 3988  0.0  3021 3001  0.0   9  0 68 19  5
   0   79 3791 3802  0.0  3782 3793  0.0  3762 3784  0.0  3652 3664  0.0   7  0 52 32  9

The drives are only doing 1.8 MBytes/sec each, at roughly 3800
transactions/sec each.  The SCSI bus as a whole is doing 15200
transactions/sec.  The cpu idle time is 5%, so we've obviously hit a
*CPU* limitation on the motherboard here, not a SCSI bus limitation.
Fortunately, this situation never occurs in real life.
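(For reference, the multi-drive test above can be scripted rather than
typed by hand.  This is only a rough sketch, not the exact commands I ran;
the device names, block size, and number of iostat samples are
placeholders you would adjust for the drives actually on your bus.)

    #!/bin/sh
    # Rough sketch: start one raw sequential reader per SCSI disk in the
    # background, sample iostat while they run, then kill the readers.
    # The device list, block size, and sample count are placeholders.

    DISKS="rsd1 rsd2 rsd3 rsd4"   # raw character devices to read from
    BS=512                        # bytes per read
    SAMPLES=10                    # number of 1-second iostat samples

    PIDS=""
    for d in $DISKS; do
        dd if=/dev/$d of=/dev/null bs=$BS &
        PIDS="$PIDS $!"
    done

    # per-drive transactions/sec (tps) and cpu idle time show up here
    iostat sd1 sd2 sd3 sd4 1 $SAMPLES

    # stop the background readers once sampling is done
    kill $PIDS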
**  Let's try an 8K block size.

    dd if=/dev/rsd1 of=/dev/null bs=8k &
    dd if=/dev/rsd2 of=/dev/null bs=8k &
    dd if=/dev/rsd3 of=/dev/null bs=8k &
    dd if=/dev/rsd4 of=/dev/null bs=8k &

    backup1:/tmp/ssh-dillon# iostat sd1 sd2 sd3 sd4 1
      tty            sd1              sd2              sd3              sd4            cpu
 tin tout   sps tps msps    sps tps msps    sps tps msps    sps tps msps  us ni sy in id
   0   85 14860 929  0.0  14876 930  0.0  14876 930  0.0  14860 929  0.0   2  0 15 12 72
   0   85 14908 932  0.0  14923 933  0.0  14892 931  0.0  14923 933  0.0   1  0 16  9 74

Here we are doing 7.5 MBytes/sec or so per drive (30 MBytes/sec for the
whole SCSI bus).  The transaction rate doing 8K reads is 930
transactions/sec per drive (3720 for the SCSI bus as a whole).

** If I use a larger block size... 16K, 32K, 64K, and so forth, the
transfer rate approaches 34 MBytes/sec (9.5 MBytes/sec per drive).  None
of the drives is able to get all the way up to 11 MBytes/sec/drive when
all four are transferring to the SCSI bus at the same time, so obviously
we are hitting a selection limitation of some sort in either the drive or
the controller firmware.  I've hit higher aggregate bandwidths in the
past (36 MBytes/sec or higher); I'm not sure why I couldn't this time.
Could be the drives.

In any case, looking at the random-seeking case, where SCSI bus bandwidth
is not an issue (drives cannot get anywhere near their platter bandwidth
when seeking randomly :-)), it seems pretty clear to me that a SCSI bus
should be able to handle upwards of 3700 SCSI transactions/sec without
any significant resource degradation other than cpu (25% utilization w/ a
PPro 200).

So for the sake of argument, let's say, oh, 2500 SCSI transactions/sec is
the most we are willing to perform.  A randomly-seeking drive, as occurs
in a web server or news server, can handle 150 transactions/sec at best.
2500 / 150 is roughly 16, so one should be able to put 15 SCSI drives on
a SCSI bus in a web server / news reader / other situation without
saturating any significant resources, yet still be able to run the drives
at full speed (for the randomly seeking case).

For reference, my NNTP box runs between 1 and 2 MBytes/sec worth of
bandwidth under normal load.  15 drives @ 2 MBytes/sec is 30 MBytes/sec,
which I've shown to be easily achievable in the tests above.  So the SCSI
concurrency here is very significant... you really can throw 15 drives
onto a SCSI bus, though realistically you may not be able to extend an
ultra-wide cable to that many.
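(The back-of-the-envelope sizing above can be reduced to a few lines of
shell if you want to plug in your own numbers.  The 2500 transactions/sec
bus budget, the 150 transactions/sec per randomly-seeking drive, and the
2 MBytes/sec per drive figure are just the assumptions from the
paragraphs above; the text rounds down to 15 drives to leave headroom.
Substitute whatever your own drives and workload actually measure.)

    #!/bin/sh
    # Rough SCSI bus sizing using the assumptions discussed above.
    BUS_TPS_BUDGET=2500   # transactions/sec we're willing to put on one bus
    DRIVE_TPS=150         # transactions/sec a randomly-seeking drive sustains
    DRIVE_MBSEC=2         # MBytes/sec each drive moves under that load

    DRIVES=`expr $BUS_TPS_BUDGET / $DRIVE_TPS`
    AGGREGATE=`expr $DRIVES \* $DRIVE_MBSEC`

    echo "about $DRIVES drives per bus, roughly $AGGREGATE MBytes/sec aggregate"
    # prints: about 16 drives per bus, roughly 32 MBytes/sec aggregate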
I think that IDE is fine as long as you only put one drive per IDE
controller, at least in a randomly seeking situation.  In a linear
reading situation the drive caching has (through previous postings made
by others) been shown to scale to two drives on an IDE bus.  Even so, I
would never personally use IDE in a production commercial system unless
the disk were irrelevant, like in a recursive DNS server or a radius
server.

But I would like to interject one last point here....  on all of our
shell machines, web servers, and news machines, the resource we run out
of *first* is disk seeks.  It makes little sense to me, if you have two
drives, to intentionally halve performance by putting them on one IDE
bus, or if you have four drives, to halve performance by putting two on
each of two IDE busses.  If one only has two drives, then putting one on
each IDE bus should yield acceptable results, but beyond that a person is
seriously limiting the platform's performance.  I would go as far as to
say 'gee, if we are going to put two drives on the same IDE bus we might
as well throw away the 400 MHz P-II motherboard and get something
cheaper'.

I try to balance resources such that they all start to run out at the
same time, otherwise I'm wasting money on something :-).  The best
example I have of a balanced machine is shell5.ba.best.com:  4 SCSI
drives (3 used for web serving / home directories), 512MB ram, P-II/300.
Its history has gone something like this:

    (root drive +)
    1x9G drives   128MB ram   PPro-200
    2x9G drives   256MB ram   PPro-200
    3x9G drives   384MB ram   P-II/300   (note 1)
  [ 4x9G drives   512MB ram   P-II/300 ] FUTURE

    note (1): new motherboard to accommodate more memory; cpu performance
    is about the same.

					-Matt

    Matthew Dillon  Engineering, HiWay Technologies, Inc. & BEST Internet
    Communications & God knows what else.  <dillon@backplane.com>
    (Please include original email in any response)

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message