Date: Thu, 12 Dec 1996 16:21:56 +0100
From: se@freebsd.org (Stefan Esser)
To: kuku@gilberto.physik.rwth-aachen.de (Christoph Kukulies)
Cc: freebsd-hackers@freefall.freebsd.org
Subject: Re: ccd - some measurements
Message-ID: <Mutt.19961212162156.se@x14.mi.uni-koeln.de>
In-Reply-To: <199612120805.JAA25514@gilberto.physik.rwth-aachen.de>; from Christoph Kukulies on Dec 12, 1996 09:05:27 +0100
References: <199612120805.JAA25514@gilberto.physik.rwth-aachen.de>
On Dec 12, kuku@gilberto.physik.rwth-aachen.de (Christoph Kukulies) wrote:
> Yesterday I finally got my ccd drive working. I have slow but cheap
> SCSI disks and I thought I'd invest in the future and put in two
> ncr/pci controllers, with one drive attached to each (for the first).

Did you try with both drives connected to one controller, too?

> Then I ran a batch job [ ... ]
> which tested the ccd performance with different interleave factors
> using bonnie. The results are below.
>
> I didn't yet get higher than an interleave of 96. I didn't expect
> the results to still grow beyond an interleave of 32 anyway. So I
> tend to think that benchmarking the ccd with bonnie that way might
> be questionable.

Why ???

> Also strange is the high CPU percentage, which I would not expect
> with a PCI busmaster DMA driven controller/transfer.

Yes, it seems that the CCD configuration needs a lot more CPU cycles
than the single drive. But in order to test for the cycles actually
spent in the driver, you have to access the raw disk device directly.
And you will most probably find a load in the low percent range, even
for very small block sizes.

On my ASUS SP3G with a 133MHz AMD 486 CPU (AMD 5x86):

# time dd if=/dev/rsd0 of=/dev/null bs=1k count=10000
10240000 bytes transferred in 6.515382 secs (1571665 bytes/sec)
        6.59 real         0.09 user         1.79 sys
# time dd if=/dev/rsd0 of=/dev/null bs=10k count=1000
10240000 bytes transferred in 2.618192 secs (3911096 bytes/sec)
        2.66 real         0.02 user         0.19 sys
# time dd if=/dev/rsd0 of=/dev/null bs=32k count=1000
32768000 bytes transferred in 4.923350 secs (6655631 bytes/sec)
        4.94 real         0.03 user         0.22 sys

This (and further measurements I made) indicates a driver overhead of
177us per transfer (including an estimated 25us of system call
overhead) plus 6.5us per 4KB page transferred. This does not account
for the reduced CPU performance (due to reduced memory bandwidth and
higher latency) while a DMA transfer is active.

For comparison: the numbers for /dev/zero are 24us + 96us/page.

All these overheads are computed per transfer, of course, and do
account for reading the input file and writing to /dev/null, since
that is what "dd" does ...

(BTW: The user time per transfer comes out a factor of 4 lower when
reading from /dev/zero than from /dev/rsd0. It appears to be some
3.5us per read from /dev/zero, but 12us per read from /dev/rsd0. The
user time values are much less reproducible in the latter case, and
vary between 7us and 20us. But they are significantly higher than the
/dev/zero numbers, anyway ...)

> Quantum Atlas 2GB, PPRO-200/256, 32MB
>
> Bonnie result:
>               -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
> Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
> PPRO      100  5006 43.3  4959 11.3  2290  6.3  5022 43.7  4890  7.0  87.1  1.9

These numbers are significantly lower than the typical throughput of
the Atlas on the outer tracks, so I assume you either used a partition
that starts far inside, or the file system was mostly full ...

> CCD AMD K5/PR-133, two ncr/pci controllers, 2 Quantum Tempest 3.2 GB (slow)
>
> I/L
>   8   100  2103 39.5  2118 10.9   828  7.0  2555 39.6  2324 13.6  58.9  4.3
>  32   100  3271 53.3  3322 14.8  1469 12.2  5241 78.7  5067 27.5  68.7  4.0
>  64   100  3657 57.8  3669 15.9  2153 17.7  6417 95.2  6430 34.5  69.2  4.1
>  96   100  4312 69.6  4404 19.7  2414 19.5  6359 93.6  6486 34.4  67.8  4.1

Hmmm, what is the unit of I/L in this case?
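(As an aside, in case you want to repeat the raw-device test shown
above on your box: here is a minimal sh sketch of that kind of dd
sweep. It assumes /usr/bin/time is available and reuses the /dev/rsd0
device name from my measurements; the block sizes and counts are only
examples, so substitute your own raw disk device and values.)

  #!/bin/sh
  # Read the raw (character) disk device at several block sizes.
  # The raw device bypasses the buffer cache, so the sys time is
  # dominated by driver and system call overhead.
  DEV=/dev/rsd0          # raw disk device (example name, adjust)
  COUNT=1000
  for BS in 1k 4k 10k 32k; do
      echo "=== bs=$BS count=$COUNT ==="
      # dd prints its own bytes/sec summary; time adds the
      # real/user/sys split.
      /usr/bin/time dd if=$DEV of=/dev/null bs=$BS count=$COUNT
  done
  # sys time divided by COUNT gives the per-transfer overhead; how it
  # grows with the block size gives the per-page cost quoted above.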
Since the drives' spindles are not synchronized and the Quantum
Tempest has a very small cache, the results might vary quite a lot,
depending on the relative angular position of the platters.

The AMD 5K86 CPU load per MB/s is about twice that of the PPro/Atlas,
and that does not look that wrong to me :)

The compute-intensive tasks are the buffer management and file system
overhead, which are independent of the controller and disk drives
used. But since these operations are performed simultaneously with
the file data transfers (at least for the read-ahead, and also if
tags are used), it pays to have the CPU do them while the controller
does the data transfer on its own (i.e. as a bus-master).

Although the average load of a PIO controller over the day is not
high, it happens to consume cycles exactly when the system is running
under a peak load anyway. That's why I still prefer bus-master DMA,
no matter how fast the CPU gets ...

(And in fact, the relative impact of PIO on the CPU is higher if the
CPU is faster. It has to wait for the slow peripheral, and just can't
switch back to executing user code, even if the drive can't deliver
data fast enough to keep the CPU busy. But it is hard to measure that
effect, since the PIO cycles will be accounted to some unrelated
program ...)

Regards, STefan
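PS: One rough way to see that effect despite the bogus accounting is
to time a purely CPU-bound job once with an idle disk and once while
a raw-disk read runs in the background; the increase in elapsed time
is roughly the CPU the transfer really takes away from user code.
A sketch (the device name and sizes are only examples, and it assumes
/usr/bin/time):

  #!/bin/sh
  DEV=/dev/rsd0                   # raw disk device (example name, adjust)
  # CPU-bound reference job: /dev/zero -> /dev/null never touches a disk.
  CPUJOB="dd if=/dev/zero of=/dev/null bs=1k count=100000"

  echo "== baseline, idle disk =="
  /usr/bin/time sh -c "$CPUJOB"

  echo "== same job with a concurrent raw-disk read =="
  dd if=$DEV of=/dev/null bs=32k count=2000 &    # background I/O stream
  /usr/bin/time sh -c "$CPUJOB"
  # If the transfer eats CPU (PIO copies, interrupts, memory
  # contention), the CPU-bound job's real time goes up by about
  # that amount, no matter which process the cycles get charged to.
  wait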