Date:      Thu, 12 Dec 1996 16:21:56 +0100
From:      se@freebsd.org (Stefan Esser)
To:        kuku@gilberto.physik.rwth-aachen.de (Christoph Kukulies)
Cc:        freebsd-hackers@freefall.freebsd.org
Subject:   Re: ccd - some measurements
Message-ID:  <Mutt.19961212162156.se@x14.mi.uni-koeln.de>
In-Reply-To: <199612120805.JAA25514@gilberto.physik.rwth-aachen.de>; from Christoph Kukulies on Dec 12, 1996 09:05:27 +0100
References:  <199612120805.JAA25514@gilberto.physik.rwth-aachen.de>

On Dec 12, kuku@gilberto.physik.rwth-aachen.de (Christoph Kukulies) wrote:
> 
> Yesterday I finally got my ccd drive working. I have slow but cheap
> SCSI disks, and I thought I'd invest in the future and put in two
> ncr/pci controllers, with one drive attached to each (for now).

Did you try with both drives connected to one controller, too ?

> Then I ran a batch job
[ ... ]
> which tested the ccd performance with different interleave factors using
> bonnie. The results are below. 
> 
> I didn't yet get higher than an interleave of 96, and I didn't expect
> the results to keep growing beyond an interleave of 32 anyway. So
> I tend to think that benchmarking the ccd with bonnie that way might
> be questionable.

Why ???

> Also strange is the high CPU percentage, which I would not expect with
> a PCI busmaster DMA driven controller/transfer.

Yes, it seems that the CCD configuration needs a lot more CPU cycles
than the single drive. But in order to test for the cycles actually
spent in the driver, you have to directly access the raw disk device.
And you will most probably find a load in the low percent range, even
for very small block sizes.
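
For instance (assuming your ccd shows up as ccd0 and using its raw
whole-disk partition; the device name is just a guess for your setup),
something like this would show the driver overhead without any file
system in the path:

# time dd if=/dev/rccd0c of=/dev/null bs=32k count=1000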

On my ASUS SP3G with a 133MHz AMD 486 CPU (AMD 5x86):

# time dd if=/dev/rsd0 of=/dev/null bs=1k count=10000
10240000 bytes transferred in 6.515382 secs (1571665 bytes/sec)
        6.59 real         0.09 user         1.79 sys

# time dd if=/dev/rsd0 of=/dev/null bs=10k count=1000
10240000 bytes transferred in 2.618192 secs (3911096 bytes/sec)
        2.66 real         0.02 user         0.19 sys

# time dd if=/dev/rsd0 of=/dev/null bs=32k count=1000
32768000 bytes transferred in 4.923350 secs (6655631 bytes/sec)
        4.94 real         0.03 user         0.22 sys

This (and further measurements I made) indicates a driver overhead
of 177us per transfer (including the kernel call, an estimated 25us of
system call overhead) plus 6.5us per 4KB page transferred. This does
not account for the reduced CPU performance (due to reduced memory
bandwidth and higher latency while a DMA transfer is active).
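
To make that explicit, for the three runs above the model
  sys ~= count * (177us + (bs / 4KB) * 6.5us)
predicts roughly:

  bs=1k : 10000 * (177 + 0.25 * 6.5)us ~= 1.79s   (measured: 1.79s sys)
  bs=10k:  1000 * (177 + 2.5  * 6.5)us ~= 0.19s   (measured: 0.19s sys)
  bs=32k:  1000 * (177 + 8    * 6.5)us ~= 0.23s   (measured: 0.22s sys)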

For comparison: the numbers for /dev/zero are 24us + 96us/page.

All these overheads are computed per transfer, of course, and they
account for both reading the input file and writing to /dev/null, since
that is what "dd" does ...

(BTW: The user time per transfer comes out a factor of about 4 lower
when reading from /dev/zero than from /dev/rsd0. It appears to be some
3.5us per read from /dev/zero, but 12us per read from /dev/rsd0.
The user time values are much less reproducible in the latter case
and vary between 7us and 20us, but they are significantly higher
than the /dev/zero numbers, anyway ...)

> Quantum Atlas 2GB, PPRO-200/256, 32MB
> 
> Bonnie result:
>               -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
> Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
> PPRO      100  5006 43.3  4959 11.3  2290  6.3  5022 43.7  4890  7.0  87.1  1.9

These numbers are significantly lower than the typical throughput of
the Atlas on the outer tracks, so I assume you either used a partition
that started far inside, or the file system was mostly full ...

> CCD Amd K5/PR-133, two ncr/pci controllers, 2 Quantum Tempest 3.2 GB (slow)
> 
> I/L
>  8        100  2103 39.5  2118 10.9   828  7.0  2555 39.6  2324 13.6  58.9  4.3
> 32        100  3271 53.3  3322 14.8  1469 12.2  5241 78.7  5067 27.5  68.7  4.0
> 64        100  3657 57.8  3669 15.9  2153 17.7  6417 95.2  6430 34.5  69.2  4.1
> 96        100  4312 69.6  4404 19.7  2414 19.5  6359 93.6  6486 34.4  67.8  4.1

Hmmm, what is the unit of I/L in this case ?

Since the drives' spindles are not synchronized and the Quantum Tempest
has a very small cache, the results might vary quite a lot, depending on
the relative angular position of the platters.

The AMD 5K86 CPU load per MB/s is about twice that of the PPro/Atlas, 
and that does not look that wrong to me :)
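
(Taking the sequential block output column, for instance: the PPro/Atlas
needs about 11.3% / 4.96 MB/s ~= 2.3% CPU per MB/s, while the K5/ccd at
an interleave of 96 needs about 19.7% / 4.40 MB/s ~= 4.5% CPU per MB/s.)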

The compute-intensive tasks are the buffer management and file system
overhead, which are independent of the controller and disk drives used.
But since these operations are performed simultaneously with the file
data transfers (at least for the read-ahead, and also if tags are used),
it pays to have the CPU do them while the controller does the data
transfer on its own (i.e. as a bus-master).

Although the average load of a PIO controller over the day is not high,
it happens to consume cycles exactly when the system is running under
peak load anyway. That's why I still prefer bus-master DMA, no matter
how fast the CPU gets ... (And in fact, the relative impact of PIO on
the CPU is higher the faster the CPU is: it has to wait for the slow
peripheral and just can't switch back to executing user code, even if
the drive can't deliver data fast enough to keep the CPU busy. But it
is hard to measure that effect, since the PIO cycles will be accounted
to some unrelated program ...)


Regards, STefan


