Date: Mon, 03 Oct 2005 12:08:31 -0300
From: Tulio Guimarães da Silva <tuliogs@pgt.mpt.gov.br>
To: freebsd-performance@freebsd.org
Subject: Re: dd(1) performance when copying a disk to another
Message-ID: <4341496F.9020703@pgt.mpt.gov.br>
In-Reply-To: <20051003222844.R44500@delplex.bde.org>
References: <20051002170446.78674.qmail@web30303.mail.mud.yahoo.com> <43401C62.2040606@centtech.com> <618C4F4D-A3F6-4F34-9352-C7C86DC1DD9E@patpro.net> <20051003222844.R44500@delplex.bde.org>
Phew, thanks for that. :)
This seems to answer my question in the other "leg" of the thread, though
it hadn't yet reached me when I wrote my message. Now THAT's quite a good
explanation. ;)

Thanks again,

Tulio G. da Silva

Bruce Evans wrote:

> On Mon, 3 Oct 2005, Patrick Proniewski wrote:
>
>>>>> # dd if=/dev/ad4 of=/dev/null bs=1m count=1000
>>>>> 1000+0 records in
>>>>> 1000+0 records out
>>>>> 1048576000 bytes transferred in 17.647464 secs (59417943 bytes/sec)
>
> Many wrong answers to the original question have been given.  dd with
> a block size of 1m between (separate) disk devices is much slower
> just because that block size is far too large...
>
> The above is a fairly normal speed.  The expected speed depends mainly
> on the disk technology generation and the placement of the sectors being
> read.  I get the following speeds for _sequential_ _reading_ from the
> outer (fastest) tracks of 6- and 3-year-old drives which are about 2
> generations apart:
>
> %%%
> Sep 25 21:52:35 besplex kernel: ad0: 29314MB <IBM-DTLA-307030> [59560/16/63] at ata0-master UDMA100
> Sep 25 21:52:35 besplex kernel: ad2: 58644MB <IC35L060AVV207-0> [119150/16/63] at ata1-master UDMA100
> ad0 bs 512: 16777216 bytes transferred in 2.788209 secs (6017201 bytes/sec)
> ad0 bs 1024: 16777216 bytes transferred in 1.433675 secs (11702245 bytes/sec)
> ad0 bs 2048: 16777216 bytes transferred in 0.787466 secs (21305320 bytes/sec)
> ad0 bs 4096: 16777216 bytes transferred in 0.479757 secs (34970249 bytes/sec)
> ad0 bs 8192: 16777216 bytes transferred in 0.477803 secs (35113250 bytes/sec)
> ad0 bs 16384: 16777216 bytes transferred in 0.462006 secs (36313842 bytes/sec)
> ad0 bs 32768: 16777216 bytes transferred in 0.462038 secs (36311331 bytes/sec)
> ad0 bs 65536: 16777216 bytes transferred in 0.486850 secs (34460748 bytes/sec)
> ad0 bs 131072: 16777216 bytes transferred in 0.462046 secs (36310693 bytes/sec)
> ad0 bs 262144: 16777216 bytes transferred in 0.469866 secs (35706382 bytes/sec)
> ad0 bs 524288: 16777216 bytes transferred in 0.462035 secs (36311555 bytes/sec)
> ad0 bs 1048576: 16777216 bytes transferred in 0.478534 secs (35059612 bytes/sec)
> ad2 bs 512: 16777216 bytes transferred in 4.115675 secs (4076419 bytes/sec)
> ad2 bs 1024: 16777216 bytes transferred in 2.105451 secs (7968466 bytes/sec)
> ad2 bs 2048: 16777216 bytes transferred in 1.132157 secs (14818809 bytes/sec)
> ad2 bs 4096: 16777216 bytes transferred in 0.662452 secs (25325935 bytes/sec)
> ad2 bs 8192: 16777216 bytes transferred in 0.454654 secs (36901065 bytes/sec)
> ad2 bs 16384: 16777216 bytes transferred in 0.304761 secs (55050416 bytes/sec)
> ad2 bs 32768: 16777216 bytes transferred in 0.304761 secs (55050416 bytes/sec)
> ad2 bs 65536: 16777216 bytes transferred in 0.304765 secs (55049683 bytes/sec)
> ad2 bs 131072: 16777216 bytes transferred in 0.304762 secs (55050200 bytes/sec)
> ad2 bs 262144: 16777216 bytes transferred in 0.304760 secs (55050588 bytes/sec)
> ad2 bs 524288: 16777216 bytes transferred in 0.304762 secs (55050200 bytes/sec)
> ad2 bs 1048576: 16777216 bytes transferred in 0.304757 secs (55051148 bytes/sec)
> %%%
>
> Drive technology hit a speed plateau a few years ago so newer single
> drives aren't much faster unless they are more expensive and/or smaller.
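
(For anyone who wants to reproduce numbers like the above on their own
drives: a sweep in that format can be generated with a small loop along
the lines below.  The /dev/ad0 device name and the 16MB total are only
placeholders here, not Bruce's actual script.)

#!/bin/sh
# rough sketch of a block-size sweep; adjust the device and total size
total=16777216
for bs in 512 1024 2048 4096 8192 16384 32768 \
          65536 131072 262144 524288 1048576; do
        printf "ad0 bs %s: " $bs
        # dd prints its summary on stderr; keep only the throughput line
        dd if=/dev/ad0 of=/dev/null bs=$bs count=$((total / bs)) 2>&1 |
            grep 'bytes transferred'
done
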
> The speed is low for small block sizes because the device has to be
> talked to too much and the protocol and firmware are not very good.
> (Another drive, a WDC 120GB with more cache (8MB instead of 2), ramps
> up to about half speed (26MB/sec) for a block size of 4K but sticks
> at that speed for block sizes 8K and 16K, then jumps up to full speed
> for block sizes of 32K and larger.  This indicates some firmware
> stupidity.)  Most drives ramp up almost logarithmically (doubling
> the block size almost doubles the speed).  This behaviour is especially
> evident on slow SCSI drives like some (most?) ZIP and dvd/cd drives.
> The command overhead can be 20 msec, so you had better not do one
> 512-byte i/o per command or you will get a speed of 25K/sec.  The
> command overhead of a new ATA drive is more like 50 usec, but that is
> still far too much for high speed with a block size of 512 bytes.
>
> The speed is insignificantly different for block sizes larger than a
> limit because the drive's physical limits dominate, except possibly
> with old (slow) CPUs.
>
>>>> That seems to be 2 or about 2 times faster than disc->disc
>>>> transfer... But still slower than I would have expected...
>>>> SATA150 sounds like the drive can do 150MB/sec...
>>
>> As Eric pointed out, you just can't reach 150 MB/s with one disk,
>> it's a technological maximum for the bus, but real-world performance
>> is well below this max.
>> In fact, I had thought I would reach about 50 to 60 MB/s.
>
> 50-60 MB/s is about right.  I haven't benchmarked any SATA or very new
> drives.  Apparently they are not much faster.  ISTR that WDC Raptors are
> specced for 70-80MB/sec.  You pay twice as much to get a tiny drive with
> only 25% more throughput plus faster seeks.
>
>>>>>> (Maybe you could find a way to copy /dev/zero to /dev/ad6
>>>>>> without destroying the previous work... :-))
>>>>>
>>>>> well, not very easy, both disks are the same size ;)
>>>>
>>>> I thought of the first 1000 1MB blocks... :-)
>>
>> damn, I misread this one... :)
>> I'm gonna try this ASAP.
>
> I divide disks into equally sized (fairly small, or half the disk size)
> partitions, and cp between them.  dd is too hard to use for me ;-).  cp
> is easier to type and automatically picks a reasonable block size.  Of
> course I use dd if the block size needs to be controlled, but mostly I
> only use it in preference to cp to get its timing info.
>
>> ...
>>
>>> Have you tried a smaller block size?  What does 8k, 16k, or 512k do
>>> for you?  There really isn't much room for improvement here on a
>>> single device.
>>
>> nope, I'll try one of them, but I can't do many experiments, the box
>> is in my living room, it's a 1U rack, and it's VERY VERY noisy.  My
>> girlfriend will kill me if it's running more than an hour a day :))
>
> Smaller block sizes will go much faster, except for copying from a disk
> to itself.  Large block sizes are normally a pessimization, and the
> pessimization is especially noticeable for dd.  Just use the smallest
> block size that gives an almost-maximal throughput (e.g., 16K for
> reading ad2 above, possibly different for writing).  Large block sizes
> are pessimal for the synchronous i/o that dd does.
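
(If I read that advice right, it boils down to something like the two
commands below.  The device names, offsets and the 64k/1m figures are
only illustrative guesses, not values measured on Patrick's drives.)

# (a) copy between two separate disks: a modest block size just past the
#     throughput "knee" found with a sweep like the one above is enough
dd if=/dev/ad4 of=/dev/ad6 bs=64k

# (b) copy a region of a disk onto itself: a large block size limits the
#     number of competing seeks (the offsets here are purely made up)
dd if=/dev/ad4 of=/dev/ad4 bs=1m count=10240 seek=10240
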
> The timing for dd'ing blocks of size N MB at R MB/sec between ad0 and
> ad2 is something like:
>
>   time in secs       activity on ad0         activity on ad2
>   ------------       ---------------         ---------------
>   0                  start read of N MB      idle
>   N/R                finish read; idle       start write of N MB
>   2*N/R-epsilon      start read of N MB      pretend to complete write
>   2*N/R              continue read           complete write
>   3*N/R-epsilon      finish read; idle       start write of N MB
>   4*N/R-2*epsilon    ...                     ...
>
> After the first block (which takes a little longer), it takes
> 2*N/R-epsilon seconds to copy 1 block, where epsilon is the time between
> the drive's pretending to complete a write and actually completing it.
> This time is obviously not very dependent on the block size since it is
> limited by the drive's resources and policies (in particular, if the
> drive doesn't do write caching, perhaps because write caching is not
> enabled, then epsilon is 0, and if our block size is large compared with
> the drive's cache then the drive won't be able to signal completion
> until no more than the drive's cache size is left to do).  Thus epsilon
> becomes small relative to the N/R term when N is large.  Apparently, in
> your case the speed drops from 59MB/sec to 35MB/sec, so with N == 1 and
> R == 59, epsilon is about 1/200.
>
> With large block sizes, the speed can be increased using asynchronous
> output.  There is a utility (in ports) named team that fakes async
> output using separate processes.  I have never used it.  Something as
> simple as 2 dd's in a pipe should work OK.
>
> For copying from a disk to itself, a large block size is needed to
> limit the number of seeks, and concurrent reads and writes are exactly
> what is not needed (since they would give competing seeks).  The i/o
> must be sequentialized, and dd does the right things for this, though
> the drive might not (you would prefer epsilon == 0, since if the drive
> signals write completion early then it might get confused when you
> flood it with the next read and seek to start the read before it
> completes the write, then thrash back and forth between writing and
> reading).
>
> It is interesting that writing large sequential files to at least the
> ffs file system (not mounted with -sync) in FreeBSD is slightly faster
> than writing directly to the raw disk using write(2), even if the
> device driver sees almost the same block sizes for these different
> operations.  This is because write(2) to the raw disk is synchronous
> and sync writes always cause idle periods (the idle periods are just
> much smaller for writing data that is already in memory), while the
> kernel uses async writes for the file data.
>
> Bruce
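
P.S.: if I understood the "2 dd's in a pipe" suggestion correctly, it
would be something along these lines (block size and device names are
only illustrative, and partial reads from the pipe may make the record
counts look odd):

# the first dd keeps reading ahead into the pipe buffer while the second
# one is blocked writing to the drive, which fakes asynchronous output
# much like the "team" utility mentioned above
dd if=/dev/ad4 bs=64k | dd of=/dev/ad6 bs=64k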